Saturday, April 30, 2016

Comparing adding a column with default values between MySQL and Oracle, or, "Adding a new column is sloooow :("

In this post, I want to talk a little about something that seems to baffle some developers from time to time. It baffles us in the sense of, "I can't believe this simple operation is slow, what the hell is going on?" Happened today in fact with MySQL. That simple operation is nothing more than adding a column to a table! At least, adding column to a table that's full of data, perhaps millions of rows. Why should a column take so long to add? Well, it doesn't ALWAYS - it depends on whether your database has optimized column adding, and how much data your table has. Let's look into it a little...

MySQL

In MySQL land, here's what happens:
mysql> CREATE TABLE t (a INT, b INT); 
Query OK, 0 rows affected (0.03 sec)
mysql> insert into t SELECT rand(), rand() FROM INFORMATION_SCHEMA.COLUMNS c1 
CROSS JOIN INFORMATION_SCHEMA.COLUMNS C2;
Query OK, 3825936 rows affected, 2 warnings (10.99 sec)
Records: 3825936  Duplicates: 0  Warnings: 2
mysql> ALTER TABLE t ADD COLUMN c INT DEFAULT NULL;
Query OK, 3825936 rows affected (12.77 sec)
Records: 3825936  Duplicates: 0  Warnings: 0
12.8 seconds might not seem so bad, but what if this were a busy production table? What if there were 10 times as many rows? Quickly, just to prove that the time does increase linearly with number of rows, let's double the table size then try again:
mysql> INSERT INTO t SELECT * FROM t; Query OK, 3825936 rows affected (14.38 sec) Records: 3825936 Duplicates: 0 Warnings: 0 mysql> ALTER TABLE t ADD COLUMN d INT DEFAULT NULL; Query OK, 7651872 rows affected (25.80 sec) Records: 7651872 Duplicates: 0 Warnings: 0
Yup, double the size, double the time. [Note: MySQL version is 5.5 -- in this version an alter table like the above will recreate the entire table!]

Oracle

In Oracle land, as usual they're ahead of the curve and have a solution for version 12c. Let's quote Mr. Kyte's description of this feature from http://www.oracle.com/technetwork/issue-archive/2013/13-sep/o53asktom-1999186.html:
More Online Operations: Better Column Addition. In Oracle Database 11g you were able to perform a fast add of a column to a table if it had a default value and was defined as NOT NULL. (Arup Nanda has written about this at bit.ly/16tQNCh.) However, if you attempted to add a column with a default value and that column permitted null values, the ADD COLUMN operation could take a significant amount of time, generate a large amount of undo and redo, and lock the entire table for the duration of the operation. In Oracle Database 12c, that time, volume, and locking are no longer part of the process.
To demonstrate this, I copy ALL_OBJECTS into a table and measure its space—in blocks and bytes—using the show_space utility, posted on asktom.oracle.com
SQL> create table t
  2  as
  3  select *
  4    from all_objects;
Table created.

SQL> exec show_space('T')
…
Full Blocks        ....        1,437
Total Blocks...........        1,536
Total Bytes............   12,582,912
Total MBytes...........           12
…

PL/SQL procedure successfully completed.

Now I add a column to table T, and this column will have a large default value. Because the column I’m adding is a CHAR(2000), it will always consume the full 2,000 bytes, given that the CHAR type is always blank-padded and fixed-width. Table T has more than 87,000 records, so adding a column would typically take a significant amount of time, but as you can see, the addition is practically instantaneous in Oracle Database 12c
SQL> set timing on
SQL> alter table t 
add (data char(2000) default 'x');
Table altered.
Elapsed: 00:00:00.07

I perform the identical operation in Oracle Database 11g and observe the following timing: 
SQL> set timing on
SQL> alter table t 
add (data char(2000) default 'x');
Table altered.
Elapsed: 00:00:28.59

Clearly, that’s a significant difference in runtimes.
What magic is going on under Oracle's hood? How can it get away with a near instant column addition while MySQL (and others) need to write the new default value n times. I don't precisely know what Oracle is doing, but I can offer a conjecture. What would I do to solve the problem? I would store the default once as a property of the table, and any time the table is queried, if the column value is missing, just kind of inject the default (which we have ready to go in memory at the time) in place. I found some insight in this article by Mohamed Houri at Oracle:
11.2.0.3.0> alter table t1 add C_DDL number default 42 not null;

Table altered.
Elapsed: 00:00:00.04
The C_DDL column has been added instantaneously in 11gR2 database while it took almost 49 seconds in 10gR2. What is this new mechanism that allows such an extremely rapid execution time when adding a not null column with default value to an existing table?
How could 3 million of rows be updated in 4 milliseconds?
Let’s verify visually if the update has been really done (from now and on when the Oracle version is not specified it will then refer to 11.0.2.3)
SQL> select count(1) from t1;

   COUNT(1)
  ----------
   3000000

SQL> select count(1) from t1 where c_ddl = 42;

   COUNT(1)
  ----------
   3000000 


Although Oracle has altered the t1 table instantaneously, the query is showing that the whole bunch of C_DDL column has been updated with their default value set to 42. How could this be possible? Will the execution plan be of any help here?
SQL> select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------
| Id | Operation        | Name | Rows | Bytes | Cost (%CPU)| Time          |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT  |      |      |       | 3016 (100) |               |
| 1 | SORT AGGREGATE    |      |    1 |     3 |            |               |
|*2 | TABLE ACCESS FULL | T1   | 2999K|  8788K| 3016   (5) | 00:00:10      |
---------------------------------------------------------------------------

 Predicate Information (identified by operation id):
 ---------------------------------------------------
     2 - filter(NVL("C_DDL",42)=42) 


Notice here again how the predicate part of the above execution plan can reveal vital information when trying to understand what is happening behind the scene. Despite I haven’t used the NVL function in my query, this one appears in the predicate part indicating that, internally, Oracle is still considering the C_DDL column as to be potentially able to contain null values (which means it has not been updated) and, as such, Oracle is replacing it with its default value 42.
So there you have it- it does pretty much what I guessed. But that's an 11g example that requires a NOT NULL constraint.

So what about when the default is 42 but NULLs are allowed? Can it optimize this situation? According to the Tom Kyte quote above, the answer appears to be yes. But how does it distinguish a "real" null with simply a non-set value due to the ADD COLUMN optimization? I don't have the answer, but perhaps Oracle writes an explicit NULL when you do something like "UPDATE t SET data=NULL" as opposed to when it is simply not set. NULL is different from missing. The Mohamed Houri article continues, with respect to default values on columns that allow NULLs:

11.2.0.3.0> alter table t1 add C_DDL_2 number default 84;

Table altered.

Elapsed: 00:00:58.25

12c> alter table t1 add C_DDL_2 number default 84;

Elapsed: 00:00:00.02


While adding the nullable C_DDL_2 column took 58 seconds to be honored in 11gR2 it has been instantaneously done in 12c.
This is a clear demonstration that in Oracle Database 12c, DDL optimization has been extended to include null columns having default values. Indeed, when you query t1 table to get the distinct values of the newly added column (C_DDL_2) you will realize that the entire table rows have seen their metadata (default value 84) updated as shown via the following query:
12c> select c_ddl_2, count(1) from t1 group by c_ddl_2;

   C_DDL_2    COUNT(1)
   -------   ----------
    84        3000000


SQL> select count(1) from t1 where c_ddl_2=84;

  COUNT(1)
 ----------
  3000000

SQL> select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------
| Id | Operation        | Name | Rows | Bytes | Cost (%CPU)| Time         |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT  |      |      |       | 3803 (100) |              |
| 1 | SORT AGGREGATE    |      |    1 |    13 |            |              |
|* 2| TABLE ACCESS FULL | T1   | 3538K|    43M|   3803 (1) |  00:00:01    |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
    2 - filter(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00006$",0)),NULL,NVL("
               C_DDL_2",84),'0',NVL("C_DDL_2",84),'1',"C_DDL_2")=84)
[Mwrynn: WHOA, what is this funkiness??]
Note
-----
- dynamic statistics used: dynamic sampling (level=2) 


However, in order to ensure DDL optimization for null columns with default value, things became more complex than it used to be for not null columns in the preceding release. We went from a simple implicit use of the NVL function to a complex and exotic predicate part involving SYS_OP_VECBIT Oracle non documented function and a new internal column SYS_NC00006$ in order to honor the default value since this one has not been physically updated.

One more thought courtesy of my buddy Kevin. What happens if you change the default? So you have:

  • At time=0, table t has columns a,b with a million rows
  • At time=1, table t has columns a,b,c with c defaulting to 42. Therefore the million new c fields are set to 42 (though optimized)
  • At time=2, table t column c is changed to default to 41. Now the optimization would be broken, since all the values from time=1 should have 42. Does Oracle at this point decide to "fill in" all the 42s for existing rows?? If that's the case, this step should be slow. I think this calls for an experiment! Stay tuned...

Monday, April 04, 2016

Common Table Expressions: Postgres vs. Oracle vs. MySQL


The Gist of CTEs

SQL Common Table Expressions (CTEs) are a neat way to enhance the readability of your SQL queries. They can help you manage complexity by avoiding copied/pasted subqueries, and they remove the (what I feel is often an imagined) need for temporary tables to store temporary results for another query to work with. CTEs were defined in ANSI/ISO Standard SQL-99, so if your SQL database does not support CTEs, that means that like your favorite hair band, they still haven't caught up with the 90s.

Quick Examples of CTEs

The idea is you have this kind of ugliness:

SELECT ...
FROM (SELECT ... WHERE ...) AS subqry
WHERE ...

Maybe it's even multiple subquery levels deep:

SELECT ...
FROM (SELECT ... FROM (SELECT ... WHERE ...) AS inner_subqry WHERE ...) AS outer_subqry
WHERE ...

Hard to write, hard to edit, hard to look at. A CTE is simply the use of the "WITH" clause to separate that subquery from the main body, substituting the subquery with the assigned alias.

WITH subqry AS (SELECT ... WHERE ...)
SELECT ... FROM subqry WHERE ...

Another use case: If you ever find yourself saying "Hmm, if I put THIS resultset into a temporary table, then I can query it later..." Like so:

--in general, DON'T DO THE FOLLOWING if you can help it.
CREATE TABLE helper_table AS SELECT ... FROM ... WHERE ... ;
SELECT some_stuff
FROM ... helper_table 
...;
--maybe you use it multiple times, to avoid recomputing helper_table's query!
SELECT some_other_stuff
FROM ... helper_table 
...;
DROP TABLE helper_table;

^^^The above makes me sad. :( You're using another table when you don't need to. It's messier. It's slower. It's unnecessary DDL (and on most databases, DDL necessitates a commit, don't forget), or even if you avoid the CREATE TABLE by keeping helper_table table alive all the time (maybe you delete/insert with every usage), you're creating unnecessary maintenance; unnecessary transaction logging. It's more complex. And if you have CTEs there's (rarely) any good reason to do it.

The Postgres Optimization Fence

Unfortunately, in Postgres there is an infamous downside to the otherwise lovely CTE: The Optimization Fence. The Postgres planner will, upon seeing your CTE query, decide to resolve that WITH part first, then take the results of that, and work it into the "main" part of the query. So behind the curtain, it's actually ALMOST doing something like the "CREATE TABLE helper_table" example above -- resolve that query first, store the results in memory, use that data when it is asked for.

Here is the mention of this problem from the docs:
“…that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)”
Ok! Let's observe by comparing two examples:

First I made a table called person_location that has 50,000,000 rows -- to represent a million people residing in each of the 50 states in the US.

1) CTE to get the number of people per state
mwrynn=# EXPLAIN ANALYZE
WITH peeps_by_state AS 
  (SELECT COUNT(*) AS cnt, state FROM person_location GROUP BY state)
SELECT cnt
FROM peeps_by_state
WHERE state=13;
                                                                 QUERY PLAN                                                                  
--------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on peeps_by_state  (cost=971241.21..971245.71 rows=1 width=8) (actual time=28868.625..28868.645 rows=1 loops=1
   Filter: (state = 13)
   Rows Removed by Filter: 49
   CTE peeps_by_state
     ->  HashAggregate  (cost=971239.21..971241.21 rows=200 width=4) (actual time=28868.592..28868.603 rows=50 loops=1)
           Group Key: person_location.stat
           ->  Seq Scan on person_location  (cost=0.00..721239.14 rows=50000014 width=4) (actual time=0.016..9207.389 rows=50000000 loops=1)
Planning time: 0.168 ms
Execution time: 28868.783 ms
(9 rows)

2) replace that reference to the CTE with a subquery:

mwrynn=# EXPLAIN ANALYZE
SELECT cnt
FROM (SELECT COUNT(*) AS cnt, state FROM person_location GROUP BY state) peeps_by_state
WHERE state=13;
                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on peeps_by_state  (cost=0.00..851089.22 rows=1 width=8) (actual time=9925.626..9925.627 rows=1 loops=1)
   ->  GroupAggregate  (cost=0.00..851089.21 rows=1 width=4) (actual time=9925.624..9925.624 rows=1 loops=1)
         Group Key: person_location.state
         ->  Seq Scan on person_location  (cost=0.00..846239.20 rows=970000 width=4) (actual time=0.029..9750.655 rows=1000000 loops=1)
               Filter: (state = 13)
               Rows Removed by Filter: 49000000

 Planning time: 0.124 ms
 Execution time: 9925.680 ms
(8 rows)

Unfortunately, the uglier version of the query will be planned much better, so, sadly, Postgres gives you a performance incentive to use the ugly one. In this simple example above, it may not seem like much of a big deal to write the ugly one, but in a larger, more complex example, the ugly factor could be much more extreme.

In the above example, though, you could just apply the same filter to the CTE and probably get good results. But this is something extra to think about, something extra to deal with, and shouldn't be necessary in an ideal world. So let's try that out (looks good):

3) add the filter to the CTE:
mwrynn=# EXPLAIN ANALYZE WITH peeps_by_state AS (SELECT COUNT(*) AS cnt, state FROM person_location WHERE state=13 GROUP BY state)                              SELECT cnt                                                                      FROM peeps_by_state                                                             WHERE state=13;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on peeps_by_state  (cost=851089.21..851089.23 rows=1 width=8) (actual time=9895.863..9895.864 rows=1 loops=1)
   Filter: (state = 13)
   CTE peeps_by_state
     ->  GroupAggregate  (cost=0.00..851089.21 rows=1 width=4) (actual time=9895.851..9895.852 rows=1 loops=1)
           Group Key: person_location.state
           ->  Seq Scan on person_location  (cost=0.00..846239.20 rows=970000 width=4) (actual time=0.028..9722.994 rows=1000000 loops=1)
                 Filter: (state = 13)
                 Rows Removed by Filter: 49000000
 Planning time: 0.125 ms
 Execution time: 9895.923 ms

(10 rows)

Don't Worry, Oracle's Got This

Oracle, on the other hand, is quite a bit smarter about CTEs. It can "push down" the CTE and analyze the query as a whole. In other words, Oracle's query optimizer views the two queries as the same, and therefore they both resolve to the same (better) query plan. Good stuff. Let's observe...

--QRY 1
WITH peeps_by_state AS (SELECT COUNT(*) AS cnt, state FROM person_location GROUP BY state)
SELECT cnt
FROM peeps_by_state
WHERE state=13;




--QRY 2
SELECT cnt
FROM (SELECT COUNT(*) AS cnt, state FROM person_location GROUP BY state) peeps_by_state
WHERE state=13;


Ta-da - same plan either way. Same time to execute either way. This is what we want.

(Note that the fact that the Oracle queries took longer than Postgres doesn't mean anything. I'm running Postgres on my Macbook Pro, and Oracle is running on the tiniest RDS instance Amazon will let me use. :))

MySQL's CTE Implementation

Finally, let's check out how well MySQL (InnoDB) handles CTEs:


(Still hasn't caught up with the 90s.)