Saturday, April 30, 2016

Comparing adding a column with default values between MySQL and Oracle, or, "Adding a new column is sloooow :("

In this post, I want to talk a little about something that seems to baffle some developers from time to time. It baffles us in the sense of, "I can't believe this simple operation is slow, what the hell is going on?" Happened today in fact with MySQL. That simple operation is nothing more than adding a column to a table! At least, adding column to a table that's full of data, perhaps millions of rows. Why should a column take so long to add? Well, it doesn't ALWAYS - it depends on whether your database has optimized column adding, and how much data your table has. Let's look into it a little...

MySQL

In MySQL land, here's what happens:
mysql> CREATE TABLE t (a INT, b INT); 
Query OK, 0 rows affected (0.03 sec)
mysql> insert into t SELECT rand(), rand() FROM INFORMATION_SCHEMA.COLUMNS c1 
CROSS JOIN INFORMATION_SCHEMA.COLUMNS C2;
Query OK, 3825936 rows affected, 2 warnings (10.99 sec)
Records: 3825936  Duplicates: 0  Warnings: 2
mysql> ALTER TABLE t ADD COLUMN c INT DEFAULT NULL;
Query OK, 3825936 rows affected (12.77 sec)
Records: 3825936  Duplicates: 0  Warnings: 0
12.8 seconds might not seem so bad, but what if this were a busy production table? What if there were 10 times as many rows? Quickly, just to prove that the time does increase linearly with number of rows, let's double the table size then try again:
mysql> INSERT INTO t SELECT * FROM t; Query OK, 3825936 rows affected (14.38 sec) Records: 3825936 Duplicates: 0 Warnings: 0 mysql> ALTER TABLE t ADD COLUMN d INT DEFAULT NULL; Query OK, 7651872 rows affected (25.80 sec) Records: 7651872 Duplicates: 0 Warnings: 0
Yup, double the size, double the time. [Note: MySQL version is 5.5 -- in this version an alter table like the above will recreate the entire table!]

Oracle

In Oracle land, as usual they're ahead of the curve and have a solution for version 12c. Let's quote Mr. Kyte's description of this feature from http://www.oracle.com/technetwork/issue-archive/2013/13-sep/o53asktom-1999186.html:
More Online Operations: Better Column Addition. In Oracle Database 11g you were able to perform a fast add of a column to a table if it had a default value and was defined as NOT NULL. (Arup Nanda has written about this at bit.ly/16tQNCh.) However, if you attempted to add a column with a default value and that column permitted null values, the ADD COLUMN operation could take a significant amount of time, generate a large amount of undo and redo, and lock the entire table for the duration of the operation. In Oracle Database 12c, that time, volume, and locking are no longer part of the process.
To demonstrate this, I copy ALL_OBJECTS into a table and measure its space—in blocks and bytes—using the show_space utility, posted on asktom.oracle.com
SQL> create table t
  2  as
  3  select *
  4    from all_objects;
Table created.

SQL> exec show_space('T')
…
Full Blocks        ....        1,437
Total Blocks...........        1,536
Total Bytes............   12,582,912
Total MBytes...........           12
…

PL/SQL procedure successfully completed.

Now I add a column to table T, and this column will have a large default value. Because the column I’m adding is a CHAR(2000), it will always consume the full 2,000 bytes, given that the CHAR type is always blank-padded and fixed-width. Table T has more than 87,000 records, so adding a column would typically take a significant amount of time, but as you can see, the addition is practically instantaneous in Oracle Database 12c
SQL> set timing on
SQL> alter table t 
add (data char(2000) default 'x');
Table altered.
Elapsed: 00:00:00.07

I perform the identical operation in Oracle Database 11g and observe the following timing: 
SQL> set timing on
SQL> alter table t 
add (data char(2000) default 'x');
Table altered.
Elapsed: 00:00:28.59

Clearly, that’s a significant difference in runtimes.
What magic is going on under Oracle's hood? How can it get away with a near instant column addition while MySQL (and others) need to write the new default value n times. I don't precisely know what Oracle is doing, but I can offer a conjecture. What would I do to solve the problem? I would store the default once as a property of the table, and any time the table is queried, if the column value is missing, just kind of inject the default (which we have ready to go in memory at the time) in place. I found some insight in this article by Mohamed Houri at Oracle:
11.2.0.3.0> alter table t1 add C_DDL number default 42 not null;

Table altered.
Elapsed: 00:00:00.04
The C_DDL column has been added instantaneously in 11gR2 database while it took almost 49 seconds in 10gR2. What is this new mechanism that allows such an extremely rapid execution time when adding a not null column with default value to an existing table?
How could 3 million of rows be updated in 4 milliseconds?
Let’s verify visually if the update has been really done (from now and on when the Oracle version is not specified it will then refer to 11.0.2.3)
SQL> select count(1) from t1;

   COUNT(1)
  ----------
   3000000

SQL> select count(1) from t1 where c_ddl = 42;

   COUNT(1)
  ----------
   3000000 


Although Oracle has altered the t1 table instantaneously, the query is showing that the whole bunch of C_DDL column has been updated with their default value set to 42. How could this be possible? Will the execution plan be of any help here?
SQL> select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------
| Id | Operation        | Name | Rows | Bytes | Cost (%CPU)| Time          |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT  |      |      |       | 3016 (100) |               |
| 1 | SORT AGGREGATE    |      |    1 |     3 |            |               |
|*2 | TABLE ACCESS FULL | T1   | 2999K|  8788K| 3016   (5) | 00:00:10      |
---------------------------------------------------------------------------

 Predicate Information (identified by operation id):
 ---------------------------------------------------
     2 - filter(NVL("C_DDL",42)=42) 


Notice here again how the predicate part of the above execution plan can reveal vital information when trying to understand what is happening behind the scene. Despite I haven’t used the NVL function in my query, this one appears in the predicate part indicating that, internally, Oracle is still considering the C_DDL column as to be potentially able to contain null values (which means it has not been updated) and, as such, Oracle is replacing it with its default value 42.
So there you have it- it does pretty much what I guessed. But that's an 11g example that requires a NOT NULL constraint.

So what about when the default is 42 but NULLs are allowed? Can it optimize this situation? According to the Tom Kyte quote above, the answer appears to be yes. But how does it distinguish a "real" null with simply a non-set value due to the ADD COLUMN optimization? I don't have the answer, but perhaps Oracle writes an explicit NULL when you do something like "UPDATE t SET data=NULL" as opposed to when it is simply not set. NULL is different from missing. The Mohamed Houri article continues, with respect to default values on columns that allow NULLs:

11.2.0.3.0> alter table t1 add C_DDL_2 number default 84;

Table altered.

Elapsed: 00:00:58.25

12c> alter table t1 add C_DDL_2 number default 84;

Elapsed: 00:00:00.02


While adding the nullable C_DDL_2 column took 58 seconds to be honored in 11gR2 it has been instantaneously done in 12c.
This is a clear demonstration that in Oracle Database 12c, DDL optimization has been extended to include null columns having default values. Indeed, when you query t1 table to get the distinct values of the newly added column (C_DDL_2) you will realize that the entire table rows have seen their metadata (default value 84) updated as shown via the following query:
12c> select c_ddl_2, count(1) from t1 group by c_ddl_2;

   C_DDL_2    COUNT(1)
   -------   ----------
    84        3000000


SQL> select count(1) from t1 where c_ddl_2=84;

  COUNT(1)
 ----------
  3000000

SQL> select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------
| Id | Operation        | Name | Rows | Bytes | Cost (%CPU)| Time         |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT  |      |      |       | 3803 (100) |              |
| 1 | SORT AGGREGATE    |      |    1 |    13 |            |              |
|* 2| TABLE ACCESS FULL | T1   | 3538K|    43M|   3803 (1) |  00:00:01    |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
    2 - filter(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00006$",0)),NULL,NVL("
               C_DDL_2",84),'0',NVL("C_DDL_2",84),'1',"C_DDL_2")=84)
[Mwrynn: WHOA, what is this funkiness??]
Note
-----
- dynamic statistics used: dynamic sampling (level=2) 


However, in order to ensure DDL optimization for null columns with default value, things became more complex than it used to be for not null columns in the preceding release. We went from a simple implicit use of the NVL function to a complex and exotic predicate part involving SYS_OP_VECBIT Oracle non documented function and a new internal column SYS_NC00006$ in order to honor the default value since this one has not been physically updated.

One more thought courtesy of my buddy Kevin. What happens if you change the default? So you have:

  • At time=0, table t has columns a,b with a million rows
  • At time=1, table t has columns a,b,c with c defaulting to 42. Therefore the million new c fields are set to 42 (though optimized)
  • At time=2, table t column c is changed to default to 41. Now the optimization would be broken, since all the values from time=1 should have 42. Does Oracle at this point decide to "fill in" all the 42s for existing rows?? If that's the case, this step should be slow. I think this calls for an experiment! Stay tuned...

No comments: