Saturday, February 17, 2018

Oracle Table Diffing with the Mwrynn PL/SQL Table API


After about two years, I just pushed to github an update to the Mwrynn PL/SQL Table API. I've added the basics of a new feature I've been thinking about for a while -- table diffing! Too many times I've wanted to compare two tables automatically, to discover what differs between the two. I plan to add diffing utils for columns, indexes, constraints and finally, the data itself. For now I've got column-based diffs working (edit: and data diffing!). Hooray!

If you'd like to try it out for yourself, go to https://github.com/mwrynn/plsql-table and download away!

Below is a small demo...

CREATE TABLE test1(a VARCHAR2(2) PRIMARY KEY, b NUMBER, c BLOB, e DATE);
CREATE TABLE test2(b NUMBER, a VARCHAR2(2) PRIMARY KEY, c VARCHAR2(20), d TIMESTAMP);

As a human being, let's compare these two tables.
1) Columns a and b exist in both tables, are the same types, but in different order.
2) Column c exists in both tables, but it's a a BLOB in test1, and a VARCHAR(20) in test2. 
3) Column e only exists in test1.
4) Column d only exists in test2.

Now let's run the test (yeah it sets a bad example to use the SYS schema...):

DECLARE
  test1 table_obj;
  test2 table_obj;
  result INT;
BEGIN
  test1 := table_obj('TEST1','SYS',NULL);
  test2 := table_obj('TEST2','SYS',NULL);
  
  result := test1.diff(test2);
END;
/

Now we query a special results table that is automatically created called DIFF_RESULTS, and voila, we can see the diffs!

"ID" "DIFF_RAN_ON" "TABLE1" "TABLE2" "DIFF_TYPE" "RESULT"

1 17-FEB-18 03.56.56.181538000 PM "SYS.TEST1" "SYS.TEST2" "C" "SYS.TEST1.A has column_id=1 but SYS.TEST2.A has column_id=2;SYS.TEST1.B has column_id=2 but SYS.TEST2.B has column_id=1;SYS.TEST1.C has data_type=BLOB but SYS.TEST2.C has data_type=VARCHAR2, SYS.TEST1.C has data_length=4000 but SYS.TEST2.C has data_length=20;column SYS.TEST1.E not found in SYS.TEST2, column SYS.TEST2.D not found in SYS.TEST1"

As you can see, the RESULT column aligns with the differences that I noted as a human. w00t! Additionally, it tells us the difference in data_length for column C -- 4000 for the CLOB and 20 for the VARCHAR2(20).

NEXT STEP -- I'm most excited about comparing the data between two tables, so I think I will work on that next...Stay tuned!

EDIT: I've now added support for data diffing as well! When compare_data is set to true, DIFF_RESULTS will now provide the counts of rows in one table but not the other, and vice versa. An unfortunate downside is that Large Object (LOBs) are not supported by set operators, and the data diffing requires the MINUS set operator. So I have to say, at least for now, that LOBs are not supported...

Example of data diffing:

CREATE TABLE test1(a VARCHAR2(2) PRIMARY KEY, b NUMBER, c VARCHAR2(4000), e DATE);
CREATE TABLE test1a(a VARCHAR2(2) PRIMARY KEY, b NUMBER, c VARCHAR2(4000), e DATE);

/* populate tables */
INSERT INTO test1 VALUES ('ab', 3, 'NRIOSANFSDF', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1 VALUES ('cd', 4, 'SDAF', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1 VALUES ('ef', 5, 'MBKDB', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1 VALUES ('gh', 6, '001100101', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1 VALUES ('ij', 5, '001100101', to_date('2018-02-16', 'yyyy-mm-dd'));

INSERT INTO test1a VALUES ('ab', 3, 'NRIOSANFSDF', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1a VALUES ('cd', 4, 'SDAF', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1a VALUES ('ef', 5, 'MBKDB', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1a VALUES ('gh', 6, '001100101', to_date('2018-02-17', 'yyyy-mm-dd'));

INSERT INTO test1a VALUES ('ij', 4, '001100101', to_date('2018-02-16', 'yyyy-mm-dd'));

DECLARE
  test1 table_obj;
  test1a table_obj;
  result INT;
BEGIN
  test1 := table_obj('TEST1','SYS',NULL);
  test1a := table_obj('TEST1A','SYS',NULL);
  
  result := test1.diff(test1a, compare_columns => false, compare_data => true);
END;

/

And now the results...

"ID" "DIFF_RAN_ON" "TABLE1" "TABLE2" "DIFF_TYPE" "RESULT"
5 17-FEB-18 09.46.52.608436000 PM "SYS.TEST1" "SYS.TEST1A" "D" "1 rows in SYS.TEST1 not found in SYS.TEST1A;1 rows in SYS.TEST1A not found in SYS.TEST1"

That's all for now! Any questions/comments are welcome!

No comments: