Opinions, experiences and information about SQL and the databases that process it, from a guy who's been at it for over 20 years.
Sunday, December 23, 2018
Oracle Oddity
Why do some of the USER_* views such as USER_CONSTRAINTS include the OWNER column, but USER_TAB_COLUMNS does not? The world may never know...
Friday, November 23, 2018
Wednesday, October 10, 2018
Lukas Eder on Why SQL Bind Variables are Important for Performance
https://blog.jooq.org/2018/04/12/why-sql-bind-variables-are-important-for-performance/#comment-245993
Sunday, September 30, 2018
Oracle SQL PIVOT and UNPIVOT: The Complete Guide
Read all about it here! https://www.databasestar.com/oracle-sql-pivot/
Wednesday, September 19, 2018
Friday, September 14, 2018
Yet another performance problem solved...OR...Yet another weird MySQL quirk of lameness
SELECT * FROM MYTABLE WHERE EXT_ID IN (11111, 22222, 33333, 44444);
I did a little bit of digging and found the following:
- EXT_ID was actually a VARCHAR.
- There was indeed an index on EXT_ID
- EXPLAIN showed that the INDEX was not planned to be used
- The table had about 15 million rows
- The column was high cardinality (unique or very close to it, but no unique constraint as we could not guarantee true uniqueness)
- The query took over 20 seconds to run
"Hmm," I thought, "Why wouldn't we quote these VARCHARs? Maybe that has to do with it..."
So I added quotes, and voila - the query went from taking over 20 seconds to about .04 seconds. A speed improvement of something like 50,000%!
For comparisons of a string column with a number, MySQL cannot use an index on the column to look up the value quickly. Ifstr_col
is an indexed string column, the index cannot be used when performing the lookup in the following statement:SELECT * FROM tbl_name WHERE str_col=1;
The reason for this is that there are many different strings that may convert to the value1
, such as'1'
,' 1'
, or'1a'
.
So there you have it folks. If you're looking up by a VARCHAR column, even if it contains numbers, please quote your lookup value. BETTER YET: DON'T DO THIS AND BIND YOUR VARIABLES. This means if you're using JDBC (as one example), form your query as "WHERE str_col=?", use a PreparedStatement, and set the parameter appropriately. The developer who originally wrote this code (in Java) was just concatenating the list of integers (that should've been strings/varchars) and forgot the quotes.
Binding your variables is one of the most important things you can do, and saves you from so many headaches in countless ways. It saves you from having to quote your VARCHARs, it saves you from having to escape quotes and other special characters, it prevents SQL injection, and it may be critical to your database's health (I know this is true with Oracle, not sure about others.) Just make sure you're calling the PreparedStatement's setString() method and not setInt(). ;)
#usebindvariables
Friday, August 24, 2018
Lukas Eder's (of JOOQ fame) look into join elimination
https://blog.jooq.org/2017/09/01/join-elimination-an-essential-optimiser-feature-for-advanced-sql-usage/
Tuesday, March 06, 2018
MySQL Character Sets and Collations Demystified
Monday, February 19, 2018
Retrosheet/Oracle Project
SQL Developer view of the partitions
I once read a quote from a big data company CEO saying that for the sport of basketball, you could not calculate player metrics in close to real time with an old fart relational database - you needed his super-duper, high-scale big data infrastructure to do anything like that. Now I don't know much about basketball, so for all I know he's right, but I think baseball would give us a close enough comparison...
So my next mission is to aggregate this data in the form of materialized views, and get some player metrics such as batting average per year and of all time. I'm going to try to take advantage of the FAST REFRESH feature of materialized views to calculate these metrics in close to real time. Imagine data trickling into the PLAY table, and almost instantly we can update the player's batting average, and even tell you the rank of his batting average not only for the year, but for all time. I'll see if it can be done!
Saturday, February 17, 2018
Oracle Table Diffing with the Mwrynn PL/SQL Table API
"ID" "DIFF_RAN_ON" "TABLE1" "TABLE2" "DIFF_TYPE" "RESULT"
1 17-FEB-18 03.56.56.181538000 PM "SYS.TEST1" "SYS.TEST2" "C" "SYS.TEST1.A has column_id=1 but SYS.TEST2.A has column_id=2;SYS.TEST1.B has column_id=2 but SYS.TEST2.B has column_id=1;SYS.TEST1.C has data_type=BLOB but SYS.TEST2.C has data_type=VARCHAR2, SYS.TEST1.C has data_length=4000 but SYS.TEST2.C has data_length=20;column SYS.TEST1.E not found in SYS.TEST2, column SYS.TEST2.D not found in SYS.TEST1"
NEXT STEP -- I'm most excited about comparing the data between two tables, so I think I will work on that next...Stay tuned!
EDIT: I've now added support for data diffing as well! When compare_data is set to true, DIFF_RESULTS will now provide the counts of rows in one table but not the other, and vice versa. An unfortunate downside is that Large Object (LOBs) are not supported by set operators, and the data diffing requires the MINUS set operator. So I have to say, at least for now, that LOBs are not supported...
Example of data diffing:
CREATE TABLE test1(a VARCHAR2(2) PRIMARY KEY, b NUMBER, c VARCHAR2(4000), e DATE);
CREATE TABLE test1a(a VARCHAR2(2) PRIMARY KEY, b NUMBER, c VARCHAR2(4000), e DATE);
/* populate tables */
INSERT INTO test1 VALUES ('ab', 3, 'NRIOSANFSDF', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1 VALUES ('cd', 4, 'SDAF', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1 VALUES ('ef', 5, 'MBKDB', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1 VALUES ('gh', 6, '001100101', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1 VALUES ('ij', 5, '001100101', to_date('2018-02-16', 'yyyy-mm-dd'));
INSERT INTO test1a VALUES ('ab', 3, 'NRIOSANFSDF', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1a VALUES ('cd', 4, 'SDAF', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1a VALUES ('ef', 5, 'MBKDB', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1a VALUES ('gh', 6, '001100101', to_date('2018-02-17', 'yyyy-mm-dd'));
INSERT INTO test1a VALUES ('ij', 4, '001100101', to_date('2018-02-16', 'yyyy-mm-dd'));
DECLARE
test1 table_obj;
test1a table_obj;
result INT;
BEGIN
test1 := table_obj('TEST1','SYS',NULL);
test1a := table_obj('TEST1A','SYS',NULL);
result := test1.diff(test1a, compare_columns => false, compare_data => true);
END;
/