Sunday, September 30, 2018

Friday, September 14, 2018

Yet another performance problem solved...OR...Yet another weird MySQL quirk of lameness

My coworker recently came up to me asking me why his index wasn't being used for a particular query (this is on MySQL) - it was just looking up by some IDs and the query was so slow, causing our application's UI to perform poorly! He gave me a sample query. It looked something like this:

SELECT * FROM MYTABLE WHERE EXT_ID IN (11111, 22222, 33333, 44444);

I did a little bit of digging and found the following:
  • EXT_ID was actually a VARCHAR.
  • There was indeed an index on EXT_ID
  • EXPLAIN showed that the INDEX was not planned to be used
  • The table had about 15 million rows
  • The column was high cardinality (unique or very close to it, but no unique constraint as we could not guarantee true uniqueness)
  • The query took over 20 seconds to run
Seems like a straightforward case - no fancy joins, subqueries or any other complicating factors. Just a look up by a high cardinality column. So why wasn't the index being used?

"Hmm," I thought, "Why wouldn't we quote these VARCHARs? Maybe that has to do with it..."

So I added quotes, and voila - the query went from taking over 20 seconds to about .04 seconds. A speed improvement of something like 50,000%!

I sighed and commented that MySQL's planner was buggy, but it turns out this is not a bug! It is documented behavior. From the MySQL 8.0 documentation:
For comparisons of a string column with a number, MySQL cannot use an index on the column to look up the value quickly. If str_col is an indexed string column, the index cannot be used when performing the lookup in the following statement:
SELECT * FROM tbl_name WHERE str_col=1;
The reason for this is that there are many different strings that may convert to the value 1, such as '1'' 1', or '1a'.

So there you have it folks. If you're looking up by a VARCHAR column, even if it contains numbers, please quote your lookup value. BETTER YET: DON'T DO THIS AND BIND YOUR VARIABLES. This means if you're using JDBC (as one example), form your query as "WHERE str_col=?", use a PreparedStatement, and set the parameter appropriately. The developer who originally wrote this code (in Java) was just concatenating the list of integers (that should've been strings/varchars) and forgot the quotes.

Binding your variables is one of the most important things you can do, and saves you from so many headaches in countless ways. It saves you from having to quote your VARCHARs, it saves you from having to escape quotes and other special characters, it prevents SQL injection, and it may be critical to your database's health (I know this is true with Oracle, not sure about others.) Just make sure you're calling the PreparedStatement's setString() method and not setInt(). ;)

#usebindvariables