Mark W the SQL Geek: December 2024

Friday, December 06, 2024

Evil MySQL Feature: REPLACE INTO

Yes, my friends, it's time for another EVIL MYSQL FEATURE!

Often used as an “upsert” (insert if not exists, else update), REPLACE INTO is another non-standard, MySQL-specific feature that makes one ask the question, “What were they thinking??”

Fortunately, once again, we have INSERT INTO...ON DUPLICATE KEY UPDATE which, while slightly less snappy to type, is pretty much always the better solution. But the developer has to know to use this statement and to toss REPLACE INTO into the fire. 🔥 Again, standard SQL MERGE (which is also an upsert) is yet to be implemented in MySQL

REPLACE INTO suckage basically all stems from the fact that in the case when the row already exists, the row is deleted and a new one is inserted.

To dig into why that’s bad a little bit:

1. REPLACE INTO sucks for performance. I can show the full test details if anyone’s interested. But to briefly show the results of a benchmark I whipped up, that ran a whole bunch of REPLACE INTO vs. logically identical INSERT INTO...ON DUPLICATE KEY UPDATE statements on the same large table: REPLACE INTO: ~1m 9s vs. INSERT INTO...ON DUPLICATE KEY UPDATE: ~ 0m 21s (the discrepancy can be more or less pronounced depending on some factors such as number of indexes on the table; my test case had a few indexes)

2. It may cause unnecessary foreign key violations. Because REPLACE INTO runs a DELETE followed by an INSERT, it can trigger pointless foreign key violations if the table you’re working with is a parent table in a relationship.

Demo:

/* REPLACE INTO goes KABOOM */
mysql> REPLACE INTO parent (id, a, b) VALUES (1, 123, 456);
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`core`.`child`, CONSTRAINT `fk_child_parent` FOREIGN KEY (`parent_id`) REFERENCES `parent` (`id`))

/* alternatively, INSERT INTO...ON DUPLICATE KEY UPDATE LGTM */
mysql> INSERT INTO parent (id, a, b)
    -> VALUES (1, 123, 456) AS new
    -> ON DUPLICATE KEY UPDATE
    -> a = new.a,
    -> b = new.b;
Query OK, 2 rows affected (0.00 sec)

mysql> SELECT * FROM parent WHERE id = 1; /* and we see the update worked */
+----+------+------+
| id | a    | b    |
+----+------+------+
| 1 | 123 | 456 |
+----+------+------+
1 row in set (0.00 sec)

3. Similar principle: if your foreign keys are defined as ON DELETE CASCADE, you won’t get an error, but you’ll get a surprise in that the child rows are deleted! All you want to do is to update or insert in the parent, and it kills all the child rows! Say we have the same parent and child tables, but this time, child’s foreign key was defined with ON DELETE CASCADE:

mysql> SELECT * FROM parent;
+----+------+------+
| id | a    | b    |
+----+------+------+
| 1 | 123 | 456 |
+----+------+------+
1 row in set (0.00 sec)

mysql> SELECT * FROM child; /* here are the children of parent id 1 */
+----+------+------+-----------+
| id | a    | b    | parent_id |
+----+------+------+-----------+
| 1 |    9 |    9 |         1 |
| 2 |    8 |    8 |         1 |
+----+------+------+-----------+
2 rows in set (0.00 sec)

mysql> REPLACE INTO parent (id, a, b) VALUES (1, 123, 456); /* DANGER */
Query OK, 2 rows affected (0.01 sec)

mysql> SELECT * FROM child; /* OH GOD NO!! */
Empty set (0.00 sec)

<snip rolling back the above change>

mysql> INSERT INTO parent (id, a, b) /* whereas this is safe */
    -> VALUES (1, 123, 456) AS new
    -> ON DUPLICATE KEY UPDATE
    -> a = new.a,
    -> b = new.b;
Query OK, 2 rows affected (0.00 sec)

mysql> SELECT * from child; /* whew... */
+----+------+------+-----------+
| id | a    | b    | parent_id |
+----+------+------+-----------+
| 1 |    9 |    9 |         1 |
| 2 |    8 |    8 |         1 |
+----+------+------+-----------+
2 rows in set (0.00 sec)

4. Your preexisting row might get assigned new auto-incremented ID value. If the key that REPLACE INTO is checking against is the ID/primary key, then it will retain the value as expected - no problem in this case. But say you have such an ID as your PK, and a unique constraint on another column. If the unique column is instead used to match (i. e. your REPLACE INTO statement doesn’t specify ID), the row receives a new ID value.

mysql> CREATE TABLE dummy (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(10) UNIQUE, x INT); /* the key thing here is we have an auto-incremented ID and a unique constraint on name */
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO dummy (name, x) VALUES ('thing1', 1), ('thing2', 2); /* populate table with some data */
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM dummy; /* let's take a look at the newly inserted data, notice the id values */
+----+--------+------+
| id | name   | x    |
+----+--------+------+
| 1 | thing1 |    1 |
| 2 | thing2 |    2 |
+----+--------+------+
2 rows in set (0.00 sec)

mysql> REPLACE INTO dummy (name, x) VALUES ('thing1', 999); /* try to REPLACE INTO by unique name */
Query OK, 2 rows affected (0.00 sec)

mysql> SELECT * FROM dummy; /* notice id changed from 1 to 3!! OMG WTF :scream: */
+----+--------+------+
| id | name   | x    |
+----+--------+------+
| 2 | thing2 |    2 |
| 3 | thing1 | 999 |
+----+--------+------+

Maybe the above won’t happen too often, and maybe it doesn’t always matter, but I personally don’t like unexpected data changes, especially primary key changes (what if I have some cached data in my application/ORM? what if people tend to memorize or put IDs in stored requests — imagine those changing sporadically?) It’s just another odd case to worry about it - I’d rather remove that worry entirely:

mysql> INSERT INTO dummy (name, x)
    -> VALUES ('thing1', 999) AS new
    -> ON DUPLICATE KEY UPDATE
    ->     name = new.name,
    ->     x = new.x;
Query OK, 2 rows affected (0.00 sec)

mysql> SELECT * FROM dummy; /* id 1 is still id 1 (yay) */
+----+--------+------+
| id | name   | x    |
+----+--------+------+
| 1 | thing1 | 999 |
| 2 | thing2 |    2 |
+----+--------+------+

5. I have HEARD but could not whip up a quick test case to prove, that REPLACE INTO can cause more heavyweight locking/increase risk of deadlocks. (Source) However my quick tests on MySQL 8 showed the same amount of locking involved. TBD I suppose - maybe I’ll try to dig more deeply into it at some point.

All that said, if you ever really need a DELETE followed by an INSERT, REPLACE INTO may be justified. I feel like I’ve seen the need to do so on rare occasion, but I can’t think of a specific use case right now. But for upsert purposes, do not use REPLACE INTO. It is evil.

tl;dr: MySQL’s special REPLACE INTO statement is slower and breaks stuff more frequently then INSERT INTO … ON DUPLICATE KEY UPDATE . So instead of REPLACE INTO, use INSERT INTO … ON DUPLICATE KEY UPDATE.

Wednesday, December 04, 2024

Evil MySQL Feature: INSERT IGNORE

👿👿👿 WARNING: EVIL LIES AHEAD. ABANDON ALL HOPE--ETC., ETC. 👿👿👿

The non-standard MySQL IGNORE keyword, when used in conjunction with INSERT, is often thought of by SQL coders as "insert, and if duplicate key, do nothing". But that's not the entire story.

IGNORE is a hack to ignore a slew of errors, not just duplicate key violations (per https://dev.mysql.com/doc/refman/8.4/en/sql-mode.html):

    ER_BAD_NULL_ERROR
    ER_DUP_ENTRY
    ER_DUP_ENTRY_WITH_KEY_NAME
    ER_DUP_KEY
    ER_NO_PARTITION_FOR_GIVEN_VALUE
    ER_NO_PARTITION_FOR_GIVEN_VALUE_SILENT
    ER_NO_REFERENCED_ROW_2
    ER_ROW_DOES_NOT_MATCH_GIVEN_PARTITION_SET
    ER_ROW_IS_REFERENCED_2
    ER_SUBQUERY_NO_1_ROW
    ER_VIEW_CHECK_FAILED

If you don't know what all of those errors mean off the top of your head, well I don't either. But let's check into a couple of them.

ER_NO_REFERENCED_ROW_2 means:
"Message: Cannot add or update a child row: a foreign key constraint fails%s"

Demo:

mysql> SELECT id FROM parent; -- just showing the parent table's contents
+----+
| id |
+----+
| 1 |
| 2 |
+----+
2 rows in set (0.00 sec)

mysql> SELECT * FROM child; -- just showing the child table is empty
Empty set (0.01 sec)

mysql> INSERT IGNORE INTO child(a, parent_id) VALUES (1, 999); -- try to INSERT IGNORE a row with invalid parent_id
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> SELECT * FROM child; -- no data was actually inserted because of the FK violation
Empty set (0.00 sec)

So essentially INSERT IGNORE will silently ignore a foreign key constraint failure as well. (Technically not silently as it becomes a warning, but hardly anybody checks for warning responses in their code.)

How about the error ER_SUBQUERY_NO_1_ROW, meaning:
"Message: Subquery returns more than 1 row"

Demo:
mysql> INSERT IGNORE INTO child (a, parent_id)
    -> SELECT
    ->     (
    ->     /* subquery that I expect to only return 1 row, but occasionally that's not true */
    ->     SELECT x FROM some_other_table WHERE x = yat.x
    -> ),
    -> 1
    -> FROM yet_another_table yat;

Query OK, 1000000 rows affected, 65535 warnings (5.15 sec)
Records: 1000000 Duplicates: 0 Warnings: 92622

If I run SHOW WARNINGS; I see a bunch of:

    | Warning | 1242 | Subquery returns more than 1 row |

All of those 92622 rows inserted with warnings now have NULL values for a:

mysql> SELECT COUNT(*) FROM child WHERE a IS NULL;
+----------+
| COUNT(*) |
+----------+
|    92622 |
+----------+
1 row in set (0.13 sec)

Also, a fun (/s) thing is if I repeat the last experiment, except this time add a NOT NULL constraint on child, the "bad" values will be set to 0 instead of NULL.

/* snip here - I repeated the INSERT IGNORE after adding the NOT NULL constraint on `a`, but let's look at the results */

mysql> SELECT COUNT(*) FROM child WHERE a IS NULL;
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM child WHERE a = 0;
+----------+
| COUNT(*) |
+----------+
|    92622 |
+----------+
1 row in set (0.13 sec)

Now you may be wondering, "what if I use strict mode, shouldn't less automagic weirdness happen?" Well I don't want to get into additional experiments for now, so I'll just let the MySQL documentation speak about that:

...when the IGNORE keyword and strict SQL mode are both in effect, IGNORE takes precedence. This means that, although IGNORE and strict SQL mode can be considered to have opposite effects on error handling, they do not cancel when used together.

I have to wrap up this post shortly, but notice we haven't even looked at ER_NO_PARTITION_FOR_GIVEN_VALUE , ER_VIEW_CHECK_FAILED, and many other errors. So many weird cases to consider!

👿👿👿 Just don't use INSERT IGNORE. It is evil. 👿👿👿

But there's hope...The ACTUAL way to do "insert, and if duplicate key, do nothing" in MySQL is something like:

INSERT INTO your_table (column1, column2, ...)
VALUES (value1, value2, ...)
ON DUPLICATE KEY UPDATE id = id; -- there's no `DO NOTHING` no-op in MySQL, so just set a dummy update

It's unfortunate that MySQL doesn't simply support the standard SQL MERGE statement, but MySQL gonna MySQL. Example of standard MERGE:

MERGE INTO customers AS target
USING new_customers AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN
    UPDATE SET target.name = source.name, target.email = source.email
WHEN NOT MATCHED THEN
    INSERT (customer_id, name, email)
    VALUES (source.customer_id, source.name, source.email);

I will wrap up this post by saying -- if you are aware of, and have accounted for, ALL of the possible ignored errors, or you just don't care about any of them, then have it and INSERT IGNORE yourself silly, I suppose! But for my money, if you use INSERT IGNORE with the most common intention - i.e. "insert, and if duplicate key, do nothing" - you're much better off doing the INSERT INTO ... ON DUPLICATE KEY UPDATE ...