In the world of SQL, sometimes you need to conjure up rows of literal data out of thin air, instead of pulling data from an existing table. The context of this post is when you, the SQL coder, need to define each row of literals individually. Maybe you're populating a table for testing purposes - the precise use case isn't too important. Note I am not referring to the context of generating a series by calling a function (post about how to do that), or otherwise pulling a large number of randomly rows out of thin air programmatically. Rather, you want to list the rows out one by one.
That foundation laid out, my claim is that I've generally found it is better to use your DBMS of choice's row constructor syntax than what I've more commonly seen: a bunch of SELECTs containing only literals that are UNION ALL'ed together. For example:
SELECT 1,2,3
UNION ALL
SELECT 4,5,6
UNION ALL
SELECT 7,8,9
...
Rather than writing the above syntax, there are benefits to taking a row constructor approach, which looks like the following in MySQL.
VALUES
ROW(1,2,3),
ROW(4,5,6),
ROW(7,8,9),
...
Let me clarify that last bit: this post is specifically about MySQL and
the above syntax is MySQL specific. No specific tests or under-the-hood workings demonstrated below should be generalized
beyond the scope of MySQL. That said, I've found the principle does tend to apply elsewhere, but I will not try to support my claim for any other DBMS than MySQL in this post.
Without further ado...
Why use row constructors over a bunch of UNION ALL'ed SELECTs?
1) Performance
The UNION ALL approach takes greater wall clock time to execute. I quickly ran a not-super-scientific benchmark (MySQL 8.0.x, 2024 Macbook Pro, usual YMMV disclaimer applies) of the two approaches on 10,000 rows (the same exact data is constructed in each) and the difference was night and day:
Generating 10000 constant rows...
Generating UNION ALL SQL in union.sql...
Generating VALUES ROW(...) SQL in values.sql...
=== Running UNION ALL ===
real 3.95
user 0.04
sys 0.01
=== Running VALUES ROW ===
real 0.08
user 0.03
sys 0.00
Times were about consistent with repeated trials. The discrepancy would likely be less pronounced with smaller data sets. And of course, if you're cooking up a tiny number of rows, say 5, I doubt you'll notice any tangible difference.
2) UNION ALL approach uses more resources and doesn't scale to a large number of rows
Each SELECT ... UNION ALL ... fragment gets parsed recursively and consumes thread stack space. [TODO: add citation] I bumped up the number of rows my script generates to a million, and got an error in the UNION ALL test:
ERROR 1436 (HY000) at line 1: Thread stack overrun: 242167 bytes used of a 262144 byte stack, and 20000 bytes needed. Use 'mysqld --thread_stack=#' to specify a bigger stack.
The row constructor test with a million rows ran fine, and notice that even though it had 100 times the number of rows than the 10,000-row UNION ALL test above, it took only about 50% longer to execute:
=== Running VALUES ROW ===
real 5.91
user 1.40
sys 0.09
Reminder because I see this human error way too often: the subject is specifically MySQL! You cannot generalize this behavior to all of "SQL," aka all DBMSs.
3) UNION ALL approach is logically hacky
I'll admit this point is more feels-based than technical, in other words, just my personal opinion: the UNION ALL approach feels like a hack. It says "Select some literals, each as a standalone set of a single row of data, and glue each set together via the set operator UNION ALL, and in the final output, we will have constructed many rows."
The row constructor approach, on the other hand, gets right to the point and says "construct many rows of literals." It's clearer, cleaner code, and slightly more terse as well.
There are some minor downsides to the row constructor approach:
1) Special, non-standard ROW keyword:
A minor downside of the row constructor approach in MySQL: the MySQL developers chose to require the special keyword ROW for some reason, that I'm not sure I've seen in any other SQL engine. In standard SQL - including Postgres - the syntax is similar except you don't need the special keyword ROW. This is a little more elegant, IMO:
postgres=# /* slightly cleaner/more elegant than MySQL */
VALUES
(1, 2, 3),
(4, 5, 6),
(7, 8, 9);
column1 | column2 | column3
---------+---------+---------
1 | 2 | 3
4 | 5 | 6
7 | 8 | 9
2) You need a wrapper to alias the ugly default column names
Another (very minor IMO) downside of the row constructor approach, perhaps: the default column names are as follows:
mysql> VALUES
-> ROW('val_1', 1, 'cat1', 1, 0.01, '2024-01-01 00:00:00'),
-> ROW('val_2', 2, 'cat2', 0, 0.02, '2024-01-01 00:00:00');
+----------+----------+----------+----------+----------+---------------------+
| column_0 | column_1 | column_2 | column_3 | column_4 | column_5 |
+----------+----------+----------+----------+----------+---------------------+
| val_1 | 1 | cat1 | 1 | 0.01 | 2024-01-01 00:00:00 |
| val_2 | 2 | cat2 | 0 | 0.02 | 2024-01-01 00:00:00 |
+----------+----------+----------+----------+----------+---------------------+
2 rows in set (0.00 sec)
So you need a wrapper to alias them to something pretty:
mysql> SELECT * FROM (
-> VALUES
-> ROW('val_1', 1, 'cat1', 1, 0.01, '2024-01-01 00:00:00'),
-> ROW('val_2', 2, 'cat2', 0, 0.02, '2024-01-01 00:00:00')
-> ) AS t(name, id, category, flag, score, ts);
+-------+----+----------+------+-------+---------------------+
| name | id | category | flag | score | ts |
+-------+----+----------+------+-------+---------------------+
| val_1 | 1 | cat1 | 1 | 0.01 | 2024-01-01 00:00:00 |
| val_2 | 2 | cat2 | 0 | 0.02 | 2024-01-01 00:00:00 |
+-------+----+----------+------+-------+---------------------+
2 rows in set (0.01 sec)
That's about all I wanted to say for now on the subject. Remember, all tests above and specifics about how the respective statements are processed are MySQL-specific. Your mileage may vary on other software.
Question for the crowd: have you seen benefits using row constructors in other DBMSs? (Postgres, Oracle, SQL Server, whatever else) Thanks for reading!
No comments:
Post a Comment