Saturday, June 11, 2016

Generating a series of integers as a set of rows


In my last post, Cumulative row count over date ranges, I take advantage of a postgres function called generate_series(start, stop) - this generates a series of values, from start to stop. So, for example, generate_series(1,4) returns:

SELECT * FROM generate_series(1,4);
 generate_series
-----------------
               1 
               2
               3
               4

A question that’s been on my mind is how to achieve this in other database systems?

Oracle


Oracle has a way to generate a series as well, but it’s just not quite as simple as calling a function:

SELECT level i
FROM dual
CONNECT BY level <= 4

Less intuitive but it works nicely. What it’s doing is it uses the CONNECT BY clause on the dual table - this means the 1 column, 1 row dual table is being “connected” to itself recursively — this is called a hierarchical query (https://docs.oracle.com/database/121/SQLRF/queries003.htm). “Level” is a special pseudocolumn that tells you the level of recursion depth. So, while this method is more complicated, it does work nicely!

MySQL


The MySQL way….

I didn’t know how to achieve generate_series in MySQL off the top of my head, so off to google! Unfortunately the method I found depends on selecting from a table of at least n size, where n is the number of values you wish to generate. Join on a subquery that declares a variable @num, and in your SELECT clause, increment this value per row. I found this solution here, they are using dates but it can be simplified to generate integers. http://stackoverflow.com/questions/6870499/generate-series-equivalent-in-mysql

SELECT @num:=@num+1
FROM INFORMATION_SCHEMA.COLUMNS, (select @num:=0) num
LIMIT 4


+--------------+
| @num:=@num+1 |
+--------------+
|            1 |
|            2 |
|            3 |
|            4 |
+--------------+

4 rows in set, 1 warning (0.02 sec)

Works great but with the caveat that the "dummy" table you select from, in this example INFORMATION_SCHEMA.COLUMNS, must have at least as many rows as you want values. A little scary, in my opinion, to use in a generalized way with this limitation. If you aren't sure if you have enough, you could cross join the table to itself. So for example, if INFORMATION_SCHEMA.COLUMNS only has 1000 rows, you could cross join it to itself to get a max of 1,000,000 rows like so:
 ... FROM INFORMATION_SCHEMA.COLUMNS AS a, INFORMATION_SCHEMA.COLUMNS AS b ...

MS SQL Server


Oracle-like recursion can be done in SQL Server as well, and wouldn't you know it, some nice developer wrote a generate_series function for SQL Server and shared it on github! UPDATE: SQL Server 2022+ provides a built-in generate_series() function!
 

Why is generating a series useful, anyway?

There are probably more use cases for generating a series than I can come up with! But as in the previous post I linked to at the top of this one, a series can be useful for implementing something like a for loop. Join against the series as if it were another table, and use its value (call it i) as a parameter to a function, then voila, suddenly you're calling function f for each value of i: f(1), f(2), ..., f(n). This is a great way to create more elegant queries by avoiding copied/pasted repetition. (Again as see in my last post.) 
 
It's also great for generating test data, perhaps large amounts of it. Maybe you want to test a performance hypothesis on a schema that you're currently developing. You don't have any real production data yet. You can pull rows of data out of thin air using generate_series() or similar methods, then run your queries and see how they perform. 

That's all for now!
 
 
MySQL now also supports recursive CTEs, so you could try to generate a series of rows that way. Just be careful of the recursion limit. 
 
Also worth nothing: if you're using Spark/Databricks SQL, there is the range() function which is similar to generate_series():

No comments: