Saturday, June 11, 2016

Generating a series of integers as a set of rows

In my last post, Cumulative row count over date ranges, I take advantage of a postgres function called generate_series(start, stop) - this generates a series of values, from start to stop. So, for example, generate_series(2,4) returns:

SELECT * FROM generate_series(1,4);

A question that’s been on my mind is how to achieve this in other database systems?


Oracle has a way to generate a series as well, but it’s just not quite as simple as calling a function:

SELECT level i
FROM dual
CONNECT BY level <= 4

Less intuitive but it works nicely. What it’s doing is it uses the CONNECT BY clause on the dual table - this means the 1 column, 1 row dual table is being “connected” to itself recursively — this is called a hierarchical query ( “Level” is a special pseudocolumn that tells you the level of recursion depth. So, while this method is more complicated, it does work nicely!


The MySQL way….

I didn’t know how to achieve generate_series in MySQL off the top of my head, so off to google! Unfortunately the method I found depends on selecting from a table of at least n size, where n is the number of values you wish to generate. Join on a subquery that declares a variable @num, and in your SELECT clause, increment this value per row. I found this solution here, they are using dates but it can be simplified to generate integers.

SELECT @num:=@num+1

| @num:=@num+1 |
|            1 |
|            2 |
|            3 |
|            4 |

4 rows in set, 1 warning (0.02 sec)

Works great but with the caveat that the "dummy" table you select from, in this example INFORMATION_SCHEMA.COLUMNS, must have at least as many rows as you want values. A little scary, in my opinion, to use in a generalized way with this limitation.

MS SQL Server

Oracle-like recursion can be done in SQL Server as well, and wouldn't you know it, some nice developer wrote a generate_series function for SQL Server and shared it on github

Why is generating a series useful, anyway?

There are probably more use cases for generating a series than I can come up with! But as in the previous post I linked to at the top of this one, a series can be useful for implementing something like a for loop. Join against the series as if it were another table, and use its value (call it i) as a parameter to a function, then voila, suddenly you're calling function f for each value of i: f(1), f(2), ..., f(n). This is a great way to create more elegant queries by avoiding copied/pasted repetition. (Again as see in my last post.) 

That's all for now!

No comments: