Monday, March 18, 2024

Postgres trick: Referring to columns dynamically within static SQL

The goal in this demo is to refer to columns dynamically, without the help of some other language dynamically generating SQL. This is so we can avoid repeated, copy-pasted expressions across multiple, similar columns. We want to write the expression only once, ideally.

The solution is a little complicated, and is definitely Postgres-specific. but perhaps a similar technique can be used in any SQL engine that supports similar JSON functionality.

So starting with this table to work with:

select * from t; id | url_path1 | url_path2 | url_path3 | url_path4 | url_path5 ----+-----------+-----------+-----------+-----------+----------- 1 | /a/a/a | /b/b/b | /c/c/c | /d/d/d | /e/e/e 2 | /f/f/f | /g/g/g | /h/h/h | /i/i/i | /j/j/j (2 rows)

The idea is to convert to JSON so that you can perform a function (in this demo, I’ll be simply concatenating a ‘/’ character) only once for several columns. I want to leave id as-is, but perform the concatenation on url_path*

Then we’ll convert back to relational records in the end.

I’ll be using the demonstrative technique of running one query at a time, putting each into a CTE for the subsequent step and building on that until the end result…

Step 1: 

Convert all columns we want to modify, i.e. all except for id, to JSON, then to records consisting of a key and a value...

We could alternatively do a positive match for something like 'url_path%', but - 'id' works as well for this use case.

postgres=# select id, jsonb_each_text(to_jsonb(t) - 'id') as url_path_kv from t; id | url_path_kv ----+-------------------- 1 | (url_path1,/a/a/a) 1 | (url_path2,/b/b/b) 1 | (url_path3,/c/c/c) 1 | (url_path4,/d/d/d) 1 | (url_path5,/e/e/e) 2 | (url_path1,/f/f/f) 2 | (url_path2,/g/g/g) 2 | (url_path3,/h/h/h) 2 | (url_path4,/i/i/i) 2 | (url_path5,/j/j/j) (10 rows)

Step 2:

Now let’s separate the keys and values into varchar columns, then concatenate a ‘/’ to the values.

Concatenating ‘/’ is the change to the values of url_path* that we are making. This could be replaced with something complicated that we don’t want to repeat 5 times.

with step1_kv as ( select id, jsonb_each_text(to_jsonb(t) - 'id') as url_path_kv from t ) select id, (url_path_kv).key as key, (url_path_kv).value || '/' as val_fixed from step1_kv; id | key | val_fixed ----+-----------+----------- 1 | url_path1 | /a/a/a/ 1 | url_path2 | /b/b/b/ 1 | url_path3 | /c/c/c/ 1 | url_path4 | /d/d/d/ 1 | url_path5 | /e/e/e/ 2 | url_path1 | /f/f/f/ 2 | url_path2 | /g/g/g/ 2 | url_path3 | /h/h/h/ 2 | url_path4 | /i/i/i/ 2 | url_path5 | /j/j/j/ (10 rows)

Step 3:

Convert back to JSON using jsonb_object_agg() which is an aggregate function that makes JSON, essentially

with step1_kv as (
select id, jsonb_each_text(to_jsonb(t) - 'id') as url_path_kv from t
),
step2_kv_separate as (
select id, (url_path_kv).key as key, (url_path_kv).value || '/' as val_fixed
from step1_kv
)
select id, jsonb_object_agg(key, val_fixed) as url_path_jsonb
from step2_kv_separate
group by id;

id | url_path_jsonb
----+--------------------------------------------------------------------------------------------------------------------------
2 | {"url_path1": "/f/f/f/", "url_path2": "/g/g/g/", "url_path3": "/h/h/h/", "url_path4": "/i/i/i/", "url_path5": "/j/j/j/"}
1 | {"url_path1": "/a/a/a/", "url_path2": "/b/b/b/", "url_path3": "/c/c/c/", "url_path4": "/d/d/d/", "url_path5": "/e/e/e/"}

(2 rows)


Step 4:

Convert back to relational records.


with step1_kv as (
select id, jsonb_each_text(to_jsonb(t) - 'id') as url_path_kv from t
),
step2_kv_separate as (
select id, (url_path_kv).key as key, (url_path_kv).value || '/' as val_fixed
from step1_kv
),
step3_back_to_json as (
select id, jsonb_object_agg(key, val_fixed) as url_path_jsonb
from step2_kv_separate
group by id
)
select id, r.* --back to relational records
from step3_back_to_json
cross join lateral
jsonb_to_record(url_path_jsonb) as r(url_path1 varchar, url_path2 varchar, url_path3 varchar, url_path4 varchar, url_path5 varchar);

id | url_path1 | url_path2 | url_path3 | url_path4 | url_path5
----+-----------+-----------+-----------+-----------+-----------
1 | /a/a/a/ | /b/b/b/ | /c/c/c/ | /d/d/d/ | /e/e/e/
2 | /f/f/f/ | /g/g/g/ | /h/h/h/ | /i/i/i/ | /j/j/j/

(2 rows)


Now, you could easily argue the complexity isn’t worth it...but consider if you had url_path1 through url_path99 and several complex expressions to apply to each. :) Could be worth it!

And I want to say it may be possible to do so without so many conversion steps. I’m kind of a noob with these JSON functions. Plus, I wanted to break the problem down into clear and separate steps for demonstrative purposes.

Anyway, that’s what blew my mind when I learned it was even possible to refer to columns dynamically with such a JSON trick!

Saturday, February 03, 2024

A Bonkers SQL Feature!

It’s time for a “Bonkers SQL Feature” post:

A (IMO) bonkers standard SQL behavior is that newlines between string constants cause implicit concatenation. Spaces between them do not. And if you try a similar thing but with column names or variables names (i. e. not constants), there is no implicit concatenation. Use two column names in this manner, and you end up with the second column treated as an alias for the first. :)

A quick demo follows:

/* newlines concatenate */
postgres=# select 'a'
'b'
'c'
;
 ?column?
----------
 abc
(1 row)

/* spaces do not concatenate - causes syntax error */
postgres=# select 'a' 'b' 'c';
ERROR:  syntax error at or near "'b'"
LINE 1: select 'a' 'b' 'c';
                   ^

/* let's test newlines but with column names */
postgres=# create table t (a varchar, b varchar);
CREATE TABLE

postgres=# insert into t values('aaa','bbb');
INSERT 0 1
postgres=#
postgres=# select a
b
from t;
  b
-----
 aaa
(1 row)


Per the Postgres documentation, the behavior defined by standard SQL is, and kudos to them for calling it bizarre:

Two string constants that are only separated by whitespace with at least one newline are concatenated and effectively treated as if the string had been written as one constant.

(This slightly bizarre behavior is specified by SQL; PostgreSQL is following the standard.)

A potential danger in this behavior is if you break up a list using newlines but forget to write a comma, such as:

/* basic IN check */
postgres=# SELECT 'foo' IN ('foo','bar');
 ?column?
----------
 t
(1 row)

/* here 'bar' is on the second line and I forgot to add a comma, so the check becomes `'foo' IN ('foobar')` essentially, resulting in false
postgres=# SELECT 'foo' IN ('foo'
postgres(#  'bar');

 ?column?
----------
 f
(1 row)

MySQL does similarly, BUT horrifyingly to me, they decided to change “Two string constants that are only separated by whitespace with at least one newline” to “Two string constants that are only separated by whitespace.” In other words, it takes this oddball behavior one step further and does it for spaces as well, despite what the standard says. (Although perhaps it makes more intuitive sense not to differentiate between different kinds of whitespace, standard be damned??)

MySQL:

/* newline test is consistent with Standard and Postgres */
mysql> select
    -> 'a'
    -> 'b'
    -> 'c'
    -> ;
+-----+
| a   |
+-----+
| abc |
+-----+
1 row in set (0.00 sec)

/* space test - WAT!! */
mysql> select 'a' 'b' 'c';
+-----+
| a   |
+-----+
| abc |
+-----+
1 row in set (0.00 sec)

/* column name test is the same */
mysql> create table t (col1 varchar(3), col2 varchar(3));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t (col1, col2) values ('aaa', 'bbb');
Query OK, 1 row affected (0.01 sec)

mysql>
mysql> select col1
    -> col2
    -> from t;
+------+
| col2 |
+------+
| aaa  |
+------+
1 row in set (0.00 sec)

Not sure what else to say except...Be careful out there. :)