The goal in this demo is to refer to columns dynamically, without the help of some other language dynamically generating SQL. This is so we can avoid repeated, copy-pasted expressions across multiple, similar columns. We want to write the expression only once, ideally.
The solution is a little complicated, and is definitely Postgres-specific. but perhaps a similar technique can be used in any SQL engine that supports similar JSON functionality. (Edit: I pulled off a similar trick in Databricks/Spark SQL recently. Maybe that's worth a separate post.)
So starting with this table to work with:
select * from t; id | url_path1 | url_path2 | url_path3 | url_path4 | url_path5 ----+-----------+-----------+-----------+-----------+----------- 1 | /a/a/a | /b/b/b | /c/c/c | /d/d/d | /e/e/e 2 | /f/f/f | /g/g/g | /h/h/h | /i/i/i | /j/j/j (2 rows)
The idea is to convert to JSON so that you can perform a function (in this demo, I’ll be simply concatenating a ‘/’ character) only once for several columns. I want to leave id as-is, but perform the concatenation on url_path*
Then we’ll convert back to relational records in the end.
I’ll be using the demonstrative technique of running one query at a time, putting each into a CTE for the subsequent step and building on that until the end result…
Step 1:
Convert all columns we want to modify, i.e. all except for id, to JSON, then to records consisting of a key and a value...
We could alternatively do a positive match for something like 'url_path%', but - 'id' works as well for this use case.
postgres=# select id, jsonb_each_text(to_jsonb(t) - 'id') as url_path_kv from t; id | url_path_kv ----+-------------------- 1 | (url_path1,/a/a/a) 1 | (url_path2,/b/b/b) 1 | (url_path3,/c/c/c) 1 | (url_path4,/d/d/d) 1 | (url_path5,/e/e/e) 2 | (url_path1,/f/f/f) 2 | (url_path2,/g/g/g) 2 | (url_path3,/h/h/h) 2 | (url_path4,/i/i/i) 2 | (url_path5,/j/j/j) (10 rows)
Step 2:
Now let’s separate the keys and values into varchar columns, then concatenate a ‘/’ to the values.
Concatenating ‘/’ is the change to the values of url_path* that we are making. This could be replaced with something complicated that we don’t want to repeat 5 times.
with step1_kv as ( select id, jsonb_each_text(to_jsonb(t) - 'id') as url_path_kv from t ) select id, (url_path_kv).key as key, (url_path_kv).value || '/' as val_fixed from step1_kv; id | key | val_fixed ----+-----------+----------- 1 | url_path1 | /a/a/a/ 1 | url_path2 | /b/b/b/ 1 | url_path3 | /c/c/c/ 1 | url_path4 | /d/d/d/ 1 | url_path5 | /e/e/e/ 2 | url_path1 | /f/f/f/ 2 | url_path2 | /g/g/g/ 2 | url_path3 | /h/h/h/ 2 | url_path4 | /i/i/i/ 2 | url_path5 | /j/j/j/ (10 rows)
Step 3:
Convert back to JSON using jsonb_object_agg() which is an aggregate function that makes JSON, essentially
with step1_kv as (
select id, jsonb_each_text(to_jsonb(t) - 'id') as url_path_kv from t
),
step2_kv_separate as (
select id, (url_path_kv).key as key, (url_path_kv).value || '/' as val_fixed
from step1_kv
)
select id, jsonb_object_agg(key, val_fixed) as url_path_jsonb
from step2_kv_separate
group by id;
id | url_path_jsonb
----+--------------------------------------------------------------------------------------------------------------------------
2 | {"url_path1": "/f/f/f/", "url_path2": "/g/g/g/", "url_path3": "/h/h/h/", "url_path4": "/i/i/i/", "url_path5": "/j/j/j/"}
1 | {"url_path1": "/a/a/a/", "url_path2": "/b/b/b/", "url_path3": "/c/c/c/", "url_path4": "/d/d/d/", "url_path5": "/e/e/e/"}
(2 rows)
Step 4:
Convert back to relational records.
with step1_kv as (
select id, jsonb_each_text(to_jsonb(t) - 'id') as url_path_kv from t
),
step2_kv_separate as (
select id, (url_path_kv).key as key, (url_path_kv).value || '/' as val_fixed
from step1_kv
),
step3_back_to_json as (
select id, jsonb_object_agg(key, val_fixed) as url_path_jsonb
from step2_kv_separate
group by id
)
select id, r.* --back to relational records
from step3_back_to_json
cross join lateral
jsonb_to_record(url_path_jsonb) as r(url_path1 varchar, url_path2 varchar, url_path3 varchar, url_path4 varchar, url_path5 varchar);
id | url_path1 | url_path2 | url_path3 | url_path4 | url_path5
----+-----------+-----------+-----------+-----------+-----------
1 | /a/a/a/ | /b/b/b/ | /c/c/c/ | /d/d/d/ | /e/e/e/
2 | /f/f/f/ | /g/g/g/ | /h/h/h/ | /i/i/i/ | /j/j/j/
(2 rows)
Now, you could easily argue the complexity isn’t worth it...but consider if you had url_path1 through url_path99 and several complex expressions to apply to each. :) Could be worth it!
And I want to say it may be possible to do so without so many conversion steps. I’m kind of a noob with these JSON functions. Plus, I wanted to break the problem down into clear and separate steps for demonstrative purposes.
Anyway, that’s what blew my mind when I learned it was even possible to refer to columns dynamically with such a JSON trick!
No comments:
Post a Comment