1) Get a cumulative sum - maybe you have a table of individual purchases and want to see how the sum of purchases progresses over time.
2) Perform your typical GROUP BY query using an aggregate function, BUT you don't want to roll up the data. You want to see every row in the table you're querying, with the AVG (or MAX, SUM, whatever) alongside it. See the first example in this Postgres doc.
But did you know Postgres now has Analytics too, as of 8.4? They are called Window Functions in the Postgres world. (I suppose you could call them "queries that use window functions", or even "analytic queries" as Oracle can't force you not to use the phrase.)
I had a requirement. Stripped down to the core, the requirement was to translate this:
ID | expire
---+---------
7 | 8/20/2011
6 |
5 | 8/15/2011
4 | 8/15/2011
3 |
2 |
1 | 8/10/2011
To this:
ID | expire
---+---------
7 | 8/20/2011
6 | 8/15/2011
5 | 8/15/2011
4 | 8/15/2011
3 | 8/10/2011
2 | 8/10/2011
1 | 8/10/2011
The translation is: If expire is null, look down to the next older expire. If THAT one is null too, look down again, and so on. The solution that worked very well for me was to use Window Functions, plus recursive ("WITH RECURSIVE") queries.
Edit: A helpful commenter provided this simpler solution: select id, max(expire) over (order by id rows between unbounded preceding and current row) as expire from t order by id desc
The windowing clause in this query says: order the rows by id ascending, and for each row, get the max value of expire looking backwards from this row to all previous rows. Pretty simple! The Oracle solution should be the same, btw.
I'll leave my older code and thought processes in this article for posterity.
ID prev_ID next_id expire
--- ------- ------- --------
7 null 6 8/20/2011
6 7 5
5 6 4 8/15/2011
4 5 3 8/15/2011
3 4 2
2 3 1
1 2 null 8/10/2011
Step 2 was to use WITH RECURSIVE to traverse from row to row, via the pointers in our linked list, to form the end resultset we're looking for. This entailed another little trick. Coalesce two expires (the current and the parent) to make the non-null values bubble up.
Creating the view, transaction, with window functions:
CREATE VIEW transaction AS
select
id,
lag(id) over (partition by account_id order by id asc) lag_id,
lead(id) over (partition by account_id order by id asc) lead_id,
expire
from trans;
select
id,
lag(id) over (partition by account_id order by id asc) lag_id,
lead(id) over (partition by account_id order by id asc) lead_id,
expire
from trans;
Recursive query against transaction:
WITH RECURSIVE recur(
id,
expire
) AS (
select
id,
expire
from transaction where lag_id is null
union all
select
transaction.id,
coalesce(transaction.expire, recur.expire)
from transaction, recur
where transaction.lag_id=recur.id
)
select * from recur;
id,
expire
) AS (
select
id,
expire
from transaction where lag_id is null
union all
select
transaction.id,
coalesce(transaction.expire, recur.expire)
from transaction, recur
where transaction.lag_id=recur.id
)
select * from recur;
I'm really pleased with the advanced querying features Postgres has been putting out. Actually, WITH RECURSIVE is ANSI standard, as it turns out, and you can even find it in SQL Server, DB2 and Oracle 11g (though Oracle has had its own way of performing recursive queries using CONNECT BY, for ages).
3 comments:
This is probably obvious, but note that if you don't index both the columns in the recursive join (in my example, "where transaction.lag_id=recur.id"), the recursive query may run very slow! In my tests, once I got into 10s of thousands of rows it started becoming quite slow. (Probably something like n^2 performance...)
Here's a way to get the answer with a window function, without recursion:
select
id, max(expire) over (order by id rows between unbounded preceding and current row) as expire
from t
order by id desc
Thanks RobJ - didn't know about that one. Nice!
Post a Comment