Please review my old post that described how it worked in previous versions: https://mwrynn.blogspot.com/2016/04/common-table-expressions-postgres-vs.html
In short, when you used a CTE, it materialized the results in memory, then ran the subsequent query against that.
You can still do that if you want in Postgres 12 with the "materialized" keyword, but you're not forced to.
So let's repeat the test I did 3 and a half years ago! Postgres 12 only - I won't repeat on Oracle, nor will I make fun of MySQL this time.
Here's what I've got:
mwrynn=# EXPLAIN ANALYZE
WITH peeps_by_state AS NOT MATERIALIZED (SELECT COUNT(*) AS cnt, state FROM person_location GROUP BY state)
SELECT cnt FROM peeps_by_state WHERE state=13;
SELECT cnt FROM peeps_by_state WHERE state=13;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on peeps_by_state (cost=1000.00..484733.72 rows=50 width=8) (actual time=3687.870..3687.871 rows=1 loops=1)
-> Finalize GroupAggregate (cost=1000.00..484733.22 rows=50 width=12) (actual time=3687.869..3687.869 rows=1 loops=1)
Group Key: person_location.state
-> Gather (cost=1000.00..484732.22 rows=100 width=12) (actual time=3687.732..3694.247 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=0.00..483722.22 rows=50 width=12) (actual time=3682.099..3682.099 rows=1 loops=3)
Group Key: person_location.state
-> Parallel Seq Scan on person_location (cost=0.00..481655.75 rows=413195 width=4) (actual time=0.448..3600.027 rows=333333 loops=3)
Filter: (state = 13)
Rows Removed by Filter: 16333333
Planning Time: 0.159 ms
Execution Time: 3694.318 ms
(13 rows)
Now let's use the "MATERIALIZED" keyword and compare.
mwrynn=# EXPLAIN ANALYZE
WITH peeps_by_state AS MATERIALIZED (SELECT COUNT(*) AS cnt, state FROM person_location GROUP BY state)
SELECT cnt FROM peeps_by_state WHERE state=13;
SELECT cnt FROM peeps_by_state WHERE state=13;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on peeps_by_state (cost=534753.70..534754.83 rows=1 width=8) (actual time=11828.277..11828.392 rows=1 loops=1)
Filter: (state = 13)
Rows Removed by Filter: 49
CTE peeps_by_state
-> Finalize GroupAggregate (cost=534741.03..534753.70 rows=50 width=12) (actual time=11828.191..11828.328 rows=50 loops=1)
Group Key: person_location.state
-> Gather Merge (cost=534741.03..534752.70 rows=100 width=12) (actual time=11828.181..11835.081 rows=150 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=533741.01..533741.14 rows=50 width=12) (actual time=11818.697..11818.703 rows=50 loops=3)
Sort Key: person_location.state
Sort Method: quicksort Memory: 27kB
Worker 0: Sort Method: quicksort Memory: 27kB
Worker 1: Sort Method: quicksort Memory: 27kB
-> Partial HashAggregate (cost=533739.10..533739.60 rows=50 width=12) (actual time=11818.610..11818.624 rows=50 loops=3)
Group Key: person_location.state
-> Parallel Seq Scan on person_location (cost=0.00..429572.40 rows=20833340 width=4) (actual time=0.521..3374.912 rows=16666667 loops=3)
Planning Time: 0.300 ms
Execution Time: 11835.398 ms
(19 rows)
So there we have it - a significant difference! (Note that the times here cannot be compared to the times in the previous post, as it's a different server, different config, etc.)
I'll end the post here, but I may go over the steps in this execution plan and try to explain what's going on - maybe in a subsequent post or maybe I'll just edit this one.
No comments:
Post a Comment