The Query Planner is God
EXPLAIN
shows us how a query would
be run if the Query Planner were to run it right then
Understanding how our "high level" program is run by the "low level" system can help us squeeze the most performance from the system.
Used to be very common among the C and Fortran crowds to examine assembly before compilers became as good, if not better than, humans.
EXPLAIN
Our window into the Query Planner
Contains exactly what will run as well as cost estimates
Movie Lens 20million reviews dataset
% wget -x http://files.grouplens.org/datasets/movielens/ml-20m.zip % wget -x http://files.grouplens.org/datasets/movielens/ml-20m.zip.md5 % unzip ml-20m.zip % createdb planner_example % psql planner_example psql (9.5.3, server 9.5.1) planner_example=# create table movies ( movie_id integer, title text, genres_text text, genres text[]); planner_example=# copy movies (movie_id, title, genres_text) from '/monongahela/movielens/files.grouplens.org/datasets/movielens/ml-20m/movies.csv' with (format csv, header); planner_example=# update movies set genres = regexp_split_to_array(genres_text, '\|'); planner_example=# alter table movies drop genres_text; planner_example=# create table ratings ( user_id integer not null, movie_id integer not null references movies, rating real not null, ts bigint not null); planner_example=# COPY ratings from '/root/files.grouplens.org/datasets/movielens/ml-20m/ratings.csv' with csv header; planner_example=# create unique index ratings_pk on ratings (user_id, movie_id);
Some Common Nodes
Can be thought of as a DAG or Tree
planner_example=# explain select * from movies; QUERY PLAN ------------------------------------------------------------- Seq Scan on movies (cost=0.00..958.78 rows=27278 width=77) (1 row)
planner_example=# explain select * from movies where movie_id = 6645; QUERY PLAN ---------------------------------------------------------- Seq Scan on movies (cost=0.00..1026.97 rows=1 width=77) Filter: (movie_id = 6645) (2 rows)
width
Average size in bytes of the row returned
Integers are 4-bytes and the table statistics list the average
size of title
to be 28, which is close to the actual average
length!
planner_example=# explain select * from movies where movie_id = 6645; QUERY PLAN ---------------------------------------------------------- Seq Scan on movies (cost=0.00..1026.97 rows=1 width=77) Filter: (movie_id = 6645) (2 rows) planner_example=# explain select movie_id, title from movies where movie_id = 6645; QUERY PLAN ---------------------------------------------------------- Seq Scan on movies (cost=0.00..1026.97 rows=1 width=32) Filter: (movie_id = 6645) (2 rows) planner_example=# select avg(length(title)) from movies ; avg --------------------- 27.8817361976684508 (1 row)
cost
Computed from the number of pages needing to be read by an execution node and the number of CPU-bound operations being performed.
Parameter | Description | Default | Relative cost |
---|---|---|---|
random_page_cost | Non-sequential disk page fetch. | 4.00 | 4x slower |
seq_page_cost | Sequentially disk page fetch. | 1.00 | baseline |
cpu_tuple_cost | Processing each tuple. | 0.01 | 100x faster |
cpu_index_tuple_cost | Processing each entry during an index scan. | 0.005 | 200x faster |
cpu_operator_cost | Processing each operator or function call. | 0.0025 | 400x faster |
planner_example=# explain analyze select * from movies where movie_id = 6645; QUERY PLAN ------------------------------------------------------------------------------- Seq Scan on movies (cost=0.00..1026.97 rows=1 width=77) (actual time=1.530..5.437 rows=1 loops=1) Filter: (movie_id = 6645) Rows Removed by Filter: 27277 Planning time: 0.083 ms Execution time: 5.465 ms (5 rows)
ORDER BY
clauses.planner_example=# create unique index movies_movie_id_idx on movies (movie_id); planner_example=# alter table movies add primary key using index movies_movie_id_idx ; planner_example=# \d movies Table "public.movies" Column | Type | Modifiers ----------+---------+----------- movie_id | integer | not null title | text | genres | text[] | Indexes: "movies_movie_id_idx" PRIMARY KEY, btree (movie_id)s
Allows for directly reading the index and returning the tuples that match it.
planner_example=# explain select * from movies where movie_id = 6645; QUERY PLAN ----------------------------------------------------------------------------------- Index Scan using movies_movie_id_idx on movies (cost=0.29..8.30 rows=1 width=77) Index Cond: (movie_id = 6645) (2 rows)
There is no index on title, so it will Seq Scan as expected
planner_example=# explain select * from movies where title like '%THX%'; QUERY PLAN ---------------------------------------------------------- Seq Scan on movies (cost=0.00..1026.97 rows=3 width=77) Filter: (title ~~ '%THX%'::text) (2 rows)
There is an index on title and it still Seq Scans!
planner_example=# create index movies_title_idx on movies (title); CREATE INDEX planner_example=# \d movies Table "public.movies" Column | Type | Modifiers ----------+---------+----------- movie_id | integer | not null title | text | genres | text[] | Indexes: "movies_movie_id_idx" PRIMARY KEY, btree (movie_id) "movies_title_idx" btree (title) planner_example=# explain select * from movies where title like '%THX%'; QUERY PLAN ---------------------------------------------------------- Seq Scan on movies (cost=0.00..1026.97 rows=3 width=77) Filter: (title ~~ '%THX%'::text) (2 rows)
b-tree indices can only be used on exact, less than, or greater than matches
planner_example=# explain select * from movies where title = '%THX%'; QUERY PLAN -------------------------------------------------------------------------------- Index Scan using movies_title_idx on movies (cost=0.41..8.43 rows=1 width=77) Index Cond: (title = '%THX%'::text) (2 rows)
Use an index suitable for the data and condition being queried
planner_example=# create extension pg_trgm; CREATE EXTENSION planner_example=# create index movies_title_gist_idx on movies using gist (title gist_trgm_ops); CREATE INDEX planner_example=# \d movies Table "public.movies" Column | Type | Modifiers ----------+---------+----------- movie_id | integer | not null title | text | genres | text[] | Indexes: "movies_movie_id_idx" PRIMARY KEY, btree (movie_id) "movies_title_gist_idx" gist (title gist_trgm_ops) "movies_title_idx" btree (title) planner_example=# explain select * from movies where title like '%THX%'; QUERY PLAN ------------------------------------------------------------------------------------ Bitmap Heap Scan on movies (cost=4.30..15.74 rows=3 width=77) Recheck Cond: (title ~~ '%THX%'::text) -> Bitmap Index Scan on movies_title_gist_idx (cost=0.00..4.30 rows=3 width=0) Index Cond: (title ~~ '%THX%'::text) (4 rows)
planner_example=# explain select * from movies where title like '%THX%'; QUERY PLAN ------------------------------------------------------------------------------------ Bitmap Heap Scan on movies (cost=4.30..15.74 rows=3 width=77) Recheck Cond: (title ~~ '%THX%'::text) -> Bitmap Index Scan on movies_title_gist_idx (cost=0.00..4.30 rows=3 width=0) Index Cond: (title ~~ '%THX%'::text) (4 rows)
Index Scan
and Heap
Scan
are paired. The Index Scan
builds the
Bitmap; other methods of building it include AND
ing or
OR
ing indicesRecheck Cond
is because
during execution, the Bitmap may become lossy and only
remember the pages, and not exact tuples, that matched the
condition. This may be based on memory usage or the type of data
being examined.planner_example=# explain select * from movies natural join ratings; QUERY PLAN -------------------------------------------------------------------------- Hash Join (cost=1299.76..603697.03 rows=20000264 width=93) Hash Cond: (ratings.movie_id = movies.movie_id) -> Seq Scan on ratings (cost=0.00..327393.64 rows=20000264 width=20) -> Hash (cost=958.78..958.78 rows=27278 width=77) -> Seq Scan on movies (cost=0.00..958.78 rows=27278 width=77) (5 rows)
movies
to build hash-map in memoryratings
looking up movies
in the in-memory mapmovies
? Random disk access is slowplanner_example=# explain analyze select * from movies natural join ratings; QUERY PLAN ----------------------------------------------------------------------------------- Hash Join (cost=1299.76..603697.03 rows=20000264 width=93) (actual time=21.025..13266.715 rows=20000263 loops=1) Hash Cond: (ratings.movie_id = movies.movie_id) -> Seq Scan on ratings (cost=0.00..327393.64 rows=20000264 width=20) (actual time=0.704..3896.970 rows=20000263 loops=1) -> Hash (cost=958.78..958.78 rows=27278 width=77) (actual time=20.217..20.217 rows=27278 loops=1) Buckets: 32768 Batches: 1 Memory Usage: 3205kB -> Seq Scan on movies (cost=0.00..958.78 rows=27278 width=77) (actual time=0.311..9.357 rows=27278 loops=1) Planning time: 0.366 ms Execution time: 14652.557 ms (8 rows)
Memory usage and estimated vs real row retrieval. (Quick check on stats.)
If the Planner decides there are few enough random accesses. With the defaults, the costs favour random access when there are fewer than 4x as many random access as pages to read.
planner_example=# create index ratings_rating_idx on ratings (rating); planner_example=# explain analyze select * from movies natural join ratings where ratings.rating = 5.0; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=56258.49..260720.52 rows=2936039 width=93) (actual time=833.523..6494.302 rows=2898660 loops=1) Hash Cond: (ratings.movie_id = movies.movie_id) -> Bitmap Heap Scan on ratings (cost=54958.74..219050.23 rows=2936039 width=20) (actual time=628.215..4853.079 rows=2898660 loops=1) Recheck Cond: (rating = '5'::double precision) Rows Removed by Index Recheck: 10559576 Heap Blocks: exact=44250 lossy=79069 -> Bitmap Index Scan on ratings_rating_idx (cost=0.00..54224.73 rows=2936039 width=0) (actual time=615.051..615.051 rows=2898660 loops=1) Index Cond: (rating = '5'::double precision) -> Hash (cost=958.78..958.78 rows=27278 width=77) (actual time=205.207..205.207 rows=27278 loops=1) Buckets: 32768 Batches: 1 Memory Usage: 3205kB -> Seq Scan on movies (cost=0.00..958.78 rows=27278 width=77) (actual time=171.577..194.052 rows=27278 loops=1) Planning time: 0.401 ms Execution time: 6703.280 ms (13 rows) planner_example=# explain analyze select * from movies natural join ratings where ratings.rating = 5.1; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.72..16.66 rows=1 width=93) (actual time=0.057..0.057 rows=0 loops=1) -> Index Scan using ratings_rating_idx on ratings (cost=0.44..8.35 rows=1 width=20) (actual time=0.056..0.056 rows=0 loops=1) Index Cond: (rating = '5.1'::double precision) -> Index Scan using movies_movie_id_idx on movies (cost=0.29..8.30 rows=1 width=77) (never executed) Index Cond: (movie_id = ratings.movie_id) Planning time: 179.430 ms Execution time: 0.112 ms (7 rows)
Note the nested loop after index
creation! Few rows from the ratings
match to not warrant a Seq
Scan
The rating = 5.0
is part of
a filter and not an index search because the cardinality on it is too high, and
the cardinality of the Index Cond: (movie_id = movies.movie_id)
too
small to warrant using it.
The ratings
pk couldn't be used because it was a b-tree index on 2 fields
and the first isn't being used
planner_example=# explain analyze select * from movies natural join ratings where ratings.rating = 5.0 and movie_id IN (1,2,3,4); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=54980.01..230085.96 rows=431 width=93) (actual time=624.737..5250.770 rows=15270 loops=1) Hash Cond: (ratings.movie_id = movies.movie_id) -> Bitmap Heap Scan on ratings (cost=54958.74..219050.23 rows=2936039 width=20) (actual time=624.551..4823.864 rows=2898660 loops=1) Recheck Cond: (rating = '5'::double precision) Rows Removed by Index Recheck: 10559576 Heap Blocks: exact=44250 lossy=79069 -> Bitmap Index Scan on ratings_rating_idx (cost=0.00..54224.73 rows=2936039 width=0) (actual time=611.547..611.547 rows=2898660 loops=1) Index Cond: (rating = '5'::double precision) -> Hash (cost=21.22..21.22 rows=4 width=77) (actual time=0.023..0.023 rows=4 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Index Scan using movies_movie_id_idx on movies (cost=0.29..21.22 rows=4 width=77) (actual time=0.008..0.015 rows=4 loops=1) Index Cond: (movie_id = ANY ('{1,2,3,4}'::integer[])) Planning time: 0.438 ms Execution time: 5252.670 ms (14 rows) planner_example=# create index ratings_movie_id_idx on ratings (movie_id); CREATE INDEX planner_example=# explain analyze select * from movies natural join ratings where ratings.rating = 5.0 and movie_id IN (1,2,3,4); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=65.95..45355.23 rows=431 width=93) (actual time=23.935..1168.402 rows=15270 loops=1) -> Index Scan using movies_movie_id_idx on movies (cost=0.29..21.22 rows=4 width=77) (actual time=0.008..0.132 rows=4 loops=1) Index Cond: (movie_id = ANY ('{1,2,3,4}'::integer[])) -> Bitmap Heap Scan on ratings (cost=65.66..11328.59 rows=491 width=20) (actual time=10.610..290.380 rows=3818 loops=4) Recheck Cond: (movie_id = movies.movie_id) Filter: (rating = '5'::double precision) Rows Removed by Filter: 18040 Heap Blocks: exact=79172 -> Bitmap Index Scan on ratings_movie_id_idx (cost=0.00..65.54 rows=3347 width=0) (actual time=6.201..6.201 rows=21857 loops=4) Index Cond: (movie_id = movies.movie_id) Planning time: 0.538 ms Execution time: 1170.279 ms (12 rows)
Different types of aggregation happen depending on the number of rows being aggregated
Nested loop keeps the ordering of the movies
pk, which
is the ordering we asked for, hence the lack of a sorting node
planner_example=# explain analyze select rating, avg(rating) from ratings group by rating order by rating; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Sort (cost=427395.25..427395.28 rows=10 width=4) (actual time=11003.378..11003.379 rows=10 loops=1) Sort Key: rating Sort Method: quicksort Memory: 25kB -> HashAggregate (cost=427394.96..427395.09 rows=10 width=4) (actual time=11003.356..11003.360 rows=10 loops=1) Group Key: rating -> Seq Scan on ratings (cost=0.00..327393.64 rows=20000264 width=4) (actual time=0.029..3739.103 rows=20000263 loops=1) Planning time: 56.621 ms Execution time: 11003.451 ms (8 rows) planner_example=# explain analyze select movie_id, avg(rating) from movies natural join ratings where movie_id IN (1,2,3,4) group by movie_id order by movie_id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=66.66..45449.57 rows=4 width=8) (actual time=699.289..1248.084 rows=4 loops=1) Group Key: movies.movie_id -> Nested Loop (cost=66.66..45434.86 rows=2933 width=8) (actual time=24.004..1209.296 rows=87429 loops=1) -> Index Only Scan using movies_movie_id_idx on movies (cost=0.29..17.22 rows=4 width=4) (actual time=0.009..0.107 rows=4 loops=1) Index Cond: (movie_id = ANY ('{1,2,3,4}'::integer[])) Heap Fetches: 0 -> Bitmap Heap Scan on ratings (cost=66.38..11320.94 rows=3347 width=8) (actual time=10.534..292.288 rows=21857 loops=4) Recheck Cond: (movie_id = movies.movie_id) Heap Blocks: exact=79172 -> Bitmap Index Scan on ratings_movie_id_idx (cost=0.00..65.54 rows=3347 width=0) (actual time=6.226..6.226 rows=21857 loops=4) Index Cond: (movie_id = movies.movie_id) Planning time: 0.531 ms Execution time: 1248.239 ms (13 rows)
planner_example=# explain analyze select avg(rating) from movies natural join ratings where movie_id IN (1,2,3,4); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=45442.19..45442.20 rows=1 width=4) (actual time=1209.008..1209.008 rows=1 loops=1) -> Nested Loop (cost=66.66..45434.86 rows=2933 width=4) (actual time=24.464..1180.629 rows=87429 loops=1) -> Index Only Scan using movies_movie_id_idx on movies (cost=0.29..17.22 rows=4 width=4) (actual time=0.097..0.192 rows=4 loops=1) Index Cond: (movie_id = ANY ('{1,2,3,4}'::integer[])) Heap Fetches: 0 -> Bitmap Heap Scan on ratings (cost=66.38..11320.94 rows=3347 width=8) (actual time=10.617..285.251 rows=21857 loops=4) Recheck Cond: (movie_id = movies.movie_id) Heap Blocks: exact=79172 -> Bitmap Index Scan on ratings_movie_id_idx (cost=0.00..65.54 rows=3347 width=0) (actual time=6.313..6.313 rows=21857 loops=4) Index Cond: (movie_id = movies.movie_id) Planning time: 0.742 ms Execution time: 1209.165 ms (12 rows)