Query Planner Overview and Best Practices

James Keener

2016-08-22

What is the Query Planner?

The Query Planner is God

Why is the Query Planner?

SQL is a Declarative language.

We never express how to access data, only the conditions and transformations we want done to it.
The Query Planner has knowledge of the hardware, but also the current state of the data.
Similar to a compiler, it turns our code into something that a "lower" system (database) can run.

Viewing "lower level" commands

EXPLAIN shows us how a query would be run if the Query Planner were to run it right then

Understanding how our "high level" program is run by the "low level" system can help us squeeze the most performance from the system.

Used to be very common among the C and Fortran crowds to examine assembly before compilers became as good, if not better than, humans.

The Query Planner is not yet better than humans

Currently much better than the days of hinting (which never existed in PG)
Postgres's Query Planner is pretty good
Only works at the query-level
- Multiple queries vs one with multiple conditions
- Contention over write access to critical rows
- Underlying computer's resource may be under load
Sometimes still get unexpected query plans, leading to poor performing queries

Some Unexpected Query Plan Causes

Often the optimizations we humans want weren't correctly expressed to the system (e.g. lack of indices, clustering, &c)
What we want requires other computations we ugly bags of mostly water didn't realize
The Query Planner doesn't have an up-to-date understanding of the data (e.g. bad table statics)

`EXPLAIN`

Our window into the Query Planner

Contains exactly what will run as well as cost estimates

Example Data

Movie Lens 20million reviews dataset

% wget -x http://files.grouplens.org/datasets/movielens/ml-20m.zip
% wget -x http://files.grouplens.org/datasets/movielens/ml-20m.zip.md5
% unzip ml-20m.zip
% createdb planner_example
% psql planner_example
psql (9.5.3, server 9.5.1)
planner_example=# create table movies (
  movie_id integer, 
  title text, 
  genres_text text, 
  genres text[]);
planner_example=# copy movies (movie_id, title, genres_text)
from
'/monongahela/movielens/files.grouplens.org/datasets/movielens/ml-20m/movies.csv'
with (format csv, header);
planner_example=# update movies set genres = regexp_split_to_array(genres_text, '\|');
planner_example=# alter table movies drop genres_text;
planner_example=# create table ratings ( 
  user_id integer not null, 
  movie_id integer not null references movies, 
  rating real not null, 
  ts bigint not null);
planner_example=# COPY ratings
  from '/root/files.grouplens.org/datasets/movielens/ml-20m/ratings.csv'
  with csv header;
planner_example=# create unique index ratings_pk on ratings (user_id, movie_id);

Execution Nodes

explain.c

Some Common Nodes

Index Scan / Index Only Scan
Bitmap Index Scan / Bitmap Heap Scan
Sequence Scan
Merge Join
Hash Join
Aggregate
Sort
Loop

Can be thought of as a DAG or Tree

Simple Example

 planner_example=# explain select * from movies;
QUERY PLAN                          
-------------------------------------------------------------
 Seq Scan on movies  (cost=0.00..958.78 rows=27278 width=77)
 (1 row)

Spice things up

 planner_example=# explain select * from movies where movie_id = 6645;
QUERY PLAN                        
----------------------------------------------------------
 Seq Scan on movies  (cost=0.00..1026.97 rows=1 width=77)
 Filter: (movie_id = 6645)
 (2 rows)

`width`

Average size in bytes of the row returned

Integers are 4-bytes and the table statistics list the average size of title to be 28, which is close to the actual average length!

planner_example=# explain select * from movies where movie_id = 6645;
                        QUERY PLAN                        
----------------------------------------------------------
 Seq Scan on movies  (cost=0.00..1026.97 rows=1 width=77)
   Filter: (movie_id = 6645)
(2 rows)

planner_example=# explain select movie_id, title from movies where movie_id = 6645;
                        QUERY PLAN                        
----------------------------------------------------------
 Seq Scan on movies  (cost=0.00..1026.97 rows=1 width=32)
   Filter: (movie_id = 6645)
(2 rows)

planner_example=# select avg(length(title)) from movies ;
         avg         
---------------------
 27.8817361976684508
(1 row)

`cost`

Computed from the number of pages needing to be read by an execution node and the number of CPU-bound operations being performed.

Parameter	Description	Default	Relative cost
random_page_cost	Non-sequential disk page fetch.	4.00	4x slower
seq_page_cost	Sequentially disk page fetch.	1.00	baseline
cpu_tuple_cost	Processing each tuple.	0.01	100x faster
cpu_index_tuple_cost	Processing each entry during an index scan.	0.005	200x faster
cpu_operator_cost	Processing each operator or function call.	0.0025	400x faster

Estimates are great, but what really happens

planner_example=# explain analyze select * from movies where movie_id = 6645;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Seq Scan on movies  (cost=0.00..1026.97 rows=1 width=77)
                     (actual time=1.530..5.437 rows=1 loops=1)
   Filter: (movie_id = 6645)
   Rows Removed by Filter: 27277
 Planning time: 0.083 ms
 Execution time: 5.465 ms
(5 rows)

Indices!

Frequently accessed data can/should be indexed to allow for faster retrieval.
Some indices can be ordered to help with ORDER BY clauses.
For the speed up during access, there can be slow downs when updating.
Indices can also be used to enforce unique-constraints.

planner_example=# create unique index movies_movie_id_idx on movies (movie_id);
planner_example=# alter table movies add primary key using index movies_movie_id_idx ;
planner_example=# \d movies
     Table "public.movies"
  Column  |  Type   | Modifiers 
----------+---------+-----------
 movie_id | integer | not null
 title    | text    | 
 genres   | text[]  | 
Indexes:
    "movies_movie_id_idx" PRIMARY KEY, btree (movie_id)

How does it affect the Planner

Allows for directly reading the index and returning the tuples that match it.

planner_example=# explain select * from movies where movie_id = 6645;
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Index Scan using movies_movie_id_idx on movies  (cost=0.29..8.30 rows=1 width=77)
   Index Cond: (movie_id = 6645)
(2 rows)

Not surprising stuff

There is no index on title, so it will Seq Scan as expected

planner_example=# explain select * from movies where title like '%THX%';
                        QUERY PLAN                        
----------------------------------------------------------
 Seq Scan on movies  (cost=0.00..1026.97 rows=3 width=77)
   Filter: (title ~~ '%THX%'::text)
(2 rows)

Surprising stuff

There is an index on title and it still Seq Scans!

planner_example=# create index movies_title_idx on movies (title);
CREATE INDEX
planner_example=# \d movies
     Table "public.movies"
  Column  |  Type   | Modifiers 
----------+---------+-----------
 movie_id | integer | not null
 title    | text    | 
 genres   | text[]  | 
Indexes:
    "movies_movie_id_idx" PRIMARY KEY, btree (movie_id)
    "movies_title_idx" btree (title)

planner_example=# explain select * from movies where title like '%THX%';
                        QUERY PLAN                        
----------------------------------------------------------
 Seq Scan on movies  (cost=0.00..1026.97 rows=3 width=77)
   Filter: (title ~~ '%THX%'::text)
(2 rows)

What's happening

b-tree indices can only be used on exact, less than, or greater than matches

planner_example=# explain select * from movies where title = '%THX%';
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Index Scan using movies_title_idx on movies  (cost=0.41..8.43 rows=1 width=77)
   Index Cond: (title = '%THX%'::text)
(2 rows)

What to do about it?

Use an index suitable for the data and condition being queried

planner_example=# create extension pg_trgm;
CREATE EXTENSION
planner_example=# create index movies_title_gist_idx 
                  on movies using gist (title gist_trgm_ops);
CREATE INDEX
planner_example=# \d movies
     Table "public.movies"
  Column  |  Type   | Modifiers 
----------+---------+-----------
 movie_id | integer | not null
 title    | text    | 
 genres   | text[]  | 
Indexes:
    "movies_movie_id_idx" PRIMARY KEY, btree (movie_id)
    "movies_title_gist_idx" gist (title gist_trgm_ops)
    "movies_title_idx" btree (title)

planner_example=# explain select * from movies where title like '%THX%';
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Bitmap Heap Scan on movies  (cost=4.30..15.74 rows=3 width=77)
   Recheck Cond: (title ~~ '%THX%'::text)
   ->  Bitmap Index Scan on movies_title_gist_idx  (cost=0.00..4.30 rows=3 width=0)
         Index Cond: (title ~~ '%THX%'::text)
(4 rows)

Bitmap Heap Scan?

planner_example=# explain select * from movies where title like '%THX%';
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Bitmap Heap Scan on movies  (cost=4.30..15.74 rows=3 width=77)
   Recheck Cond: (title ~~ '%THX%'::text)
   ->  Bitmap Index Scan on movies_title_gist_idx  (cost=0.00..4.30 rows=3 width=0)
         Index Cond: (title ~~ '%THX%'::text)
(4 rows)

No, we're not breaking out MS Paint
The Index Scan and Heap Scan are paired. The Index Scan builds the Bitmap; other methods of building it include ANDing or ORing indices
The Recheck Cond is because during execution, the Bitmap may become lossy and only remember the pages, and not exact tuples, that matched the condition. This may be based on memory usage or the type of data being examined.

Joins!

planner_example=# explain select * from movies natural join ratings;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Hash Join  (cost=1299.76..603697.03 rows=20000264 width=93)
   Hash Cond: (ratings.movie_id = movies.movie_id)
   ->  Seq Scan on ratings  (cost=0.00..327393.64 rows=20000264 width=20)
   ->  Hash  (cost=958.78..958.78 rows=27278 width=77)
         ->  Seq Scan on movies  (cost=0.00..958.78 rows=27278 width=77)
(5 rows)

Seq Scans movies to build hash-map in memory
Seq Scans ratings looking up movies in the in-memory map
Why not use the pk on movies? Random disk access is slow

More Info!

planner_example=# explain analyze select * from movies natural join ratings;
                              QUERY PLAN
-----------------------------------------------------------------------------------
 Hash Join  (cost=1299.76..603697.03 rows=20000264 width=93)
            (actual time=21.025..13266.715 rows=20000263 loops=1)
   Hash Cond: (ratings.movie_id = movies.movie_id)
   ->  Seq Scan on ratings  (cost=0.00..327393.64 rows=20000264 width=20)
                            (actual time=0.704..3896.970 rows=20000263 loops=1)
   ->  Hash  (cost=958.78..958.78 rows=27278 width=77)
             (actual time=20.217..20.217 rows=27278 loops=1)
         Buckets: 32768  Batches: 1  Memory Usage: 3205kB
         ->  Seq Scan on movies  (cost=0.00..958.78 rows=27278 width=77)
                                 (actual time=0.311..9.357 rows=27278 loops=1)
 Planning time: 0.366 ms
 Execution time: 14652.557 ms
(8 rows)

Memory usage and estimated vs real row retrieval. (Quick check on stats.)

Will indices ever be used?

If the Planner decides there are few enough random accesses. With the defaults, the costs favour random access when there are fewer than 4x as many random access as pages to read.

planner_example=# create index ratings_rating_idx on ratings (rating);
planner_example=# explain analyze select * from movies natural join ratings where ratings.rating = 5.0;
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=56258.49..260720.52 rows=2936039 width=93) (actual time=833.523..6494.302 rows=2898660 loops=1)
   Hash Cond: (ratings.movie_id = movies.movie_id)
   ->  Bitmap Heap Scan on ratings  (cost=54958.74..219050.23 rows=2936039 width=20) (actual time=628.215..4853.079 rows=2898660 loops=1)
         Recheck Cond: (rating = '5'::double precision)
         Rows Removed by Index Recheck: 10559576
         Heap Blocks: exact=44250 lossy=79069
         ->  Bitmap Index Scan on ratings_rating_idx  (cost=0.00..54224.73 rows=2936039 width=0)
                                                      (actual time=615.051..615.051 rows=2898660 loops=1)
               Index Cond: (rating = '5'::double precision)
   ->  Hash  (cost=958.78..958.78 rows=27278 width=77) (actual time=205.207..205.207 rows=27278 loops=1)
         Buckets: 32768  Batches: 1  Memory Usage: 3205kB
         ->  Seq Scan on movies  (cost=0.00..958.78 rows=27278 width=77) (actual time=171.577..194.052 rows=27278 loops=1)
 Planning time: 0.401 ms
 Execution time: 6703.280 ms
(13 rows)

planner_example=# explain analyze select * from movies natural join ratings where ratings.rating = 5.1;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.72..16.66 rows=1 width=93) (actual time=0.057..0.057 rows=0 loops=1)
   ->  Index Scan using ratings_rating_idx on ratings  (cost=0.44..8.35 rows=1 width=20) (actual time=0.056..0.056 rows=0 loops=1)
         Index Cond: (rating = '5.1'::double precision)
   ->  Index Scan using movies_movie_id_idx on movies  (cost=0.29..8.30 rows=1 width=77) (never executed)
         Index Cond: (movie_id = ratings.movie_id)
 Planning time: 179.430 ms
 Execution time: 0.112 ms
(7 rows)

Variations on a Theme

Note the nested loop after index creation! Few rows from the ratings match to not warrant a Seq Scan

The rating = 5.0 is part of a filter and not an index search because the cardinality on it is too high, and the cardinality of the Index Cond: (movie_id = movies.movie_id) too small to warrant using it.

The ratings pk couldn't be used because it was a b-tree index on 2 fields and the first isn't being used

planner_example=# explain analyze select * from movies natural join ratings where ratings.rating = 5.0 and  movie_id IN (1,2,3,4);
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=54980.01..230085.96 rows=431 width=93) (actual time=624.737..5250.770 rows=15270 loops=1)
   Hash Cond: (ratings.movie_id = movies.movie_id)
   ->  Bitmap Heap Scan on ratings  (cost=54958.74..219050.23 rows=2936039 width=20) (actual time=624.551..4823.864 rows=2898660 loops=1)
         Recheck Cond: (rating = '5'::double precision)
         Rows Removed by Index Recheck: 10559576
         Heap Blocks: exact=44250 lossy=79069
         ->  Bitmap Index Scan on ratings_rating_idx  (cost=0.00..54224.73 rows=2936039 width=0) (actual time=611.547..611.547 rows=2898660 loops=1)
               Index Cond: (rating = '5'::double precision)
   ->  Hash  (cost=21.22..21.22 rows=4 width=77) (actual time=0.023..0.023 rows=4 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Index Scan using movies_movie_id_idx on movies  (cost=0.29..21.22 rows=4 width=77) (actual time=0.008..0.015 rows=4 loops=1)
               Index Cond: (movie_id = ANY ('{1,2,3,4}'::integer[]))
 Planning time: 0.438 ms
 Execution time: 5252.670 ms
(14 rows)

planner_example=# create index ratings_movie_id_idx on ratings (movie_id);
CREATE INDEX

planner_example=# explain analyze select * from movies natural join ratings where ratings.rating = 5.0 and  movie_id IN (1,2,3,4);
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=65.95..45355.23 rows=431 width=93) (actual time=23.935..1168.402 rows=15270 loops=1)
   ->  Index Scan using movies_movie_id_idx on movies  (cost=0.29..21.22 rows=4 width=77) (actual time=0.008..0.132 rows=4 loops=1)
         Index Cond: (movie_id = ANY ('{1,2,3,4}'::integer[]))
   ->  Bitmap Heap Scan on ratings  (cost=65.66..11328.59 rows=491 width=20) (actual time=10.610..290.380 rows=3818 loops=4)
         Recheck Cond: (movie_id = movies.movie_id)
         Filter: (rating = '5'::double precision)
         Rows Removed by Filter: 18040
         Heap Blocks: exact=79172
         ->  Bitmap Index Scan on ratings_movie_id_idx  (cost=0.00..65.54 rows=3347 width=0) (actual time=6.201..6.201 rows=21857 loops=4)
               Index Cond: (movie_id = movies.movie_id)
 Planning time: 0.538 ms
 Execution time: 1170.279 ms
(12 rows)

Aggregates

Different types of aggregation happen depending on the number of rows being aggregated

Nested loop keeps the ordering of the movies pk, which is the ordering we asked for, hence the lack of a sorting node

planner_example=# explain analyze select rating, avg(rating) from ratings group by rating order by rating;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=427395.25..427395.28 rows=10 width=4) (actual time=11003.378..11003.379 rows=10 loops=1)
   Sort Key: rating
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=427394.96..427395.09 rows=10 width=4) (actual time=11003.356..11003.360 rows=10 loops=1)
         Group Key: rating
         ->  Seq Scan on ratings  (cost=0.00..327393.64 rows=20000264 width=4) (actual time=0.029..3739.103 rows=20000263 loops=1)
 Planning time: 56.621 ms
 Execution time: 11003.451 ms
(8 rows)

planner_example=# explain analyze select movie_id, avg(rating) from movies natural join ratings where  movie_id IN (1,2,3,4) group by movie_id order by movie_id;
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=66.66..45449.57 rows=4 width=8) (actual time=699.289..1248.084 rows=4 loops=1)
   Group Key: movies.movie_id
   ->  Nested Loop  (cost=66.66..45434.86 rows=2933 width=8) (actual time=24.004..1209.296 rows=87429 loops=1)
         ->  Index Only Scan using movies_movie_id_idx on movies  (cost=0.29..17.22 rows=4 width=4) (actual time=0.009..0.107 rows=4 loops=1)
               Index Cond: (movie_id = ANY ('{1,2,3,4}'::integer[]))
               Heap Fetches: 0
         ->  Bitmap Heap Scan on ratings  (cost=66.38..11320.94 rows=3347 width=8) (actual time=10.534..292.288 rows=21857 loops=4)
               Recheck Cond: (movie_id = movies.movie_id)
               Heap Blocks: exact=79172
               ->  Bitmap Index Scan on ratings_movie_id_idx  (cost=0.00..65.54 rows=3347 width=0) (actual time=6.226..6.226 rows=21857 loops=4)
                     Index Cond: (movie_id = movies.movie_id)
 Planning time: 0.531 ms
 Execution time: 1248.239 ms
(13 rows)

Sample of an aggregate on the whole table

planner_example=# explain analyze select avg(rating) from movies natural join ratings where  movie_id IN (1,2,3,4);
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=45442.19..45442.20 rows=1 width=4) (actual time=1209.008..1209.008 rows=1 loops=1)
   ->  Nested Loop  (cost=66.66..45434.86 rows=2933 width=4) (actual time=24.464..1180.629 rows=87429 loops=1)
         ->  Index Only Scan using movies_movie_id_idx on movies  (cost=0.29..17.22 rows=4 width=4) (actual time=0.097..0.192 rows=4 loops=1)
               Index Cond: (movie_id = ANY ('{1,2,3,4}'::integer[]))
               Heap Fetches: 0
         ->  Bitmap Heap Scan on ratings  (cost=66.38..11320.94 rows=3347 width=8) (actual time=10.617..285.251 rows=21857 loops=4)
               Recheck Cond: (movie_id = movies.movie_id)
               Heap Blocks: exact=79172
               ->  Bitmap Index Scan on ratings_movie_id_idx  (cost=0.00..65.54 rows=3347 width=0) (actual time=6.313..6.313 rows=21857 loops=4)
                     Index Cond: (movie_id = movies.movie_id)
 Planning time: 0.742 ms
 Execution time: 1209.165 ms
(12 rows)

Take Aways

Looking at how a query is being executed can help us understand how to improve its performance
Sometimes there is no quick-and-easy solution
Indices are not always useful for all types lookups