About Me!

This blog is about my musings and thoughts. I hope you find it useful at best and, at the very least, entertaining.

Fun and Interesting Waypoints

Date: 20160721

Tags: trivia air-travel

None of the maps or data presented here should be used for actual navigation

At lunch yesterday some coworkers and I got onto the topic of flight waypoints, and I was reminded of some of the funny waypoint names out there. While waiting for some code to run, I decided to look up some for Pittsburgh. I found the STARs and IAPs and saw obvious gems like TERBL TOWEL, MYRON COPPE, PENGN GUINZ, RIVRZ, and HEINZ. I knew I had to look deeper! I had some trouble finding a list of waypoints, but stumbled upon fplan and its 2015 database. With a little `sed` magic, I was able to extract a file QGIS could read. I quickly found a handful more!

I attempted to catalogue those, as well as some other interesting names. The theories on why each of the blue beacons is named as it is are my own. If you have better data or a good theory for some of the others, let me know!

ID Type Thoughts
COPPE RNAV-WP Myron Cope (see MYRON)
CRSBY RNAV-WP Sidney Crosby (see CYDNY)
CYDNY RNAV-WP Sidney Crosby (see CRSBY)
DIXIN REP-PT Mason-Dixon Line
FLURY RNAV-WP Marc-André Fleury
GOLDH RNAV-WP Golden Triangle? (Not sure at all)
GUINZ RNAV-WP Penguins Hockey (see PENGN)
JOEPA RNAV-WP Joe Paterno :-\
JOHNB REP-PT Former Mayor of Pittsburgh / Constitutional Convention Delegate
KEDKE RNAV-WP KDKA (1st commercial radio station)?
LEMEW RNAV-WP Mario Lemieux
LMBRT RNAV-WP Jack Lambert
MLKIN RNAV-WP Evgeni Malkin
MYRON RNAV-WP Myron Cope (see COPPE)
PEETE REP-PT Penguins’ first mascot? (not sure at all)
PENGN RNAV-WP Penguins Hockey (see GUINZ)
PITTZ REP-PT Pittsburgh
RACOO REP-PT Raccoon Creek State Park
RIVRZ RNAV-WP Three Rivers
STARG REP-PT Willie Stargell (see WIILE)
STILR RNAV-WP Pittsburgh Steelers
TERBL RNAV-WP Terrible Towel (see TOWEL)
TOWEL RNAV-WP Terrible Towel (see TERBL)
WIILE REP-PT Willie Stargell (see STARG)
WOLES RNAV-WP Wholey’s ?
WYLER REP-PT Heinz Brand??
YINZZ RNAV-WP Pittsburghese plural “you”

Interpolating Old Census Data

Date: 20160315

Tags: census postgis postgres gis


A group I’m part of is trying to learn about racial displacement around Pittsburgh. To do so we looked at the census data. We quickly learned that the census geometries (e.g. blocks and tracts) aren’t static between censuses.

While this isn’t entirely surprising, it does complicate comparing the demographics of blocks and tracts across censuses, since the overlaps can’t be computed directly.

I was looking at the Longitudinal Tract Database and clicked through to the maps, but didn’t like the terms there. (I later found that the LTDB has a separate download section, but I couldn’t find the terms of use there either. Also, while I can appreciate that SPSS is very common, it upsets me to see people continuing to distribute data in non-free formats. Thanks be for PSPP!)

Regardless, I couldn’t find terms of use and many of their menus don’t work, so I decided to figure this out on my own. Looking further into the issue, I found that the census publishes a series of Relationship Files that describe which geometries were merged and split between 2000 and 2010. Moreover, they contain the area of each geometry AND the area of the intersection between them. (The intersection part is important because, as can be seen in the image above, the geometries don’t align properly, and the 2000 geometries are “rougher” than the 2010 ones anyway, so it would be much more difficult to compute the intersection areas from the published geometries.)


Having the intersection areas allows us to do “areal interpolation.” I used the simple method $$ \text{attribute}_{target} = \text{attribute}_{source} \cdot \frac{\text{area}_{intersection}}{\text{area}_{source}} $$. Since each geometry overlaps at most a handful of immediately adjacent ones, often with 10% or more of its total area, I felt this simpler method was warranted.
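
For instance, if a 2000 tract with 10 km² of land and 1,000 residents has 4 km² falling inside a given 2010 tract, the interpolation assigns $$ 1000 \cdot \frac{4}{10} = 400 $$ of its residents to that 2010 tract.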

Now on to the creation of the interpolation layers. First we need census data to interpolate. I’m in Pennsylvania, so we’ll focus on it. I was also specifically focusing on race, which means all my examples are for Census 2010 Table P3 and Census 2000 Table P7.

Importing the Census 2000 Summary Files

wget -x -c -m -r -np http://www2.census.gov/census_2000/datasets/Summary_File_1/Pennsylvania/ will pull all the tables for Summary File 1, which is unsampled data. (Summary File 3 is a 1-in-6 sample and contains information that was later moved into the ACS.)

I then gathered the schema for the 2000 Census from various files such as the MS Access Shell and SAS Programs. (There is a Free Software reader for MS Access called MDB Tools [Warning: SourceForge Link]. Dap is an application that can read SAS files and do much of what SAS can. For my purposes, I was able to get what I needed directly from the SAS program file itself.)

I’ve been using PostgreSQL, but you should be able to modify the queries if you choose to use another RDBMS.

For the sake of time, I created a schema file (it contains all tables, but not all indices a full import would want) and an import query for the geoheader. Remember to import the sf1geo file into census2000.tmp_geoheader first and then run the import query. I find it easier to work with FIPS ids, so I create them, as can be seen in the last line of the import query.
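
That last line isn’t reproduced here, but the idea looks roughly like the sketch below. The column names are assumptions on my part; check the schema file for the real ones.

-- Hypothetical sketch: derive a 15-character block FIPS id
-- (2-digit state + 3-digit county + 6-digit tract + 4-digit block)
-- from the geoheader parts. Column names are assumed, not verbatim.
alter table census2000.sf1geo add column geoid00 text;
update census2000.sf1geo
   set geoid00 = state || county || tract || block
 where sumlev = '101';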

The census data files, once unzipped, can be imported via a COPY command (\copy in psql), e.g. \copy census2000.sf10001 from '/mnt/tmp/www2.census.gov/census_2000/datasets/Summary_File_1/Pennsylvania/pa00001.uf1'

Importing the 2010 Census data is similar. I’ll dedicate another post to the details of the census and ACS imports.

I wasn’t able to find broken-out views or human-readable-field-name views, so I’m working on finding a file similar to the 2010 field-id-to-name mapping put out by the census. Alternatively, I might attempt to extract that information from the text of the SF1 Technical Document or from FactFinder. Either way, that’s for another blog post!

Create Interpolations

Relationship Files

I then grabbed the Relationship File for the blocks, created this schema, and used a COPY command to import the file. From there I ran a query to update the conversion factors (to make later queries simpler).

update public.tabblock_00_10 set c10_00 = case when arealand_2010 = 0 then 0 else arealand_int::double precision / arealand_2010::double precision end,
                                 c00_10 = case when arealand_2000 = 0 then 0 else arealand_int::double precision / arealand_2000::double precision end;
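
A sanity check I’d suggest at this point (my own habit, not part of the relationship-file documentation): the 2000→2010 factors for any given 2000 block should sum to roughly 1, since each block’s land has to end up somewhere in 2010.

-- Flag 2000 blocks whose factors don't sum to ~1.
-- Small deviations are expected from rounding; water-only blocks
-- (arealand_2000 = 0) were zeroed above and will show up here too.
select geoid00, sum(c00_10) as total_factor
  from public.tabblock_00_10
 group by geoid00
having abs(sum(c00_10) - 1.0) > 0.01;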


I began by doing the interpolation at the following summary levels [PDF] (HTML table without the block-level code):

Summary Level Description
101 Block Level (State-County-County Subdivision-Place/Remainder-Census Tract-Block Group-Block)
140 State-County-Census Tract
050 State-County
040 State

create table c00p7proj2010 as
select state_2010 as state, county_2010 as county, tract_2010 as tract, blk_2010 as block,
       concat(state_2010, county_2010, tract_2010, blk_2010) as geoid10,
       sum(P007001 * c00_10) AS Total_population,
       sum(P007002 * c00_10) AS White_alone,
       sum(P007003 * c00_10) AS Black_or_African_American_alone,
       sum(P007004 * c00_10) AS American_Indian_and_Alaska_Native_alone,
       sum(P007005 * c00_10) AS Asian_alone,
       sum(P007006 * c00_10) AS Native_Hawaiian_and_Other_Pacific_Islander_alone,
       sum(P007007 * c00_10) AS Some_Other_Race_alone,
       sum(P007008 * c00_10) AS Two_or_More_Races
  from census2000.sf10002
 inner join census2000.sf1geo using (logrecno)
 inner join tabblock_00_10 using (geoid00)
 where sumlev in ('101')
 group by rollup(state_2010, county_2010, tract_2010, blk_2010);

The table c00p7proj2010 now contains the Census 2000 counts interpolated into the Census 2010 blocks. Now I wanted to compute the values for geometries higher in the hierarchy [PDF] made up of the blocks we just computed. “Why not just interpolate each level individually?” you may be asking. The reason is that if a tract were split in such a way that an area of low population (e.g. a park) was made its own tract or moved to a different tract, the interpolation would move people who never lived there into the other tract. By performing the interpolation at the smallest possible geometry, this issue is largely avoided, as areas of low population tend to be their own blocks rather than incorporated into others.
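
One quick way to convince yourself the interpolation neither invents nor loses people (again, my own check): the block-level rows should sum to roughly the raw Census 2000 statewide total, since the factors only reapportion counts.

-- Raw 2000 statewide population.
select sum(P007001) as raw_2000
  from census2000.sf10002
 inner join census2000.sf1geo using (logrecno)
 where sumlev in ('101');

-- Interpolated total over block-level rows (block is null on rollup rows).
select sum(Total_population) as interpolated
  from c00p7proj2010
 where block is not null;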

create table c00p7proj2010_rollup as
select state, county, tract, block,
       concat(state, county, tract, block) as geoid10,
       sum(Total_population) AS Total_population,
       sum(White_alone) AS White_alone,
       sum(Black_or_African_American_alone) AS Black_or_African_American_alone,
       sum(American_Indian_and_Alaska_Native_alone) AS American_Indian_and_Alaska_Native_alone,
       sum(Asian_alone) AS Asian_alone,
       sum(Native_Hawaiian_and_Other_Pacific_Islander_alone) AS Native_Hawaiian_and_Other_Pacific_Islander_alone,
       sum(Some_Other_Race_alone) AS Some_Other_Race_alone,
       sum(Two_or_More_Races) AS Two_or_More_Races
  from c00p7proj2010
 group by rollup(state, county, tract, block);

So that the 2000 and 2010 tables share the same field names, we’re going to tease out the fields we want from the 2010 Census.

create table c10p3 as
select state, county, tract, block, geoid10,
       P0030001 AS Total_population,
       P0030002 AS White_alone,
       P0030003 AS Black_or_African_American_alone,
       P0030004 AS American_Indian_and_Alaska_Native_alone,
       P0030005 AS Asian_alone,
       P0030006 AS Native_Hawaiian_and_Other_Pacific_Islander_alone,
       P0030007 AS Some_Other_Race_alone,
       P0030008 AS Two_or_More_Races
  from census2010.sf1_00003
 inner join census2010.geo_header_sf1 using (logrecno)
 where sumlev in ('101', '140', '050', '040');

Now we want to compute the percentage each group makes up of the total population.

create table c10p3_percent as
select geoid10,
       Total_population,
       Total_population::double precision / Total_population::double precision AS Total_population_percent,
       White_alone,
       White_alone / Total_population::double precision AS White_alone_percent,
       Black_or_African_American_alone,
       Black_or_African_American_alone / Total_population::double precision AS Black_or_African_American_alone_percent,
       American_Indian_and_Alaska_Native_alone,
       American_Indian_and_Alaska_Native_alone / Total_population::double precision AS American_Indian_and_Alaska_Native_alone_percent,
       Asian_alone,
       Asian_alone / Total_population::double precision AS Asian_alone_percent,
       Native_Hawaiian_and_Other_Pacific_Islander_alone,
       Native_Hawaiian_and_Other_Pacific_Islander_alone / Total_population::double precision AS Native_Hawaiian_and_Other_Pacific_Islander_alone_percent,
       Some_Other_Race_alone,
       Some_Other_Race_alone / Total_population::double precision AS Some_Other_Race_alone_percent,
       Two_or_More_Races,
       Two_or_More_Races / Total_population::double precision AS Two_or_More_Races_percent
  from c10p3
 where Total_population > 0;

create table c00p7proj2010_percent as
select geoid10,
       Total_population,
       Total_population / Total_population AS Total_population_percent,
       White_alone,
       White_alone / Total_population AS White_alone_percent,
       Black_or_African_American_alone,
       Black_or_African_American_alone / Total_population AS Black_or_African_American_alone_percent,
       American_Indian_and_Alaska_Native_alone,
       American_Indian_and_Alaska_Native_alone / Total_population AS American_Indian_and_Alaska_Native_alone_percent,
       Asian_alone,
       Asian_alone / Total_population AS Asian_alone_percent,
       Native_Hawaiian_and_Other_Pacific_Islander_alone,
       Native_Hawaiian_and_Other_Pacific_Islander_alone / Total_population AS Native_Hawaiian_and_Other_Pacific_Islander_alone_percent,
       Some_Other_Race_alone,
       Some_Other_Race_alone / Total_population AS Some_Other_Race_alone_percent,
       Two_or_More_Races,
       Two_or_More_Races / Total_population AS Two_or_More_Races_percent
  from c00p7proj2010
 where Total_population > 0;

Finally, we create a table that contains the absolute differences in population along with the difference in percentage (as percentage points, i.e. not a ratio).

create table diff_00p10_10_00p7_10p3 as
select geoid10,
       c10p3.Total_population AS c10p3_Total_population, c00p7.Total_population AS c00p7_Total_population, c10p3.Total_population - c00p7.Total_population AS diff_Total_population,
       c10p3.Total_population_percent AS c10p3_Total_population_percent, c00p7.Total_population_percent AS c00p7_Total_population_percent, c10p3.Total_population_percent - c00p7.Total_population_percent AS diff_Total_population_percent,
       c10p3.White_alone AS c10p3_White_alone, c00p7.White_alone AS c00p7_White_alone, c10p3.White_alone - c00p7.White_alone AS diff_White_alone,
       c10p3.White_alone_percent AS c10p3_White_alone_percent, c00p7.White_alone_percent AS c00p7_White_alone_percent, c10p3.White_alone_percent - c00p7.White_alone_percent AS diff_White_alone_percent,
       c10p3.Black_or_African_American_alone AS c10p3_Black_or_African_American_alone, c00p7.Black_or_African_American_alone AS c00p7_Black_or_African_American_alone, c10p3.Black_or_African_American_alone - c00p7.Black_or_African_American_alone AS diff_Black_or_African_American_alone,
       c10p3.Black_or_African_American_alone_percent AS c10p3_Black_or_African_American_alone_percent, c00p7.Black_or_African_American_alone_percent AS c00p7_Black_or_African_American_alone_percent, c10p3.Black_or_African_American_alone_percent - c00p7.Black_or_African_American_alone_percent AS diff_Black_or_African_American_alone_percent,
       c10p3.American_Indian_and_Alaska_Native_alone AS c10p3_American_Indian_and_Alaska_Native_alone, c00p7.American_Indian_and_Alaska_Native_alone AS c00p7_American_Indian_and_Alaska_Native_alone, c10p3.American_Indian_and_Alaska_Native_alone - c00p7.American_Indian_and_Alaska_Native_alone AS diff_American_Indian_and_Alaska_Native_alone,
       c10p3.American_Indian_and_Alaska_Native_alone_percent AS c10p3_American_Indian_and_Alaska_Native_alone_percent, c00p7.American_Indian_and_Alaska_Native_alone_percent AS c00p7_American_Indian_and_Alaska_Native_alone_percent, c10p3.American_Indian_and_Alaska_Native_alone_percent - c00p7.American_Indian_and_Alaska_Native_alone_percent AS diff_American_Indian_and_Alaska_Native_alone_percent,
       c10p3.Asian_alone AS c10p3_Asian_alone, c00p7.Asian_alone AS c00p7_Asian_alone, c10p3.Asian_alone - c00p7.Asian_alone AS diff_Asian_alone,
       c10p3.Asian_alone_percent AS c10p3_Asian_alone_percent, c00p7.Asian_alone_percent AS c00p7_Asian_alone_percent, c10p3.Asian_alone_percent - c00p7.Asian_alone_percent AS diff_Asian_alone_percent,
       c10p3.Native_Hawaiian_and_Other_Pacific_Islander_alone AS c10p3_Native_Hawaiian_and_Other_Pacific_Islander_alone, c00p7.Native_Hawaiian_and_Other_Pacific_Islander_alone AS c00p7_Native_Hawaiian_and_Other_Pacific_Islander_alone, c10p3.Native_Hawaiian_and_Other_Pacific_Islander_alone - c00p7.Native_Hawaiian_and_Other_Pacific_Islander_alone AS diff_Native_Hawaiian_and_Other_Pacific_Islander_alone,
       c10p3.Native_Hawaiian_and_Other_Pacific_Islander_alone_percent AS c10p3_Native_Hawaiian_and_Other_Pacific_Islander_alone_percent, c00p7.Native_Hawaiian_and_Other_Pacific_Islander_alone_percent AS c00p7_Native_Hawaiian_and_Other_Pacific_Islander_alone_percent, c10p3.Native_Hawaiian_and_Other_Pacific_Islander_alone_percent - c00p7.Native_Hawaiian_and_Other_Pacific_Islander_alone_percent AS diff_Native_Hawaiian_and_Other_Pacific_Islander_alone_percent,
       c10p3.Some_Other_Race_alone AS c10p3_Some_Other_Race_alone, c00p7.Some_Other_Race_alone AS c00p7_Some_Other_Race_alone, c10p3.Some_Other_Race_alone - c00p7.Some_Other_Race_alone AS diff_Some_Other_Race_alone,
       c10p3.Some_Other_Race_alone_percent AS c10p3_Some_Other_Race_alone_percent, c00p7.Some_Other_Race_alone_percent AS c00p7_Some_Other_Race_alone_percent, c10p3.Some_Other_Race_alone_percent - c00p7.Some_Other_Race_alone_percent AS diff_Some_Other_Race_alone_percent,
       c10p3.Two_or_More_Races AS c10p3_Two_or_More_Races, c00p7.Two_or_More_Races AS c00p7_Two_or_More_Races, c10p3.Two_or_More_Races - c00p7.Two_or_More_Races AS diff_Two_or_More_Races
  from c10p3_percent as c10p3
 inner join c00p7proj2010_percent as c00p7 using (geoid10);

In order to display the values on a map, we need to join the values with the Census 2010 TIGER geometries.

create or replace view blk_00p10_10_00p7_10p3_geom as select * from diff_00p10_10_00p7_10p3 inner join tiger2010.tabblock using (geoid10);
create or replace view tract_00p10_10_00p7_10p3_geom as select * from diff_00p10_10_00p7_10p3 inner join tiger2010.tract using (geoid10);
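
As an example of using these views (the column names all follow from the tables created above), here’s the sort of query one might run to find the biggest swings:

-- Ten tracts with the largest drop in the White share of population, 2000 to 2010.
-- The joined geometry column makes the result ready to style in QGIS.
select geoid10, diff_White_alone_percent, diff_Total_population
  from tract_00p10_10_00p7_10p3_geom
 order by diff_White_alone_percent asc
 limit 10;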

2016 New Year's Resolutions

Date: 20160125

Tags: resolutions time-management

Starting December 31st and running to the 11th of January, I spent some time learning Emacs's org-mode and thinking through this year. What would I like to accomplish? More importantly, though, what do I have time to accomplish?

Understanding PostgreSQL's EXPLAIN output and Query Planner

Date: 20150907

Tags: postgres development

We’ll begin our exploration of the EXPLAIN clause (more information in the docs) and the query planner by downloading the MovieLens 20 million ratings file. The README.txt file contained in the zip has information we’ll use to build our test. (Note: The zip is 132 MB. I wanted a large dataset for reasons we’ll get into.)

(Note: While the EXPLAIN output and syntax shouldn’t change drastically between versions, all the output here comes from PostgreSQL 9.3.9.)

jim@lilith /monongahela/movielens% wget -x http://files.grouplens.org/datasets/movielens/ml-20m.zip
jim@lilith /monongahela/movielens% wget -x http://files.grouplens.org/datasets/movielens/ml-20m.zip.md5
jim@lilith /monongahela/movielens/files.grouplens.org/datasets/movielens% [ "`md5sum ml-20m.zip | awk '{print $1}'`" = "`cat ml-20m.zip.md5 | awk '{print $4}'`" ] && echo "OK"
jim@lilith /monongahela/movielens/files.grouplens.org/datasets/movielens% unzip ml-20m.zip

Then we’ll create the database and the movies table. (We’ll get to the other tables later.) We’ll then import the csv file directly. Lastly, we’ll turn the list of genres each movie has, stored as a pipe-delimited string, into a text array field.

jim@lilith /monongahela/movielens% createdb planner_example
jim@lilith /monongahela/movielens% psql planner_example
[local]:5432 jim@planner_example=# create table movies (
  movie_id integer, 
  title text, 
  genres_text text, 
  genres text[]);
[local]:5432 jim@planner_example=# copy movies (movie_id, title, genres_text) from '/monongahela/movielens/files.grouplens.org/datasets/movielens/ml-20m/movies.csv' with (format csv, header);
[local]:5432 jim@planner_example=# update movies set genres = regexp_split_to_array(genres_text, '\|');
[local]:5432 jim@planner_example=# alter table movies drop genres_text;

To start, we will do a simple selection by movie_id. (Note: We haven’t yet created an index, let alone a primary key.) Then we’ll want to look at the EXPLAIN output.

[local]:5432 jim@planner_example=# select * from movies where movie_id = 6645;
 movie_id |      title      |             genres              
     6645 | THX 1138 (1971) | {Action,Adventure,Drama,Sci-Fi}
(1 row)
Time: 7.989 ms
[local]:5432 jim@planner_example=# explain select * from movies where movie_id = 6645;
                        QUERY PLAN                        
 Seq Scan on movies  (cost=0.00..1026.97 rows=1 width=77)
   Filter: (movie_id = 6645)
(2 rows)
Time: 0.493 ms

The first thing to notice about the EXPLAIN output is that it’s named “Query Plan”; the second is that it has two lines, the second indented. The default output¹ indents steps that are needed by the step above them. Here, we are told that this query will do a “Seq Scan”, a.k.a. Sequential Scan, meaning that it will scan every row in the table in sequence. The indented row tells us that the results of the Seq Scan will be filtered, and what the filter is.

The Seq Scan row has some additional information attached to it: cost, rows, and width. Let’s take these backwards, as cost is the most complex to describe. Width is the approximate average size in bytes of the returned rows.

[local]:5432 jim@planner_example=# explain select title from movies where movie_id = 6645;
                        QUERY PLAN                        
 Seq Scan on movies  (cost=0.00..1026.97 rows=1 width=28)
   Filter: (movie_id = 6645)
(2 rows)
Time: 0.587 ms
[local]:5432 jim@planner_example=# explain select genres from movies where movie_id = 6645;
                        QUERY PLAN                        
 Seq Scan on movies  (cost=0.00..1026.97 rows=1 width=45)
   Filter: (movie_id = 6645)
(2 rows)
Time: 0.521 ms
[local]:5432 jim@planner_example=# explain select movie_id, title from movies where movie_id = 6645;
                        QUERY PLAN                        
 Seq Scan on movies  (cost=0.00..1026.97 rows=1 width=32)
   Filter: (movie_id = 6645)
(2 rows)
Time: 0.568 ms

From these queries we can see that the integer type is 4 bytes long (32 − 28), which isn’t entirely unexpected. The title is a text field, which can be arbitrarily long; the table statistics kept by PostgreSQL have it listed as 28 bytes long on average.

[local]:5432 jim@planner_example=# select avg(length(title)) from movies ;
(1 row)
Time: 23.780 ms

Those statistics aren’t half bad! The final query, which selects only the movie_id and title, has a width of 32, which not-so-surprisingly is 4+28. The width in our original query is 77, which again is 4+28+45.

Rows is a guess at how many rows will be returned. PostgreSQL keeps statistics on the total number of rows and the cardinality of the values in each column, along with other information, to estimate how many rows will be returned. This is important because part of cost, which we cover next, is how much disk IO will be performed. Returning the majority of the table has different IO needs than returning a handful of rows.
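
If you’re curious where those numbers live, the planner’s raw inputs are visible in the system catalogs. (This aside is mine; none of the queries below are needed for the rest of the walkthrough.)

-- The approximate row and page counts maintained by ANALYZE/autovacuum.
select reltuples, relpages from pg_class where relname = 'movies';

-- Per-column statistics, including average width and distinct-value estimates.
select attname, avg_width, n_distinct from pg_stats where tablename = 'movies';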

Cost is made of two values: the start-up cost and the approximate cost to complete the step. Cost is not a direct measure of time, but should be proportional to it. The configuration constants used by the planner are as follows:

Parameter Description Default Relative cost vs seq_page_cost
random_page_cost Nonsequential disk page fetch. 4.00 4x slower
seq_page_cost Sequential disk page fetch. 1.00 (baseline)
cpu_tuple_cost Processing each tuple. 0.01 100x faster
cpu_index_tuple_cost Processing each entry during an index scan. 0.005 200x faster
cpu_operator_cost Processing each operator or function call. 0.0025 400x faster

Some vocabulary:

  • Tuple is a row of information.
  • Disk page is a block of information that can be read from the disk in a single operation; this may contain multiple tuples.

Each operation that the Query Planner performs is broken into some multiple of each of these costs. As mentioned earlier, the planner attempts to estimate the number of rows returned. This information is partly used to predict whether sequential or nonsequential reads will be required. If it becomes cheaper to read the entire table from disk than to read a lot of random table pages, the planner may opt to do so, manifesting in a Seq Scan as opposed to an Index Scan, for example.
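
As a rough check of that arithmetic (my own back-of-the-envelope figuring, under the assumption that the movies table occupies about 686 disk pages and that the statistics report all 27,278 titles): a filtered Seq Scan pays seq_page_cost per page, plus cpu_tuple_cost and cpu_operator_cost per row for reading each tuple and evaluating the filter, so $$ 686 \cdot 1.00 + 27278 \cdot (0.01 + 0.0025) \approx 1026.97 $$ which lines up with the cost reported in the plans above.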

These values, derived from many years of experience, can be tweaked in postgresql.conf, though I wouldn’t recommend that unless you are very sure of what you’re doing. For instance, an SSD may have a random-access cost similar to a sequential read.
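
For experimenting, the planner constants can also be changed per session before committing anything to postgresql.conf (the 1.1 below is just an illustrative SSD-ish value, not a recommendation):

-- Lower the random-read penalty for this session only, then verify it took.
set random_page_cost = 1.1;
show random_page_cost;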

Additionally, this is why I opted to do this tutorial on such a large example set. When the table size is small, it is often faster to Seq Scan than it is to read the index and then read the tuples required. When the planner thinks it’ll be reading the majority of the rows anyway, it may forgo indices, even in very large tables, for the sake of decreasing the total amount of disk IO.

Since movie_id is supposed to be the primary key, let’s create a unique index and set it as the primary key.

[local]:5432 jim@planner_example=# create unique index on movies (movie_id);
[local]:5432 jim@planner_example=# alter table movies add primary key using index movies_movie_id_idx ;
[local]:5432 jim@planner_example=# \d movies
     Table "public.movies"
  Column  |  Type   | Modifiers 
 movie_id | integer | not null
 title    | text    | 
 genres   | text[]  | 
    "movies_movie_id_idx" PRIMARY KEY, btree (movie_id)

Now, let’s rerun the query and see what’s changed.

[local]:5432 jim@planner_example=# select * from movies where movie_id = 6645;
 movie_id |      title      |             genres              
     6645 | THX 1138 (1971) | {Action,Adventure,Drama,Sci-Fi}
(1 row)
Time: 0.737 ms

The first thing we should notice is that the query ran 10x faster! Let’s look at the explain.

[local]:5432 jim@planner_example=# explain select * from movies where movie_id = 6645;
                                    QUERY PLAN                                     
 Index Scan using movies_movie_id_idx on movies  (cost=0.29..8.30 rows=1 width=77)
   Index Cond: (movie_id = 6645)
(2 rows)
Time: 0.551 ms

We should notice that the Seq Scan from before has been transformed into an Index Scan, which additionally tells us the index being used and the condition being applied to it (the indented Index Cond line). The index scan lets us skip reading most of the table’s pages from disk and jump directly to the one that will fulfill our query. This is evident in the cost: from over 1000 down to about 8.

Another use of EXPLAIN is EXPLAIN ANALYZE, which actually runs the query and compares the actual values to the predicted ones.

[local]:5432 jim@planner_example=# explain analyze select * from movies where movie_id = 6645;
                                                         QUERY PLAN                                                          
 Index Scan using movies_movie_id_idx on movies  (cost=0.29..8.30 rows=1 width=77) (actual time=0.023..0.025 rows=1 loops=1)
   Index Cond: (movie_id = 6645)
 Total runtime: 0.057 ms
(3 rows)
Time: 0.561 ms

Again, cost and actual time are in different units and will never be identical: cost is unitless and actual time is in ms. There is a rows value, which has the same meaning as in the predicted values. A hint here: if the predicted number of rows is vastly different from the actual, a VACUUM ANALYZE may need to be run to update table and index statistics. The predicted number of rows won’t often match the actual rows exactly, but should be close. There is an additional value in the actual parameters: loops. This is an indication of how many times this part of the plan was run. We’ll come back to this value.
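
Refreshing those statistics doesn’t require the full vacuum pass, incidentally; a plain ANALYZE will do, and it can be scoped to a single table:

-- Recompute planner statistics for just the movies table.
analyze verbose movies;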

Now, let’s look at searching on the title.

[local]:5432 jim@planner_example=# select * from movies where title like '%THX%';
 movie_id |                  title                   |             genres              
     6645 | THX 1138 (1971)                          | {Action,Adventure,Drama,Sci-Fi}
   116157 | Electronic Labyrinth THX 1138 4EB (1967) | {Sci-Fi}
(2 rows)
Time: 14.442 ms

We see the full-length motion picture and the short film that inspired it. Let’s see what the planner did to perform the search.

[local]:5432 jim@planner_example=# explain select * from movies where title like '%THX%';
                        QUERY PLAN                        
 Seq Scan on movies  (cost=0.00..1026.97 rows=3 width=77)
   Filter: (title ~~ '%THX%'::text)
(2 rows)
Time: 0.580 ms

PostgreSQL does a Seq Scan, which isn’t surprising considering we haven’t created an index on the title field. Let’s create an index and see what happens.

[local]:5432 jim@planner_example=# create index on movies (title);
Time: 205.577 ms
[local]:5432 jim@planner_example=# \d movies
     Table "public.movies"
  Column  |  Type   | Modifiers 
 movie_id | integer | not null
 title    | text    | 
 genres   | text[]  | 
    "movies_movie_id_idx" PRIMARY KEY, btree (movie_id)
    "movies_title_idx" btree (title)
[local]:5432 jim@planner_example=# select * from movies where title like '%THX%';
 movie_id |                  title                   |             genres              
     6645 | THX 1138 (1971)                          | {Action,Adventure,Drama,Sci-Fi}
   116157 | Electronic Labyrinth THX 1138 4EB (1967) | {Sci-Fi}
(2 rows)
Time: 14.207 ms
[local]:5432 jim@planner_example=# explain select * from movies where title like '%THX%';
                        QUERY PLAN                        
 Seq Scan on movies  (cost=0.00..1026.97 rows=3 width=77)
   Filter: (title ~~ '%THX%'::text)
(2 rows)
Time: 0.738 ms

It doesn’t use the index! What!

Taking a look at the table description, we see that movies_title_idx is a btree index. These indices are fast to use and update, but can only handle equality and inequality comparisons (<, <=, =, >=, and >), not set-contains or wild-card searches. Luckily, there is the pg_trgm extension (trgm = trigram), which builds an index over the three-character substrings of each value; the string is lowercased and its ends are padded with blanks, so Widdershins is broken into trigrams like ‘  w’, ‘ wi’, ‘wid’, ‘idd’, ‘dde’, ‘der’, ‘ers’, ‘rsh’, ‘shi’, ‘hin’, ‘ins’, and ‘ns ’.

[local]:5432 jim@planner_example=# create extension pg_trgm;
[local]:5432 jim@planner_example=# create index on movies using gist (title gist_trgm_ops);
[local]:5432 jim@planner_example=# \d movies
     Table "public.movies"
  Column  |  Type   | Modifiers 
 movie_id | integer | not null
 title    | text    | 
 genres   | text[]  | 
    "movies_movie_id_idx" PRIMARY KEY, btree (movie_id)
    "movies_title_idx" btree (title)
    "movies_title_idx1" gist (title gist_trgm_ops)
[local]:5432 jim@planner_example=# drop index movies_title_idx;
[local]:5432 jim@planner_example=# select * from movies where title like '%THX%';
 movie_id |                  title                   |             genres              
     6645 | THX 1138 (1971)                          | {Action,Adventure,Drama,Sci-Fi}
   116157 | Electronic Labyrinth THX 1138 4EB (1967) | {Sci-Fi}
(2 rows)
Time: 10.321 ms
[local]:5432 jim@planner_example=# explain select * from movies where title like '%THX%';
                                   QUERY PLAN                                   
 Bitmap Heap Scan on movies  (cost=4.30..15.74 rows=3 width=77)
   Recheck Cond: (title ~~ '%THX%'::text)
   ->  Bitmap Index Scan on movies_title_idx1  (cost=0.00..4.30 rows=3 width=0)
         Index Cond: (title ~~ '%THX%'::text)
(4 rows)
Time: 0.648 ms

There is a lot going on there. pg_trgm is a built-in extension, meaning that, like hstore, it needs to be enabled with CREATE EXTENSION, but no third-party software needs to be installed. Next, we’re using a GiST index. While I’ll leave an in-depth examination of the types of indices for another post, I will say that b-trees are tree-like structures that keep data in sorted order and are fast to update and use; GiST indices can hold arbitrary, multi-dimensional data, which makes them more flexible, a little slower to use than b-trees, and much slower to update. (Note: GiST isn’t actually an index type itself. It’s an interface that an index can present, and therefore may have different performance characteristics for different data types. However, all of them offer the same set of operations that can be accelerated by the index.)
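
If you’re curious what pg_trgm actually indexes for a given string, the extension ships a show_trgm helper. (This aside is mine; the output below is approximately what it should look like. Note the lowercasing and the blank padding at the word boundaries.)

[local]:5432 jim@planner_example=# select show_trgm('Widdershins');
                        show_trgm                        
 {"  w"," wi",dde,der,ers,hin,idd,ins,"ns ",rsh,shi,wid}
(1 row)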

We see our query runs a little faster, but not significantly so. Let’s take a look at what steps go into performing the query. First, we see that a “Bitmap Heap Scan” is being done. Looking beneath it, at the ->, we see that to do the Bitmap Heap Scan we first need to do a Bitmap Index Scan, which traverses the index and ascertains whether a disk page contains a tuple matching the Index Cond; we are additionally told which index and which condition are being used. The Recheck Cond on the Bitmap Heap Scan is the condition used to filter the correct tuples out of each disk page (remember! the Bitmap Index Scan only tells us a tuple is on the page, not where!).


¹ PostgreSQL has multiple output formats for EXPLAIN, some of which are suitable for consumption by other tools.

[local]:5432 jim@planner_example=# explain (format xml) select * from movies where movie_id = 6645;
                        QUERY PLAN                        
 <explain xmlns="http://www.postgresql.org/2009/explain">+
   <Query>                                               +
     <Plan>                                              +
       <Node-Type>Index Scan</Node-Type>                 +
       <Scan-Direction>Forward</Scan-Direction>          +
       <Index-Name>movies_movie_id_idx</Index-Name>      +
       <Relation-Name>movies</Relation-Name>             +
       <Alias>movies</Alias>                             +
       <Startup-Cost>0.29</Startup-Cost>                 +
       <Total-Cost>8.30</Total-Cost>                     +
       <Plan-Rows>1</Plan-Rows>                          +
       <Plan-Width>77</Plan-Width>                       +
       <Index-Cond>(movie_id = 6645)</Index-Cond>        +
     </Plan>                                             +
   </Query>                                              +
 </explain>
(1 row)
Time: 2.919 ms
[local]:5432 jim@planner_example=# explain (format json) select * from movies where movie_id = 6645;
                 QUERY PLAN                 
 [                                         +
   {                                       +
     "Plan": {                             +
       "Node Type": "Index Scan",          +
       "Scan Direction": "Forward",        +
       "Index Name": "movies_movie_id_idx",+
       "Relation Name": "movies",          +
       "Alias": "movies",                  +
       "Startup Cost": 0.29,               +
       "Total Cost": 8.30,                 +
       "Plan Rows": 1,                     +
       "Plan Width": 77,                   +
       "Index Cond": "(movie_id = 6645)"   +
     }                                     +
   }                                       +
 ]
(1 row)
Time: 0.601 ms
[local]:5432 jim@planner_example=# explain (format yaml) select * from movies where movie_id = 6645;
              QUERY PLAN               
 - Plan:                              +
     Node Type: "Index Scan"          +
     Scan Direction: "Forward"        +
     Index Name: "movies_movie_id_idx"+
     Relation Name: "movies"          +
     Alias: "movies"                  +
     Startup Cost: 0.29               +
     Total Cost: 8.30                 +
     Plan Rows: 1                     +
     Plan Width: 77                   +
     Index Cond: "(movie_id = 6645)"
(1 row)
Time: 0.537 ms

(The +'s at the end of each line represent a newline and would not be returned when the plan is retrieved programmatically, such as through your favorite programming language's PostgreSQL library.)

These formats allow for nesting of the query plan as well. (Ed Note: Replace these with a better example when we get to that point).

Eddie Bauer Langley Set

Date: 20150821

Tags: furniture

I sent this to Target and Dorel Living because of the experience I’ve had with their dresser.

My wife and I received a Langley Dresser and Crib in expectation of our first child. The assembly of the crib was made simple, sturdy, and quick by the length of the bolts and the metal counter-screws already set in the crib frame. I was quite glad it was not the lock-and-cam system that plagues cheaper items, as those always seem to wobble after a while.
The dresser, on the other hand, has been a complete nightmare. The first step has cross beams being mounted to the middle section of the dresser with lock-and-cam fasteners, whose screw-locks never stabilize and just continue to turn as I attempt to tighten them, and whose cams never lock as well as an actual screw would; or, worse, the cross beams simply sit on a peg, in a hole too large to prevent movement.
Moving on to the second step, and praying that everything from the first step wouldn't just fall apart because nothing is adequately fastened (or fastened at all!) for what should be the core of the dresser and the nucleus that the rest of the piece is assembled onto, I found my prayers were not answered: in attempting to attach one of the 5 crossbeams, the core shifted, pulling out the plastic anchors I had already screwed and cammed into place.
Especially after a simple, quick, and sturdy assembly process with the crib, I expected the same simple, quick, and sturdy process with the dresser, and was greatly surprised and very displeased at the shoddy and inadequate fastening system provided with it. I expect as much from sub-$100 pieces of furniture, but not something that's supposed to be of better quality — otherwise why would I have bought it?
Beyond the shoddy and inadequate fastening system for a dresser with a long expected lifetime, I am beyond worried that a part designed to hold a screw-lock in position could be pulled out with so little force, that is, by the middle section twisting because nothing is tightly fastened to it. I will more than likely end up using wood glue to secure the part that detached, and any others that follow. Moreover, I am also planning on purchasing larger screws so I can fasten the dresser properly.
While the crib is exactly what I expected, the dresser was worlds apart. I cannot in good conscience recommend this set to anyone.

UPDATE (2015-08-23):

hello James, I’m sorry to hear about the trouble you had with the crib you bought at Target. I know how frustrating it can be when an item doesn’t work out.
We’re always looking for ways to make the Target experience better for every guest. I’ll be sure to share your comments with our Merchandising team for further review with our buyers. In the meantime, we’ll keep working to provide you with the high quality merchandise you’ve come to expect at Target.
Thanks for writing. Hearing about your experience is important to us.
Target Guest Relations

Thanks for actually reading my complaint and not just skimming the first line for a product name! /s Also, I’d rather not be contacted if all I’m getting is a form letter.