
Calculating Service Area, Part 1: Introduction and Setup

Date: 2019-01-08
Tags: postgis pg-routing postgres gis gtfs transit census

Motivation

One of my fellow Allegheny County Transit Council (ACTC) members asked how much of the county the Port Authority serves. After the question went unanswered, I decided to compute it myself. We're going to define the service area in terms of both land and people, i.e. the percent of the land in Allegheny County that is in the service area and the percent of the people in Allegheny County who live in the service area.

That still leaves the question of what a "Service Area" actually is. For this project, I'm going to define the "Service Area" as any Census 2010 Tabulation Block that is adjacent to a road that is within ¼ mi of a bus stop or within ½ mi of a busway or trolley stop. From there, I can use the area and population of each tabulation block (block) to build up my statistics.
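
To make that definition concrete, here is a sketch of the spatial test in PostGIS terms. It uses straight-line distance rather than the network-based walkshed we'll build in Part 2, and the table and column names (tiger2017.tabblock, tiger2017.roads, gtfs.stops, geom, the_geom) are assumptions based on the imports described below.

-- Sketch only: blocks touching a road that passes within ¼ mi (~402 m)
-- of a bus stop. Casting to geography lets ST_DWithin take metres.
select distinct b.geoid10
from tiger2017.tabblock b
join tiger2017.roads r on st_intersects(b.geom, r.geom)
where exists (
    select 1
    from gtfs.stops s
    where st_dwithin(r.geom::geography, s.the_geom::geography, 402)
);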

Parts

  • Part 1: Introduction and Setup (this post)
  • Part 2: Defining the Walkshed

Tools Required

I'm using Ubuntu 18.04 and have basic build utilities such as gcc, autoconf, make, cmake, git, &c already installed.

QGIS

QGIS is a featureful, free (libre and beer, although paid consulting is available) GIS (Geographic Information Systems) tool similar in many ways to ArcGIS. I covered a little of its use in my So You Want to Make a Map post. The QGIS Training Manual contains information (among many, many other things) about how to connect QGIS to PostgreSQL. It also has sections on Database Concepts and Spatial Database Concepts, which I would highly recommend. It's very fast, well documented, and easy to use. I cannot say enough good things about it.

To install the latest version (3.x; the Ubuntu repository otherwise only has the 2.18.x LTR):

sudo add-apt-repository "deb https://qgis.org/debian bionic main"
wget -O - https://qgis.org/downloads/qgis-2017.gpg.key | gpg --import
gpg --fingerprint CAEB3DC3BDF7FB45
gpg --export --armor CAEB3DC3BDF7FB45 | sudo apt-key add -
sudo apt-get update
sudo apt-get install qgis python-qgis

Some instructions say to do the following as well:

sudo add-apt-repository "deb-src https://qgis.org/debian bionic main"
sudo apt-get install qgis-plugin-grass

but they didn't work for me, nor are they required, so I skipped them.

PostgreSQL, PostGIS, pgRouting

PostgreSQL is an extremely powerful, full-featured SQL RDBMS. The documentation is top-notch, and extensions such as PostGIS for performing GIS work and pgRouting for performing graph algorithms and traversals make it even more valuable. I truly cannot say enough good things about PostgreSQL.

To install:

sudo apt-get install postgresql-10 postgresql-10-pgrouting postgresql-10-postgis-2.4

mdbtools

mdbtools allows viewing and manipulating Microsoft Access databases, which is the format the Census distributes its schemas in. This repository is a fork of an older tool, also called mdbtools, and is the one I could get to compile.

To install:

cd ~/projects/external
git clone git@github.com:brianb/mdbtools.git brianb/mdbtools
cd brianb/mdbtools
sudo apt-get install txt2man
autoreconf -i -f
export DOCBOOK_DSL=/usr/share/sgml/docbook/stylesheet/dsssl/modular/html/docbook.dsl
./configure
make

gtfs-sql-importer

gtfs-sql-importer is a set of scripts that imports the CSV files of a GTFS feed into PostgreSQL and generates some geographic features.

To install:

cd projects/external
git clone git@github.com:fitnr/gtfs-sql-importer.git fitnr/gtfs-sql-importer
cd fitnr/gtfs-sql-importer
PGDATABASE=census make init
PGDATABASE=census GTFS=~/Downloads/www.portauthority.org/generaltransitfeed/general_transit_CleverTripID_1811.zip make load

I did need to patch the code to handle files with empty lines. However, this appears to be fixed in the add-cols branch, which might be merged into master soon.

diff --git a/src/load.sh b/src/load.sh
index 6d77846..f2f4278 100644
--- a/src/load.sh
+++ b/src/load.sh
@@ -15,7 +15,7 @@ function import_stdin()
     # remove possible BOM
     hed=$(unzip -p "$ZIP" "${1}.txt" | head -n 1 | awk '{sub(/^\xef\xbb\xbf/,"")}{print}')
     echo "COPY gtfs.${1}" 1>&2
-    unzip -p "$ZIP" "${1}.txt" | awk '{ sub(/\r$/,""); sub(",\"\"", ","); print }' | psql -c "COPY gtfs.${1} (${hed}) FROM STDIN WITH DELIMITER AS ',' HEADER CSV"
+    unzip -p "$ZIP" "${1}.txt" | grep -v "^\s*$" | awk '{ sub(/\r$/,""); sub(",\"\"", ","); print }' | psql -c "COPY gtfs.${1} (${hed}) FROM STDIN WITH DELIMITER AS ',' HEADER CSV"
 }

 ADD_DATES=

census-postgres-scripts

census-postgres-scripts is a set of scripts from Census Reporter to make quick work of importing ACS and TIGER datasets.

To install:

cd projects/external
git clone git@github.com:censusreporter/census-postgres-scripts.git censusreporter/census-postgres-scripts

I also made some patches: they download data only for Pennsylvania (FIPS ID 42), or the whole US where needed, and also fetch the Roads and Tabulation Block geometries, which these scripts don't include by default. I also store my files in a different location and log in to Postgres as my personal user, not the census user.

diff --git a/12_download_tiger_2017.sh b/12_download_tiger_2017.sh
index ea17e88..86084b1 100755
--- a/12_download_tiger_2017.sh
+++ b/12_download_tiger_2017.sh
@@ -1,11 +1,12 @@
 # Download census-y TIGER data

-mkdir -p /mnt/tmp/tiger2017
-
-wget --recursive --continue --accept=*.zip \
+wget --recursive --continue \
+     --accept=tl_2017_42_*.zip \
+     --accept=tl_2017_42003*.zip \
+     --accept=tl_2017_us*.zip \
+     --wait=1 --random-wait \
      --no-parent --cut-dirs=3 --no-host-directories \
      -e robots=off \
-     --directory-prefix=/mnt/tmp/tiger2017 \
     https://www2.census.gov/geo/tiger/TIGER2017/AIANNH/ \
     https://www2.census.gov/geo/tiger/TIGER2017/AITSN/ \
     https://www2.census.gov/geo/tiger/TIGER2017/ANRC/ \
@@ -23,11 +24,13 @@ wget --recursive --continue --accept=*.zip \
     https://www2.census.gov/geo/tiger/TIGER2017/NECTADIV/ \
     https://www2.census.gov/geo/tiger/TIGER2017/PLACE/ \
     https://www2.census.gov/geo/tiger/TIGER2017/PUMA/ \
+    https://www2.census.gov/geo/tiger/TIGER2017/ROADS/ \
     https://www2.census.gov/geo/tiger/TIGER2017/SCSD/ \
     https://www2.census.gov/geo/tiger/TIGER2017/SLDL/ \
     https://www2.census.gov/geo/tiger/TIGER2017/SLDU/ \
     https://www2.census.gov/geo/tiger/TIGER2017/STATE/ \
     https://www2.census.gov/geo/tiger/TIGER2017/SUBMCD/ \
+    https://www2.census.gov/geo/tiger/TIGER2017/TABBLOCK/ \
     https://www2.census.gov/geo/tiger/TIGER2017/TBG/ \
     https://www2.census.gov/geo/tiger/TIGER2017/TRACT/ \
     https://www2.census.gov/geo/tiger/TIGER2017/TTRACT/ \
diff --git a/13_import_tiger_2017.sh b/13_import_tiger_2017.sh
index 2d00e54..683c39f 100755
--- a/13_import_tiger_2017.sh
+++ b/13_import_tiger_2017.sh
@@ -1,18 +1,9 @@
 #!/bin/bash
-cd /mnt/tmp/tiger2017

-# For shp2pgsql
-sudo apt-get install -y postgis
+psql -d census  -v ON_ERROR_STOP=1 -q -c "DROP SCHEMA tiger2017 CASCADE; CREATE SCHEMA tiger2017;"
+psql -d census  -v ON_ERROR_STOP=1 -q -c "ALTER SCHEMA tiger2017 OWNER TO census;"

-if [ -z $PGHOST ]; then
-    echo "You must set PGHOST environment variable to the hostname of the PostgreSQL server to operate on."
-    exit 1
-fi
-
-psql -d census -U census -v ON_ERROR_STOP=1 -q -c "DROP SCHEMA IF EXISTS tiger2017 CASCADE; CREATE SCHEMA tiger2017;"
-psql -d census -U census -v ON_ERROR_STOP=1 -q -c "ALTER SCHEMA tiger2017 OWNER TO census;"
-
-for i in /mnt/tmp/tiger2017/{CBSA,CD,COUNTY,CSA,PLACE,STATE,ELSD,SCSD,ZCTA5,COUSUB,PUMA,SLDL,SLDU,AIANNH,AITSN,ANRC,BG,CNECTA,CONCITY,METDIV,NECTA,NECTADIV,SUBMCD,TBG,TTRACT,TRACT,UAC,UNSD}
+for i in ./{TABBLOCK,ROADS,CBSA,CD,COUNTY,CSA,PLACE,STATE,ELSD,SCSD,ZCTA5,COUSUB,PUMA,SLDL,SLDU,AIANNH,AITSN,ANRC,BG,CNECTA,CONCITY,METDIV,NECTA,NECTADIV,SUBMCD,TBG,TTRACT,TRACT,UAC,UNSD}
 do
     tablename=$(basename $i)
     for j in $i/*.zip
@@ -24,12 +15,12 @@ do
     one_shapefile=`ls -a $i/*.shp | head -n 1`

     # Start by preparing the table
-    shp2pgsql -W "latin1" -s 4326 -p -I $one_shapefile tiger2017.$tablename | psql -d census -U census -v ON_ERROR_STOP=1 -q
+    shp2pgsql -W "latin1" -s 4326 -p -I $one_shapefile tiger2017.$tablename | psql -d census  -v ON_ERROR_STOP=1 -q

     # Then append all the geometries
     for j in $i/*.shp
     do
-        shp2pgsql -W "latin1" -s 4326 -a $j tiger2017.$tablename | psql -d census -U census -v ON_ERROR_STOP=1 -q
+        shp2pgsql -W "latin1" -s 4326 -a $j tiger2017.$tablename | psql -d census  -v ON_ERROR_STOP=1 -q
     done

     if [ $? -ne 0 ]
diff --git a/13_index_tiger_2017.sql b/13_index_tiger_2017.sql
index 288d8ac..5421efa 100644
--- a/13_index_tiger_2017.sql
+++ b/13_index_tiger_2017.sql
@@ -27,6 +27,8 @@ ALTER TABLE tiger2017.ttract OWNER TO census;
 ALTER TABLE tiger2017.tract OWNER TO census;
 ALTER TABLE tiger2017.uac OWNER TO census;
 ALTER TABLE tiger2017.unsd OWNER TO census;
+ALTER TABLE tiger2017.roads OWNER TO census;
+ALTER TABLE tiger2017.tabblock OWNER TO census;

 -- OGR needs select on geography_columns to do it's thing
 GRANT SELECT ON geography_columns TO census;

osm2pgrouting

osm2pgrouting is a tool that imports OpenStreetMap dumps as a topology (i.e. roads that are connected in real life are connected in the data) into a database with PostGIS and pgRouting.

To install:

sudo apt-get install libpqxx-dev
cd projects/external
git clone git@github.com:pgRouting/osm2pgrouting.git pgRouting/osm2pgrouting
cd pgRouting/osm2pgrouting
mkdir build
cd build
cmake ..
make

osmosis

osmosis is a tool for manipulating OpenStreetMap dumps. We'll use it to "crop" the Geofabrik dump of Pennsylvania down to roughly Allegheny County, because osm2pgrouting requires raw XML, not the Protocol Buffers format Geofabrik distributes. osm2pgrouting also tries to load the whole dump into memory before performing its work, so the less data it needs to load, the better.

To install:

sudo apt-get install osmosis

or

cd ~/projects/external
mkdir osmosis
cd osmosis
wget https://bretth.dev.openstreetmap.org/osmosis-build/osmosis-latest.tgz
tar xvfz osmosis-latest.tgz
rm osmosis-latest.tgz
chmod a+x bin/osmosis

Database Setup

First we need to create the role/user and database in Postgres. I've traditionally used the "census" user because that's what the Census Reporter scripts use, and that's where I started. If you log in to Postgres as a superuser, you can create the role and database via the following SQL.

create role census with nosuperuser login password 'censuspassword';
create database census with owner census;

Alternatively, you could use the following shell commands (run as a system user that can sudo to the postgres account).

sudo -u postgres createuser --no-superuser --login --pwprompt census
sudo -u postgres createdb --owner census census

Once created, you can log in via your favourite SQL interface, including psql,

psql --username census --password census

and execute the following (creating the extensions may require superuser privileges):

create extension postgis;
create extension hstore;
create extension pgrouting;
create schema osmpgrouting;
create schema gtfs;
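
A quick way to confirm the extensions actually loaded is to ask each one for its version:

-- These should each return a row rather than an error.
select version();               -- PostgreSQL itself
select postgis_full_version();  -- PostGIS and its dependencies
select pgr_version();           -- pgRouting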

Data Needed

To compute all of this, and because I haven't loaded this data onto this laptop previously, we'll need a few pieces of data. We'll cover downloading and importing the data in the next post.

TIGER 2010 Tabulation Blocks

The Census provides the geometries corresponding to their summary data, along with some additional data such as basic water features and roads. These geometries are part of a collection called "TIGER" (Topologically Integrated Geographic Encoding & Referencing). I will be using many of these datasets at a later time, so I am going to import all of them; if you're following along, feel free to only download what you need.

You can follow the instructions that come with census-postgres-scripts; an abbreviated version is presented here. As I mentioned before, feel free to download only TABBLOCK, as that's all we need for this project, and to modify the script to download only your state's data.

cd projects/external/censusreporter/census-postgres-scripts
./12_download_tiger_2017.sh
./13_import_tiger_2017.sh
./13_index_tiger_2017.sh
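
Once the import finishes, a quick sanity check helps. This is a sketch assuming the column names from the 2010 tabulation block shapefiles (statefp10, countyfp10, aland10):

-- Count Allegheny County blocks (state FIPS 42, county FIPS 003) and
-- total their land area; ALAND10 is in square metres.
select count(*) as blocks, sum(aland10) as land_m2
from tiger2017.tabblock
where statefp10 = '42' and countyfp10 = '003';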

Road Topologies

While TIGER does provide road information, it's actually not the most useful for this purpose because it does not differentiate between an at-grade and a grade-separated crossing (e.g. an intersection vs a bridge). This is the essence of a "topology": the connections of the roads themselves. For this, we're going to use OpenStreetMap data. Luckily, the folks over at Geofabrik in Germany produce up-to-date exports based on administrative boundaries, such as US states.

First I grabbed the dump for Pennsylvania.

cd ~/Downloads
wget https://download.geofabrik.de/north-america/us/pennsylvania-190107.osm.pbf

Then, I used osmosis to convert it from the Protocol Buffers format into XML and to "crop" the input to osm2pgrouting down to just the county. The bounding-box coordinates were somewhat arbitrarily chosen to lie outside the county in all directions. It also seemed like osmosis didn't like ~ or $HOME, so I used full paths to my files.

osmosis --read-pbf file=/home/jim/Downloads/pennsylvania-190107.osm.pbf --bb left="-80.456" right="-79.542" top="40.787" bottom="40.119" --write-xml file=/home/jim/Downloads/pennsylvania-190107-ac.osm

Now we can actually import the data! Before I decided to trim the input to osm2pgrouting, this operation would take quite a while and then fail with a std::bad_alloc, i.e. out of memory. After cropping, it took less time and less memory but, most importantly, it finished.

cd projects/external/pgRouting/osm2pgrouting/build
./osm2pgrouting --conf ../mapconfig.xml --addnodes --attributes --tags --dbname census --username census --password censuspassword --schema osmpgrouting --file ~/Downloads/pennsylvania-190107-ac.osm
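
To confirm the import produced a routable topology, a shortest-path query makes a good smoke test. This is a sketch assuming osm2pgrouting's default ways table layout (gid, source, target, cost, reverse_cost); the vertex ids are arbitrary placeholders.

-- Vertex ids 1 and 1000 are placeholders; any two connected vertices
-- from osmpgrouting.ways_vertices_pgr will do.
select seq, node, edge, cost
from pgr_dijkstra(
    'select gid as id, source, target, cost, reverse_cost
       from osmpgrouting.ways',
    1, 1000,
    directed := false);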

Something I noticed when viewing the data was that the way I "cropped" the region left a lot of stray lines; I suspect they're connections between roads with the same name, i.e. state routes, interstate highways, &c. I deleted them by filtering on really large cost values, but there were still a handful of outliers that needed to be cleaned up by hand afterwards.

echo "delete from osmpgrouting.ways where cost > 1" | psql census

Bus Stops

The General Transit Feed Specification (GTFS) is produced by most transit agencies and is consumed by services such as Google Maps, Bing Maps, Transit App, &c. It contains data such as stops, routes, trips, fares, normal schedules, special schedules, &c. The Port Authority also produces a GTFS feed.

I started by grabbing all of the GTFS and GIS data from the Port Authority, although I only needed the www.portauthority.org/generaltransitfeed/general_transit_CleverTripID_1811.zip file and could have downloaded just that. Remember, you may need the patched version of the code or the add-cols branch for the import to succeed.

cd ~/Downloads
wget --recursive --continue -e robots=off --force-directories --no-parent http://www.portauthority.org/generaltransitfeed/GIS/
cd ~/projects/external/fitnr/gtfs-sql-importer
PGDATABASE=census make init
PGDATABASE=census GTFS=~/Downloads/www.portauthority.org/generaltransitfeed/general_transit_CleverTripID_1811.zip make load
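
After the load, a quick sanity check; a sketch assuming gtfs-sql-importer's schema (a gtfs.stops table with a the_geom geometry column):

-- Every imported stop should have a geometry.
select count(*) as stops, count(the_geom) as stops_with_geom
from gtfs.stops;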

Census 2010 Population Data

While I could use more recent ACS estimates, I'm choosing to use the decennial census because I would like my area estimations to use tabulation blocks, not block groups. Block Groups are groups of blocks in the same Census Tract, and are used to aggregate data over a larger geographic region. That aggregation is useful for a few reasons:

  • Fewer data points if you don't need the geographic specificity of a block. (Let's face it, 90+% of the time you don't; tracts will work well enough.)
  • The ACS is a sampled survey, not an enumeration like the decennial census, and the error would be too high when examining blocks.
  • Statistical Disclosure Limitation, whereby data that could be used to identify a person may be edited, especially in geographically fine-grained data such as block-level summaries. However, "By agreement with the Department of Justice (2000) the Census Bureau will provide exact counts at the Census block level for the following variables:
      • Number of people: total, age 18+ (voting age), and less than age 18,
      • Number of vacant housing units, and
      • Number of householders, which is equal to the number of occupied housing units"

Since this is a little more complicated to import, and we need to generate our walksheds first, anyway, I'm going to put off the import until a later post.
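
As a preview of where this is heading, the final statistic will look something like the query below. Both tables here are hypothetical placeholders: census2010.block_pop stands in for the not-yet-imported block-level population data, and service_area for the walkshed blocks from Part 2.

-- Hypothetical sketch: share of county population inside the service
-- area, joining population to in-service blocks by block GEOID.
select sum(p.pop) filter (where s.geoid10 is not null)
         / sum(p.pop)::numeric as pct_people_served
from census2010.block_pop p
left join service_area s using (geoid10);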

Next Steps

Part 2: Defining the Walkshed