This blog is about my musings and thoughts. I hope you find it useful, at most, and entertaining, at least.
Date: 2019-01-08
Tags: postgis pg-routing postgres gis gtfs transit census
One of my fellow Allegheny County Transit Council (ACTC) members asked how much of the county the Port Authority serves. After not receiving a response, I decided to compute it myself. We're going to define the service area in terms of both land and people, i.e. percent of the land in Allegheny County in the service area and also percent of the people in Allegheny County in the service area.
That still leaves a question of what a "Service Area" actually is. For this project, I'm going to define the "Service Area" is any Census 2010 Tabulation Block that is adjacent to a road that is within ¼mi of a bus stop and ½mi from a busway or trolley stop. From there, I can use the area and population of tabulation block (block) to build up my statistics.
I'm using Ubuntu 18.04 and have basic build utilities such as gcc, autoconf, make, cmake, git, &c already installed.
QGIS is a featureful, free (libre and beer, although paid consulting is available) GIS (Geographic Information Systems) tool similar in many ways to ArcGIS. I covered a little on using it in my So You Want to Make a Map post. The QGIS Training Manual contains information (among many, many other things) about how to connect QGIS to PostgreSQL. It also has sections on Database Concepts and Spatial Database Concepts which I would highly recommend. It's very fast, well documented, and easy to use. I cannot say enough good things about it.
To install the latest version (3.x), otherwise the ubuntu repository has the 2.18.x LTR already:
sudo add-apt-repository "deb https://qgis.org/debian bionic main"
wget -O - https://qgis.org/downloads/qgis-2017.gpg.key | gpg --import
gpg --fingerprint CAEB3DC3BDF7FB45
gpg --export --armor CAEB3DC3BDF7FB45 | sudo apt-key add -
sudo apt-get update
sudo apt-get install qgis python-qgis
Some instructions say to do the following too
sudo add-apt-repository "deb-src https://qgis.org/debian bionic main"
sudo apt-get install qgis-plugin-grass
but they didn't work for me, nor are they required, so I skipped them.
PostgreSQL is an extremely powerful, full-featured SQL RDBMS. The documentation is top-notch and extensions such as PostGIS for performing GIS (Geographic Information System) work and pgRouting for performing graph algorithms and traversals make it much more valuable. I truly cannot say enough good things about PostgreSQL.
To install:
sudo apt-get install postgresql-10 postgresql-10-pgrouting postgresql-10-postgis-2.4
mdbtools allows viewing and manipulating Microsoft Access databases -- which the census distributes schemas as. It is a fork of an older tool, mdbtools, but that I could get to compile.
To install:
cd ~/projects/external
git clone git@github.com:brianb/mdbtools.git brianb/mdbtools
cd brianb/mdbtools
sudo apt-get install txt2man
autoreconf -i -f
export DOCBOOK_DSL=/usr/share/sgml/docbook/stylesheet/dsssl/modular/html/docbook.dsl
./configure
make
gtfs-sql-importer is a set of scripts to help import the csv files in the GTFS and generate some geographic features inside of PostgreSQL.
To install:
cd projects/external
git clone git@github.com:fitnr/gtfs-sql-importer.git fitnr/gtfs-sql-importer
make init
PGDATABASE=census GTFS=~/Downloads/www.portauthority.org/generaltransitfeed/general_transit_CleverTripID_1811.zip make load
I did need to patch the code to handle files with empty lines. However, this appears to be fixed in the add-cols branch, which might be merged into master soon.
diff --git a/src/load.sh b/src/load.sh
index 6d77846..f2f4278 100644
--- a/src/load.sh
+++ b/src/load.sh
@@ -15,7 +15,7 @@ function import_stdin()
# remove possible BOM
hed=$(unzip -p "$ZIP" "${1}.txt" | head -n 1 | awk '{sub(/^\xef\xbb\xbf/,"")}{print}')
echo "COPY gtfs.${1}" 1>&2
- unzip -p "$ZIP" "${1}.txt" | awk '{ sub(/\r$/,""); sub(",\"\"", ","); print }' | psql -c "COPY gtfs.${1} (${hed}) FROM STDIN WITH DELIMITER AS ',' HEADER CSV"
+ unzip -p "$ZIP" "${1}.txt" | grep -v "^\s*$" | awk '{ sub(/\r$/,""); sub(",\"\"", ","); print }' | psql -c "COPY gtfs.${1} (${hed}) FROM STDIN WITH DELIMITER AS ',' HEADER CSV"
}
ADD_DATES=
census-postgres-scripts is a set of scripts from Census Reporter to make quick work of importing ACS and TIGER datasets.
To install:
cd projects/external
git clone git@github.com:censusreporter/census-postgres-scripts.git censusreporter/census-postgres-scripts
I also made some patches to only download data for Pennsylvania (FIPS ID
42) or the whole US when needed and also to download the Roads and
Tabulation Block geometries, which aren't included by default in these
scripts. I also store mine in a different location and login to postgres
via my personal user, not the census
user.
diff --git a/12_download_tiger_2017.sh b/12_download_tiger_2017.sh
index ea17e88..86084b1 100755
--- a/12_download_tiger_2017.sh
+++ b/12_download_tiger_2017.sh
@@ -1,11 +1,12 @@
# Download census-y TIGER data
-mkdir -p /mnt/tmp/tiger2017
-
-wget --recursive --continue --accept=*.zip \
+wget --recursive --continue \
+ --accept=tl_2017_42_*.zip \
+ --accept=tl_2017_42003*.zip \
+ --accept=tl_2017_us*.zip \
+ --wait=1 --random-wait \
--no-parent --cut-dirs=3 --no-host-directories \
-e robots=off \
- --directory-prefix=/mnt/tmp/tiger2017 \
https://www2.census.gov/geo/tiger/TIGER2017/AIANNH/ \
https://www2.census.gov/geo/tiger/TIGER2017/AITSN/ \
https://www2.census.gov/geo/tiger/TIGER2017/ANRC/ \
@@ -23,11 +24,13 @@ wget --recursive --continue --accept=*.zip \
https://www2.census.gov/geo/tiger/TIGER2017/NECTADIV/ \
https://www2.census.gov/geo/tiger/TIGER2017/PLACE/ \
https://www2.census.gov/geo/tiger/TIGER2017/PUMA/ \
+ https://www2.census.gov/geo/tiger/TIGER2017/ROADS/ \
https://www2.census.gov/geo/tiger/TIGER2017/SCSD/ \
https://www2.census.gov/geo/tiger/TIGER2017/SLDL/ \
https://www2.census.gov/geo/tiger/TIGER2017/SLDU/ \
https://www2.census.gov/geo/tiger/TIGER2017/STATE/ \
https://www2.census.gov/geo/tiger/TIGER2017/SUBMCD/ \
+ https://www2.census.gov/geo/tiger/TIGER2017/TABBLOCK/ \
https://www2.census.gov/geo/tiger/TIGER2017/TBG/ \
https://www2.census.gov/geo/tiger/TIGER2017/TRACT/ \
https://www2.census.gov/geo/tiger/TIGER2017/TTRACT/ \
diff --git a/13_import_tiger_2017.sh b/13_import_tiger_2017.sh
index 2d00e54..683c39f 100755
--- a/13_import_tiger_2017.sh
+++ b/13_import_tiger_2017.sh
@@ -1,18 +1,9 @@
#!/bin/bash
-cd /mnt/tmp/tiger2017
-# For shp2pgsql
-sudo apt-get install -y postgis
+psql -d census -v ON_ERROR_STOP=1 -q -c "DROP SCHEMA tiger2017 CASCADE; CREATE SCHEMA tiger2017;"
+psql -d census -v ON_ERROR_STOP=1 -q -c "ALTER SCHEMA tiger2017 OWNER TO census;"
-if [ -z $PGHOST ]; then
- echo "You must set PGHOST environment variable to the hostname of the PostgreSQL server to operate on."
- exit 1
-fi
-
-psql -d census -U census -v ON_ERROR_STOP=1 -q -c "DROP SCHEMA IF EXISTS tiger2017 CASCADE; CREATE SCHEMA tiger2017;"
-psql -d census -U census -v ON_ERROR_STOP=1 -q -c "ALTER SCHEMA tiger2017 OWNER TO census;"
-
-for i in /mnt/tmp/tiger2017/{CBSA,CD,COUNTY,CSA,PLACE,STATE,ELSD,SCSD,ZCTA5,COUSUB,PUMA,SLDL,SLDU,AIANNH,AITSN,ANRC,BG,CNECTA,CONCITY,METDIV,NECTA,NECTADIV,SUBMCD,TBG,TTRACT,TRACT,UAC,UNSD}
+for i in ./{TABBLOCK,ROADS,CBSA,CD,COUNTY,CSA,PLACE,STATE,ELSD,SCSD,ZCTA5,COUSUB,PUMA,SLDL,SLDU,AIANNH,AITSN,ANRC,BG,CNECTA,CONCITY,METDIV,NECTA,NECTADIV,SUBMCD,TBG,TTRACT,TRACT,UAC,UNSD}
do
tablename=$(basename $i)
for j in $i/*.zip
@@ -24,12 +15,12 @@ do
one_shapefile=`ls -a $i/*.shp | head -n 1`
# Start by preparing the table
- shp2pgsql -W "latin1" -s 4326 -p -I $one_shapefile tiger2017.$tablename | psql -d census -U census -v ON_ERROR_STOP=1 -q
+ shp2pgsql -W "latin1" -s 4326 -p -I $one_shapefile tiger2017.$tablename | psql -d census -v ON_ERROR_STOP=1 -q
# Then append all the geometries
for j in $i/*.shp
do
- shp2pgsql -W "latin1" -s 4326 -a $j tiger2017.$tablename | psql -d census -U census -v ON_ERROR_STOP=1 -q
+ shp2pgsql -W "latin1" -s 4326 -a $j tiger2017.$tablename | psql -d census -v ON_ERROR_STOP=1 -q
done
if [ $? -ne 0 ]
diff --git a/13_index_tiger_2017.sql b/13_index_tiger_2017.sql
index 288d8ac..5421efa 100644
--- a/13_index_tiger_2017.sql
+++ b/13_index_tiger_2017.sql
@@ -27,6 +27,8 @@ ALTER TABLE tiger2017.ttract OWNER TO census;
ALTER TABLE tiger2017.tract OWNER TO census;
ALTER TABLE tiger2017.uac OWNER TO census;
ALTER TABLE tiger2017.unsd OWNER TO census;
+ALTER TABLE tiger2017.roads OWNER TO census;
+ALTER TABLE tiger2017.tabblock OWNER TO census;
-- OGR needs select on geography_columns to do it's thing
GRANT SELECT ON geography_columns TO census;
osm2pgrouting is a tool that will import OpenStreetMap dumps as a topology (i.e. connects roads that are connected in real-life) into a database with PostGIS and pgRouting.
To install:
sudo apt-get install libpqxx-dev
cd projects/external
git clone git@github.com:pgRouting/osm2pgrouting.git pgRouting/osm2pgrouting
cd pgRouting/osm2pgrouting
mkdir build
cd build
cmake ..
make
osmosis is a tool for manipulating OpenStreetMap dumps. We'll use it to "crop" the Geofabrik dump of Pennsylvania to around Allegheny County because osm2pgrouting requires raw xml, not the Protocol Buffers format Geofabrik distributes in. osm2pgrouting also tries to load the whole dump into memory before performing its work, so the less data it needs to load, the better.
To install:
sudo apt-get install osmosis
or
cd ~/projects/external
mkdir osmosis
cd osmosis
wget https://bretth.dev.openstreetmap.org/osmosis-build/osmosis-latest.tgz
tar xvfz osmosis-latest.tgz
rm osmosis-latest.tgz
chmod a+x bin/osmosis
First we need to create the role/user and database in postgres. I've traditionally used the "census" user as that's what the Census Report scripts used, and that's previously where I started. If you login to postgres as a superuser, you can create the role and user via the following SQL.
create role census with nosuperuser login password 'censuspassword';
create database census with owner census;
Alternatively, you could use the following commands (run as the census
role).
sudo -u postgres createuser --no-superuser --login --pwprompt census
sudo -u postgres createdb --owner census census
Once created, we can login via your favourite SQL interface, including psql.
psql --username census -password census
and execute the following:
create extension postgis;
create extension hstore;
create extension pgrouting;
create schema osmpgrouting;
create schema gtfs;
To compute all of this, and because I haven't loaded this data onto this laptop previously, we'll need a few pieces of data. We'll cover downloading and importing the data in the next post.
The Census provides the geometries corresponding to their summary data, along with some additional data, such as basic water features and roads. These geometries are part of a collection called "TIGER"(Topologically Integrated Geographic Encoding & Referencing). I will be using many of these datasets at a later time, so I am going to import all of them; If you're following along, feel free to only download what you need.
You can follow the instructions that come with the census-postgres-scripts, so an abbreviated version is presented here. As I mentioned before, feel free to only download the TABBLOCK, as that's all we need for this project and to modify it to only download your state's data.
cd projects/external/censusreporter/
./12_download_tiger_2017.sh
./13_import_tiger_2017.sh
./13_index_tiger_2017.sh
While TIGER does provide road information, it's actually not the most useful for this purpose because it does not differentiate between an at-grade or grade-separated crossing (e.g. an intersection vs a bridge). This is the essence of a "topology": the connections of the roads themselves. For this, we're going to use the OpenStreetMap data. Luckily, the folks over at Geofabrik in Germany, produce up-to-date exports based on administrative boundaries, such as US States.
First I grabbed the dump for Pennsylvania.
cd ~/Downloads
wget https://download.geofabrik.de/north-america/us/pennsylvania-190107.osm.pbf
Then, I used osmosis
to convert it into XML from the Protocol
Buffers format and also
to "crop" the input to osm2pgrouting
to just the county. The
coordinates I'm using were randomly taken to be outside the county in
all directions. It also seemed like osmosis
didn't like ~
or
$HOME
, so I used full paths to my files.
osmosis --read-pbf file=/home/jim/Downloads/pennsylvania-190107.osm.pbf --bb left="-80.456" right="-79.542" top="40.787" bottom="40.119" --write-xml file=/home/jim/Downloads/pennsylvania-190107-ac.osm
Now we can actually import the data! Before I decided to trim the input to
osm2pgrouting
, this operation would take quite a while and then fail with a
std::bad_alloc
, i.e. out of memory. After cropping, it didn't take as long,
nor use as much memory, but more importantly, finished.
cd projects/external/pgRouting/osm2pgrouting
./osm2pgrouting --conf ../mapconfig.xml --addnodes --attributes --tags --dbname census --username census --password censuspassword --schema osmpgrouting --file ~/Downloads/pennsylvania-190107-ac.osm
Something I noticed when viewing the data was that there were a lot of lines from how I "cropped" the region; I suspect they're connections of roads with the same name, i.e. state routes, interstate highways, &c. I deleted them by filter on values with a really large cost, but there were still a handful of outliers that needed to be cleaned up afterwards by hand.
echo "delete from osmpgrouting.ways where cost > 1" | psql census
The General Transit Feed Specification is produced my most transit agencies and is consumed by services such as Google Maps, Bing Maps, Transit App, &c. It contains data such as stops, routes, trips, fare, normal schedules, special schedules, &c. The Port Authority also produces a GTFS feed.
I started by grabbing all of the GTFS and GIS data from the Port Authority,
although I only needed the
www.portauthority.org/generaltransitfeed/general_transit_CleverTripID_1811.zip
file and could have only downloaded that. Remember, you may need the patched
version of the code or the add-cols
branch for the import to succeed.
cd ~/Downloads wget --recursive --continue -e robots=off
--force-directories --no-parent
http://www.portauthority.org/generaltransitfeed/GIS/ cd
~/projects/external/fitnr/gtfs-sql-importer make init PGDATABASE=census
GTFS=~/Downloads/www.portauthority.org/generaltransitfeed/general_transit_CleverTripID_1811.zip
make load
While I could use more recent ACS estimates, I'm choosing to use the decennial census because I would like my area estimations to use tabulation blocks, and not block groups. Block Groups are groups of blocks in the same Census Track, and are used to aggregate data over a larger geographic region. That's useful for a few reasons:
Since this is a little more complicated to import, and we need to generate our walksheds first, anyway, I'm going to put off the import until a later post.