About Me!

This blog is about my musings and thoughts. I hope you find it useful, at most, and entertaining, at least.

Résumé [PDF]

Other Pages



Oak Island

Items for Sale

Presence Elsewhere




Calculating Service Area, Part 2: Defining the Walkshed

Date: 20190110

Tags: postgis pg-routing postgres gis gtfs transit census

In Part 1 we
talked about the project and got some of the tools and data we need to
start. A quick reminder of our motivation: to figure out the percentage
area and population the county transit agency serves by using a
“walkshed”. A walkshed is the distance from a busstop, along the road
for a certain distance. We’ll talk more later in this article why it’s
important to be along the road. Once we have a walkshed, we’ll use
census data to compute the statistics we’re interested in.


Defining Distance

At this point, we have enough data to start playing around with. However, if
we examine a random road from the ways table in QGIS, we’ll notice that its
length is something like “0.0007016736064598”.

Length of a random road before reprojection

You might be thinking that those are some funny units. You wouldn’t be far off;
they’re units of degrees. They’re not insensible units, especially given that
our geographic data has a project on WGS 84 (EPSG
) (GPS Coördinates,
which are in-fact in degrees), but they’re not very useful for what we’re
trying to do.


In order to deal with more useful units, feet, we need to reproject, convert it
into a new coordinate system, the data. (I discuss projections a little
more at Projections and Why They’re
.) I’m going to choose
the Pennsylvania South State
(EPSG 3365),
which is designed to provide an accurate plane in linear units (3365 is
feet, 3364 is in
meters) in the southern portion of Pennsylvania. Being able to work in
linear units is significantly easier, and significantly more accurate,
when doing analysis like this that is based on requirements in linear

To convert the geometries we currently have, we’ll copy the tables (to
retain the source data) and then reproject those in-place. The reason we
need to reproject the stops and blocks is that the geographic operations
like selecting within a certain distance work best when everything is in
the same projection. (QGIS would display everything correctly, but
PostGIS doesn’t like doing spatial operations over different
projections.) I also took this oppurtunity to filter the block data to
just Allegheny county, whose FIPS code is ‘003’ and a state of ‘42’.
(The full FIPS code is ‘42003’.) In Postgres we can accomplish this

create table osmpgrouting.ways_3365 as select * from osmpgrouting.ways;
alter table osmpgrouting.ways_3365 alter column the_geom type geometry(linestring, 3365) using st_transform(the_geom,3365) ;
create index ways_3365_the_geom_idx on osmpgrouting.ways_3365 using gist(the_geom);

create table osmpgrouting.ways_vertices_pgr_3365 as select * from osmpgrouting.ways_vertices_pgr;
alter table osmpgrouting.ways_vertices_pgr_3365 alter column the_geom type geometry(point, 3365) using st_transform(the_geom,3365) ;
create index ways_vertices_pgr_3365_the_geom_idx on osmpgrouting.ways_vertices_pgr_3365 using gist(the_geom);

create table gtfs.stops_3365 as select * from gtfs.stops;
alter table gtfs.stops_3365 alter column the_geom type geometry(point, 3365) using st_transform(the_geom,3365) ;
create index stops_3365_the_geom_idx on gtfs.stops_3365 using gist(the_geom);

create table tiger2017.tabblock_ac_3365 as select * from tiger2017.tabblock where countyfp10 = ‘003’ and statefp10 =‘42’;
alter table tiger2017.tabblock_ac_3365 alter column geom type geometry(multipolygon, 3365) using st_transform(geom,3365) ;
create index tabblock_ac_3365_geom_idx on tiger2017.tabblock_ac_3365 using gist(geom);

We’re also going to use a handy little feature of Postgres to make the
resulting SQL use an index: expression
The index is on the start point and end point of the geometry column,
will update if the column is ever updated, and isn’t stores explicitly
as a field.

create index osmpgrouting_ways_startpoint_idx on osmpgrouting.ways_3365 using gist (st_startpoint(the_geom));
create index osmpgrouting_ways_endpoint_idx on osmpgrouting.ways_3365 using gist (st_endpoint(the_geom));

We also need to update the length, cost, and reverse_cost fields
on osmpgrouting.ways_3365.

update osmpgrouting.ways_3365 set length = st_length(the_geom), cost = st_length(the_geom), reverse_cost = st_length(the_geom);

Now, let’s load these layers and work with them.

Length of a random road after reprojection

That same road is now “203.747620652544” which seems like a reasonable
number of feet. (It’s also spot on the length_m (length in meters)
field, that is part of the OSM import, when converted to feet!)

Finding all roads within a certain distance of a stop

Now, a walkshed is how far you can walk on the road for a certain
distance. To clarify why this is important, take the yellow bus stop
below. The grey circle is ¼mi (1230ft)
buffer around it. This buffer includes part of a loop of road to the
However, the pale orange line of the ruler shows that you’d need to walk
along 0.3mi of road to get to the point from our starting spot.

Example of why walkshed is important. The yellow busstop is \<¼mi from part of a loop the way the crow flies, but not by foot.

What we need is a function that takes the road network and provides the
distance you’d walk or drive on the road from a starting point to get
somewhere, since we can’t just build a buffer around each point.

pgRouting provides a handy function that does exactly what we’re looking

We give pgr_drivingDistance a query containing our weighted graph as a
table of (edge id, source vertex id, target vertex id, cost, reverse
cost). We need to select reverse cost, otherwise pgr_drivingDistance
will be forced to traverse every road from start to end, and not end to
start, so we’d be at the mercy of how the road was imported. There are
ways to take “one-way” roads into account, but since we’re dealing with
pedestrians, we’ll ignore that.

We also give pgr_drivingDistance a source vertex id and a maximum
distance, and it returns every segment of road in the source graph, and
the aggregate cost to get there, that is within the distance given.

The following query computes the walkshed. I store it directly in a
table because it takes north of 2 hours to run on my machine. I’m still
working on making this faster, but since we’re essentially computing it
once and will reüse these results, it’s not a huge problem in

What’s happening here is I build a
CTE that is the
result or running pgr_drivingDistance for each stop. Notice the
keyword lateral in the join. Lateral

allow the execution of the subquery for each record in the FROM
clause; otherwise the subquery is executed independently of the rest of
the query.

I then build a CTE that pulls in all road segments that have a start or
end point that is the source or target from our results. I do this,
instead of just going by the edges returned fro pgr_drivingDistance
because if a particularly long segment or road (which isn’t unheard of
if we were doing this for ¼mi) would not be included and it would
skew our results by not counting that area, even if it may be within our
walkshed. I’m erring on the side of including a few features further than
I want to avoid not including features I want.

Then, because of how the join was done, we simply group everything by
stop and road segment, pulling the smallest distance to a vertex for a
segment of road. (Remember, pgr_drivingDistance gives the distance a
vertex is from the start based on the edges in the graph.) However,
let’s focus on a single stop, ‘S18340’, first since computing walksheds
for all stops takes a good while.

create table stops_walkshed_S18340 as
with all_within as (
  select stop_id, dd.*
  from gtfs.stops_3365
  join lateral (
    select seq, node, edge, cost, agg_cost
      from pgr_drivingDistance(
            -- We need to select reverse cost, otherwise
            -- pgr_drivingDistance will be forced to traverse every road
            -- from start to end, and not end to start, so we'd be at the
            -- mercy of how the road was imported. There are ways to take
            -- "one-way" roads into account, but since we're dealing with
            -- pedestrians, we'll ignore that.
            'SELECT gid as id, source, target, cost, reverse_cost ' ||
            'FROM osmpgrouting.ways_3365 ' ||
            -- Find all road segments that begin or end within the
            -- distance we're looking for. We're going to do 1mi to give
            -- ourselves some flexibility later on, and it doesn't
            -- substantually increase the walltime.
            'where st_dwithin(' ||
              -- This awkwardness is because pgRouting functions take
              -- text to run as a query to get the edges and vertecies,
              -- not the results of a subquery.
              'ST_GeomFromEWKB(decode(''' || stops_3365.the_geom::text  || ''', ''hex'')), ' ||
              'ST_StartPoint(ways_3365.the_geom), ' ||
              '5280) or' ||
            ' st_dwithin(' ||
              'ST_GeomFromEWKB(decode(''' || stops_3365.the_geom::text  || ''', ''hex'')), ' ||
              'ST_EndPoint(ways_3365.the_geom), ' ||
        -- Start from the nearest vertex/intersection to the bus stop.
        (select id
          from osmpgrouting.ways_vertices_pgr_3365
          order by ways_vertices_pgr_3365.the_geom &lt;-&gt; stops_3365.the_geom
          limit 1),
  ) dd
    on true
  where stop_id = 'S18340'
), simplified as (
    gid, stop_id, agg_cost,
  from all_within
  join osmpgrouting.ways_3365
    on node in (source, target)
-- When we do the join above, a road segment could be pulled in
-- multiple times for a single stop, so we'll collect it and count
-- it as the min of either its start or end vertecies.
select gid, stop_id, min(agg_cost) as agg_cost, the_geom
from simplified
group by stop_id, gid, the_geom

We still have issues with portions of roads that have other portions
outside the distance we’re shooting for. To solve this, we’re going to
split each road into 100ft segments.

source and target are used by pgr_drivingDistance to connect
segments, so we need to ensure that they are still consistent. Examining
the data shows that the max edge and vertex ids are 6 digits longs.

census=# select max(gid) as max_gid, max(source) as max_source, max(target) as max_target from osmpgrouting.ways_3365;
 max_gid | max_source | max_target
  287229 |     216247 |     216247
(1 row)

Additionally, gid is a bigint, which maxes out at 9223372036854775807,
which is 19 digits long. As such, I can create a new vertex id that is
the “packed” version of the old vertex id, the edge id, and the sequence
in the split. For example, if I had an edge with gid=287229 and a
source=216247, and it’s the 4th segment after splitting, I can call
this new vertex 287229216247000004. Adding some special logic to keep
the original source and targets the same as they were in the
original data (but formatted appropriately), we can still use the
ways_vertices_pgr vertex table as well. (Note that I got a good
portion/idea for this query from the
documentation.) (This does take a while to run.)

create table osmpgrouting.ways_3365_split100 as
select (to_char(gid, ‘FM000000’) || to_char(n, ‘FM000000’))::bigint as gid osm_id, tag_id, st_length(new_geom) as length, name, cast(case when n = 0 then to_char(source, ‘FM000000’) || ‘000000000000’ else to_char(source, ‘FM000000’) || to_char(gid, ‘FM000000’) || to_char(n, ‘FM000000’) end as bigint) as source, cast(case when end_percent = 1 then to_char(target, ‘000000’) || ‘000000000000’ else to_char(source, ‘FM000000’) || to_char(gid, ‘FM000000’) || to_char(n+1, ‘FM000000’) end as bigint) as target, st_length(new_geom) as cost, st_length(new_geom) as reverse_cost, new_geom as the_geom
from (select *, greatest(length,1) as orig_length from osmpgrouting.ways_3365) as x
cross join generate_series(0,10000) as n
join lateral ( select 100.00*n/orig_length as start_percent, case when 100.00*(n+1) < orig_length then 100.00*(n+1)/orig_length else 1 end as end_percent
)y on n*100.00/orig_length< 1
join lateral ( select st_linesubstring(the_geom, start_percent, end_percent) as new_geom
)z on true ;
— Some lines end up as a single point. Something to look into more :/
delete from osmpgrouting.ways_3365_split100 where ST_GeometryType(the_geom) = ‘ST_Point’;
alter table osmpgrouting.ways_3365_split100 add primary key (gid);
alter table osmpgrouting.ways_3365_split100 owner to census;
alter table osmpgrouting.ways_3365_split100 alter column the_geom type geometry(linestring, 3365) using st_setsrid(the_geom, 3365);
create index ways_3365_split100_the_geom_idx on osmpgrouting.ways_3365_split100 using gist(the_geom);
create index ways_3365_split100_source_idx on osmpgrouting.ways_3365_split100(source);
create index ways_3365_split100_target_idx on osmpgrouting.ways_3365_split100(target);
vacuum full osmpgrouting.ways_3365_split100;

create table osmpgrouting.ways_vertices_pgr_3365_split100 as select * from osmpgrouting.ways_vertices_pgr_3365;
update osmpgrouting.ways_vertices_pgr_3365_split100 set id = (to_char(id, ‘FM000000’) || ‘000000000000’)::bigint;
create index ways_vertices_pgr_3365_split100_the_geom_idx on osmpgrouting.ways_vertices_pgr_3365_split100 using gist(the_geom);
vacuum full osmpgrouting.ways_vertices_pgr_3365_split100;

Now, we can run our walkshed calculation again. However, we can clean it
up a little as well, now that we have much smaller segments to work
with. Again, just for S18340 because we want to iterate quickly.

create table stops_walkshed_s18340_split100 as
with all_within as (
  select stop_id, dd.*
  from gtfs.stops_3365
  join lateral (
    select seq, node, edge, cost, agg_cost
      from pgr_drivingDistance(
            'SELECT gid as id, source, target, cost, reverse_cost ' ||
            'FROM osmpgrouting.ways_3365_split100 ' ||
            'where st_dwithin(' ||
              'ST_GeomFromEWKB(decode(''' || stops_3365.the_geom::text  || ''', ''hex'')), ' ||
              'ways_3365_split100.the_geom, ' ||
        (select id
          from osmpgrouting.ways_vertices_pgr_3365_split100
          order by ways_vertices_pgr_3365_split100.the_geom &lt;-&gt; stops_3365.the_geom
          limit 1),
  ) dd
    on true
  where stop_id = 'S18340'
), all_within_simplified as (
  select stop_id, node, max(agg_cost) as agg_cost
  from all_within
  group by stop_id, node
  gid, stop_id, agg_cost, the_geom
from all_within_simplified
join osmpgrouting.ways_3365_split100
  on node in (source, target);

Below is a much more accurate representation of the walkshed in red,
which was computed after splitting the road into segments. The orange
was the “¼ mi” walkshed before we split the road into segments,
and it misses a good portion of territory.

Comparison of walkshed before and after split

Since we computed the walkshed out to 1mi, and we stored the aggregate
distance for each segment, we can use that to build heat maps of what’s
within a certain distance of a stop.

Coloring Walkshed by distance

Let’s now join the walkshed against our block data. Since the road
features themselves might not overlap a block, since we’re using
different data sources, it’s highly likely that a road might be on one
side or the other of the block boundary. What we can do is build an
arbitrary buffer along the road segment, i.e. make it “thicker”, and
test if the buffer overlaps a block. I chose 10ft. I also chose to
filter only for blocks within ¼mi, and not build the distance
metrics into the block.

create table blocks_s18340 as 
select b.* 
from tiger2017.tabblock_ac_3365 b 
join stops_walkshed_s18340_split100 ws 
  on st_overlaps(b.geom, st_buffer(ws.the_geom, 10))
where ws.agg_cost &lt;= 1230;

The blocks themselves are delineated by a road or one of the white
lines. Each block that was adjacent to one of the red lines is now
selected in dark yellow. (As opposed to the rest of the blocks in a
lighter yellow, mustard if you will.) This does add a bit to what the
“walkshed” technically is, but at this point I’m not willing to
interpolate population by building an arbitrary buffer over the
walkshed, as when blocks are large it’s likely that the population is
not evenly spread among it and interpolation can give rather odd
results. (See
Interpolating Old Census Data
for some discussion of those issues.) Using adjacent blocks is more than
sufficient for our purposes, and at worst will overestimate slightly.

Quarter mile walkshed with adjacent blocks

Now, let’s remove the where stop_id = 'S18340' and compute the
walkshed and adjacent blocks for each stop!

create table stops_walkshed_5280 as
with all_within as ( select stop_id, dd.* from gtfs.stops_3365 join lateral ( select seq, node, edge, cost, agg_cost from pgr_drivingDistance( ‘SELECT gid as id, source, target, cost, reverse_cost ‘ || ‘FROM osmpgrouting.ways_3365_split100 ‘ || ‘where st_dwithin(’ || ‘ST_GeomFromEWKB(decode(‘’‘ || stops_3365.the_geom::text || ‘’‘, ‘‘hex’‘)), ‘ || ‘ways_3365_split100.the_geom, ‘ || ’5280)’, (select id from osmpgrouting.ways_vertices_pgr_3365_split100 order by ways_vertices_pgr_3365_split100.the_geom <-> stops_3365.the_geom limit 1), 5280 ) ) dd on true
), all_within_simplified as ( select stop_id, node, max(agg_cost) as agg_cost from all_within group by stop_id, node
select gid, stop_id, agg_cost, the_geom
from all_within_simplified
join osmpgrouting.ways_3365_split100 on node in (source, target);
delete from stops_walkshed_5280 where ST_GeometryType(the_geom) = ‘ST_Point’;
vacuum full stops_walkshed_5280;
alter table stops_walkshed_5280 alter column the_geom type geometry(linestring, 3365) using st_setsrid(the_geom, 3365);
create index stops_walkshed_the_geom_idx on stops_walkshed using gist (the_geom);

create table stop_walkshed_blocks as
with stop_walkshed_buffer as ( select stop_id, st_buffer(st_collect(the_geom), 10) as geom from stops_walkshed ws where ws.agg_cost <= 1230 group by stop_id
select b.gid, b.geoid10, stop_id, b.geom
from stop_walkshed_buffer swb
join tiger2017.tabblock_ac_3365 b on st_overlaps(b.geom, swb.geom);

(The reason I used a CTE is because I wantd to limit the number of blocks
joined against. I didn’t exhustively test performance against the more
naïve query of

create table stop_walkshed_blocks_2 as
  select b.gid, b.geoid10, stop_id, b.geom
  from tiger2017.tabblock_ac_3365 b
  join stops_walkshed_5280 ws
    on st_overlaps(b.geom, st_buffer(ws.the_geom, 10))
  where ws.agg_cost &lt;= 1230;

as the CTE version ran fast enough and I was running out of time alloted for
this step.)

And Voilà, we have our walkshed.


Now, to calculate the area of the county served.

(select sum(st_area(geom)) from (select geom from stop_walkshed_blocks group by geom)x)
(select sum(st_area(geom)) from tiger2017.tabblock_ac_3365);

Which yields just over 34% of the county by area.

In the next part, we’ll import the cenus records and compute the percent of the
population served. While that seems low, I expect the percent of population to
be substantially higher, as the parts of the county not covered in pink have a
much lower density (and in some cases are very much rural).

Calculating Service Area, Part 1: Introduction and Setup

Date: 20190108

Tags: postgis pg-routing postgres gis gtfs transit census


One of my fellow Allegheny County Transit

(ACTC) members asked how much of the county the Port
serves. After not receiving a
response, I decided to compute it myself. We’re going to define the
service area in terms of both land and people, i.e. percent of the land in
Allegheny County in the service area and also percent of the people in Allegheny
County in the service area.

That still leaves a question of what a “Service Area” actually is. For this
project, I’m going to define the “Service Area” is any Census 2010 Tabulation

that is adjacent to a road that is within ¼mi of a bus stop and
½mi from a busway or trolley
. From
there, I can use the area and population of tabulation block (block) to build
up my statistics.


Tools Required

I’m using Ubuntu 18.04 and have basic build utilities such as gcc,
autoconf, make, cmake, git, &c already installed.


QGIS is a featureful, free (libre and beer,
although paid consulting is available) GIS (Geographic Information
Systems) tool similar in many ways to ArcGIS. I covered a little on
using it in my So You Want to Make a
post. The QGIS Training

contains information (among many, many other things) about how to
connect QGIS to PostgreSQL. It also has sections on Database

and Spatial Database

which I would highly recommend. It’s very fast, well documented, and
easy to use. I cannot say enough good things about it.

To install the latest version (3.x), otherwise the ubuntu repository has the
2.18.x LTR already:

sudo add-apt-repository "deb     https://qgis.org/debian bionic main"
wget -O - https://qgis.org/downloads/qgis-2017.gpg.key | gpg --import
gpg --fingerprint CAEB3DC3BDF7FB45
gpg --export --armor CAEB3DC3BDF7FB45 | sudo apt-key add -
sudo apt-get update
sudo apt-get install qgis python-qgis

Some instructions say to do the following too

sudo add-apt-repository "deb-src https://qgis.org/debian bionic main"
sudo apt-get install qgis-plugin-grass

but they didn’t work for me, nor are they required, so I skipped them.

PostgreSQL, PostGIS, pgRouting

PostgreSQL is an extremely powerful,
full-featured SQL RDBMS. The documentation is top-notch and extensions
such as PostGIS for performing GIS (Geographic
Information System) work and pgRouting for
performing graph algorithms and traversals make it much more valuable. I
truly cannot say enough good things about PostgreSQL.

To install:

sudo apt-get install postgresql-10 postgresql-10-pgrouting postgresql-10-postgis-2.4


mdbtools allows viewing and
manipulating Microsoft Access databases — which the census distributes
schemas as. It is a fork of an older tool,
mdbtools, but that I could get to

To install:

cd ~/projects/external
git clone git@github.com:brianb/mdbtools.git brianb/mdbtools
cd brianb/mdbtools
sudo apt-get install txt2man
autoreconf -i -f
export DOCBOOK_DSL=/usr/share/sgml/docbook/stylesheet/dsssl/modular/html/docbook.dsl


gtfs-sql-importer is a
set of scripts to help import the csv files in the GTFS and generate
some geographic features inside of PostgreSQL.

To install:

cd projects/external
git clone git@github.com:fitnr/gtfs-sql-importer.git fitnr/gtfs-sql-importer
make init
PGDATABASE=census GTFS=~/Downloads/www.portauthority.org/generaltransitfeed/general_transit_CleverTripID_1811.zip make load

I did need to patch the code to handle files with empty lines. However,
this appears to be fixed in the
branch, which might be merged into master soon.

diff —git a/src/load.sh b/src/load.sh
index 6d77846..f2f4278 100644
—- a/src/load.sh
+++ b/src/load.sh
-15,7 +15,7 function import_stdin() # remove possible BOM hed=$(unzip -p “$ZIP” “${1}.txt” | head -n 1 | awk ‘{sub(/^\xef\xbb\xbf/,”“)}{print}’) echo “COPY gtfs.${1}” 1>&2
- unzip -p “$ZIP” “${1}.txt” | awk ‘{ sub(/\r$/,”“); sub(“,\”\”“, “,”); print }’ | psql -c “COPY gtfs.${1} (${hed}) FROM STDIN WITH DELIMITER AS ‘,’ HEADER CSV
+ unzip -p “$ZIP” “${1}.txt” | grep -v “^\s*$” | awk ‘{ sub(/\r$/,”“); sub(“,\”\”“, “,”); print }’ | psql -c “COPY gtfs.${1} (${hed}) FROM STDIN WITH DELIMITER AS ‘,’ HEADER CSV” }



census-postgres-scripts is a set of scripts from
Census Reporter to make quick work of
importing ACS and TIGER datasets.

To install:

cd projects/external
git clone git@github.com:censusreporter/census-postgres-scripts.git censusreporter/census-postgres-scripts

I also made some patches to only download data for Pennsylvania (FIPS ID
42) or the whole US when needed and also to download the Roads and
Tabulation Block geometries, which aren’t included by default in these
scripts. I also store mine in a different location and login to postgres
via my personal user, not the census user.

diff —git a/12_download_tiger_2017.sh b/12_download_tiger_2017.sh
index ea17e88..86084b1 100755
—- a/12_download_tiger_2017.sh
+++ b/12_download_tiger_2017.sh
 -1,11 +1,12 
 # Download census-y TIGER data


-mkdir -p /mnt/tmp/tiger2017
-wget —recursive —continue —accept=*.zip \
+wget —recursive —continue \
+ —accept=tl_2017_42_*.zip \
+ —accept=tl_2017_42003*.zip \
+ —accept=tl_2017_us*.zip \
+ —wait=1 —random-wait \ —no-parent —cut-dirs=3 —no-host-directories \ -e robots=off \
- —directory-prefix=/mnt/tmp/tiger2017 \ https://www2.census.gov/geo/tiger/TIGER2017/AIANNH/ \ https://www2.census.gov/geo/tiger/TIGER2017/AITSN/ \ https://www2.census.gov/geo/tiger/TIGER2017/ANRC/ \
-23,11 +24,13 wget —recursive —continue —accept=*.zip \ https://www2.census.gov/geo/tiger/TIGER2017/NECTADIV/ \ https://www2.census.gov/geo/tiger/TIGER2017/PLACE/ \ https://www2.census.gov/geo/tiger/TIGER2017/PUMA/ \
+ https://www2.census.gov/geo/tiger/TIGER2017/ROADS/ \ https://www2.census.gov/geo/tiger/TIGER2017/SCSD/ \ https://www2.census.gov/geo/tiger/TIGER2017/SLDL/ \ https://www2.census.gov/geo/tiger/TIGER2017/SLDU/ \ https://www2.census.gov/geo/tiger/TIGER2017/STATE/ \ https://www2.census.gov/geo/tiger/TIGER2017/SUBMCD/ \
+ https://www2.census.gov/geo/tiger/TIGER2017/TABBLOCK/ \ https://www2.census.gov/geo/tiger/TIGER2017/TBG/ \ https://www2.census.gov/geo/tiger/TIGER2017/TRACT/ \ https://www2.census.gov/geo/tiger/TIGER2017/TTRACT/ \
diff —git a/13_import_tiger_2017.sh b/13_import_tiger_2017.sh
index 2d00e54..683c39f 100755
—- a/13_import_tiger_2017.sh
+++ b/13_import_tiger_2017.sh
-1,18 +1,9 #!/bin/bash
-cd /mnt/tmp/tiger2017

-# For shp2pgsql
-sudo apt-get install -y postgis
+psql -d census -v ON_ERROR_STOP=1 -q -c “DROP SCHEMA tiger2017 CASCADE; CREATE SCHEMA tiger2017;”
+psql -d census -v ON_ERROR_STOP=1 -q -c “ALTER SCHEMA tiger2017 OWNER TO census;”

-if [ -z $PGHOST ]; then
- echo “You must set PGHOST environment variable to the hostname of the PostgreSQL server to operate on.”
- exit 1
-psql -d census -U census -v ON_ERROR_STOP=1 -q -c “DROP SCHEMA IF EXISTS tiger2017 CASCADE; CREATE SCHEMA tiger2017;”
-psql -d census -U census -v ON_ERROR_STOP=1 -q -c “ALTER SCHEMA tiger2017 OWNER TO census;”
-24,12 +15,12 do one_shapefile=`ls -a $i/*.shp | head -n 1`

# Start by preparing the table - shp2pgsql -W “latin1” -s 4326 -p -I $one_shapefile tiger2017.$tablename | psql -d census -U census -v ON_ERROR_STOP=1 -q + shp2pgsql -W “latin1” -s 4326 -p -I $one_shapefile tiger2017.$tablename | psql -d census -v ON_ERROR_STOP=1 -q # Then append all the geometries for j in $i/*.shp do - shp2pgsql -W “latin1” -s 4326 -a $j tiger2017.$tablename | psql -d census -U census -v ON_ERROR_STOP=1 -q + shp2pgsql -W “latin1” -s 4326 -a $j tiger2017.$tablename | psql -d census -v ON_ERROR_STOP=1 -q done if [ $? -ne 0 ] diff —git a/13_index_tiger_2017.sql b/13_index_tiger_2017.sql index 288d8ac..5421efa 100644 —- a/13_index_tiger_2017.sql +++ b/13_index_tiger_2017.sql -27,6 +27,8 ALTER TABLE tiger2017.ttract OWNER TO census; ALTER TABLE tiger2017.tract OWNER TO census; ALTER TABLE tiger2017.uac OWNER TO census; ALTER TABLE tiger2017.unsd OWNER TO census; +ALTER TABLE tiger2017.roads OWNER TO census; +ALTER TABLE tiger2017.tabblock OWNER TO census; — OGR needs select on geography_columns to do it’s thing GRANT SELECT ON geography_columns TO census;


osm2pgrouting is a tool
that will import OpenStreetMap dumps as a topology (i.e. connects roads
that are connected in real-life) into a database with PostGIS and

To install:

sudo apt-get install libpqxx-dev
cd projects/external
git clone git@github.com:pgRouting/osm2pgrouting.git pgRouting/osm2pgrouting
cd pgRouting/osm2pgrouting
mkdir build
cd build
cmake ..


osmosis is a tool for
manipulating OpenStreetMap dumps. We’ll use it to “crop” the Geofabrik
dump of Pennsylvania to around Allegheny County because osm2pgrouting
requires raw xml, not the Protocol
Geofabrik distributes in. osm2pgrouting also tries to load the whole
dump into memory before performing its work, so the less data it needs
to load, the better.

To install:

 sudo apt-get install osmosis


cd ~/projects/external
mkdir osmosis
cd osmosis
wget https://bretth.dev.openstreetmap.org/osmosis-build/osmosis-latest.tgz
tar xvfz osmosis-latest.tgz
rm osmosis-latest.tgz
chmod a+x bin/osmosis

Database Setup

First we need to create the role/user and database in postgres. I’ve
traditionally used the “census” user as that’s what the Census Report
scripts used, and that’s previously where I started. If you login to
postgres as a superuser, you can create the role and user via the
following SQL.

create role census with nosuperuser login password 'censuspassword';
create database census with owner census;

Alternatively, you could use the following commands (run as the census role).

sudo -u postgres createuser --no-superuser --login --pwprompt census
sudo -u postgres createdb --owner census census

Once created, we can login via your favourite SQL interface, including psql.

psql --username census -password census

and execute the following:

create extension postgis;
create extension hstore;
create extension pgrouting;
create schema osmpgrouting;
create schema gtfs;

Data Needed

To compute all of this, and because I haven’t loaded this data onto this
laptop previously, we’ll need a few pieces of data. We’ll cover
downloading and importing the data in the next post.

TIGER 2010 Tabulation Blocks

The Census provides the geometries corresponding to their summary data, along
with some additional data, such as basic water features and roads.
These geometries are part of a collection called “TIGER”(Topologically
Integrated Geographic Encoding & Referencing). I will be using many of
these datasets at a later time, so I am going to import all of them; If you’re
following along, feel free to only download what you need.

You can follow the instructions that come with the
so an abbreviated version is presented here. As I mentioned before, feel
free to only download the TABBLOCK, as that’s all we need for this
project and to modify it to only download your state’s data.

cd projects/external/censusreporter/

Road Topologies

While TIGER does provide road information, it’s actually not the most
useful for this purpose because it does not differentiate between an
at-grade or grade-separated crossing (e.g. an intersection vs a bridge).
This is the essence of a “topology”: the connections of the roads
themselves. For this, we’re going to use the
OpenStreetMap data. Luckily, the folks
over at Geofabrik in Germany, produce
up-to-date exports based on administrative boundaries, such as US

First I grabbed the dump for Pennsylvania.

cd ~/Downloads
wget https://download.geofabrik.de/north-america/us/pennsylvania-190107.osm.pbf

Then, I used osmosis to convert it into XML from the Protocol
format and also
to “crop” the input to osm2pgrouting to just the county. The
coordinates I’m using were randomly taken to be outside the county in
all directions. It also seemed like osmosis didn’t like ~ or
$HOME, so I used full paths to my files.

osmosis --read-pbf file=/home/jim/Downloads/pennsylvania-190107.osm.pbf --bb left="-80.456" right="-79.542" top="40.787" bottom="40.119" --write-xml file=/home/jim/Downloads/pennsylvania-190107-ac.osm

Now we can actually import the data! Before I decided to trim the input to
osm2pgrouting, this operation would take quite a while and then fail with a
std::bad_alloc, i.e. out of memory. After cropping, it didn’t take as long,
nor use as much memory, but more importantly, finished.

cd projects/external/pgRouting/osm2pgrouting
./osm2pgrouting --conf ../mapconfig.xml --addnodes --attributes --tags --dbname census --username census --password censuspassword --schema osmpgrouting --file ~/Downloads/pennsylvania-190107-ac.osm

Something I noticed when viewing the data was that there were a lot of
lines from how I “cropped” the region; I suspect they’re connections of
roads with the same name, i.e. state routes, interstate highways, &c. I
deleted them by filter on values with a really large cost, but there
were still a handful of outliers that needed to be cleaned up afterwards
by hand.

echo "delete from osmpgrouting.ways where cost &gt; 1" | psql census

Bus Stops

The General Transit Feed
is produced
my most transit agencies and is consumed by services such as Google
Maps, Bing Maps, Transit App, &c. It contains data such as stops,
routes, trips, fare, normal schedules, special schedules, &c. The Port
Authority also produces a GTFS

I started by grabbing all of the GTFS and GIS data from the Port Authority,
although I only needed the
file and could have only downloaded that. Remember, you may need the patched
version of the code or the add-cols branch for the import to succeed.

cd ~/Downloads wget --recursive --continue -e robots=off
--force-directories --no-parent
http://www.portauthority.org/generaltransitfeed/GIS/ cd
~/projects/external/fitnr/gtfs-sql-importer make init PGDATABASE=census
make load

Census 2010 Population Data

While I could use more recent ACS estimates, I’m choosing to use the
decennial census because I would like my area estimations to use
tabulation blocks, and not block
. Block
Groups are groups of blocks in the same Census Track, and are used to
aggregate data over a larger geographic region. That’s useful for a few

  • Fewer data points if you don’t need the geographic specificity of a block. (Let’s face it, 90+% of the time you don’t; tracts will work well-enough.)

  • The ACS is a sampled survey, not an enumeration like the decennial census, and the error would be too high when examining blocks.

  • Statistical Disclosure Limitation whereby data that could be used to identify a person may be edited, especially in geographically fine-grained data such as block-level summaries. However, “By agreement with the Department of Justice (2000) the Census Bureau will provide exact counts at the Census block level for the following variables:

  • Number of people: total, age 18+ (voting age) , and less than age 18,

  • Number of vacant housing units, and

  • Number of householders, which is equal to the number of occupied housing units”

Since this is a little more complicated to import, and we need to generate our
walksheds first, anyway, I’m going to put off the import until a later post.

Next Steps

Part 2: Defining the Walkshed

The Death of jimktrains

Date: 20180514

Tags: social_media

Yesterday I got into yet another argument about the GDPR on hackernews.
It ended with me being rate limited to some long amount of time and no
actual discussion except people insinuating that my position is wrong
because of the GDPR. This is a fundemental misunderstanding in how
arguments work. I can find the GDPR to be silly and baseless while still
complying with it. Their arguments boil down to, in another context,
marijuana is bad because the law says so. That doesn’t help expand the
conversation more. I’m not talking about compliance with the law, I’m
talking about the premise of it.

(I’m working on a longer post discussing my thoughts on the GDPR. I view it in
much the same way as previous cookie legislation: good intentioned, but
ultimatly poorly designed. I feel that, unfortunatly, the onus is ultimatly on
us, the consumers of said services to understand what we’re sending to these
services and how it might be used. Moreover, we need to understand and be fully
aware that anything we give someone else is no longer ours to control. I’ve
deleted my accounts from a few social media outlets, but I’m under no delusion
that they’re actually gone. Even the best of intentioned places will have
remenants of me in backups and logs at the very least or at “worst” just set a
“deleted” field on my record(s). I gave this information away and no longer
believe I am in ultimate control of said data.)

I realized in that I don’t gain anything meaningful from social media.
It brings out the worst in me, and becomes a huge time sink that I can’t

I hardly use Twitter anymore because, for me, the restricted length of a
tweet makes discussion difficult to near meaningless. The stream of news
is redundant with 1st-party sources I already check, and many tweets are
often linking to articles of questionable origin.

I used Facebook sparingly. While it is extremly nice to see some of
the heartwarming posts from actual friends, ultimately, I don’t feel like
that furthers our friendship. Those interactions are few and far
between, and in contrast to the constant stream of nonsense and
questionable sources posted by others. I was part of a community message
board, but perhaps I can coax them elsewhere or simply have Anne let me
know if there is anything interesting there.

LinkedIn was nothing but recruiter spam and a deletion I should have
done a very long time ago.

HackerNews provides no way to delete an account or older posts. I
emailed the mods and requested my account be deleted. We’ll see what
ultimately happens. It’s funny that I was yelled at because I asserted
websites should control the data you send them and the group-think is
that the GDPR changes that long-standing (and practical and realistic
convention) and that I’m just plain wrong, and yet the website we’re
arguing on provides no means for the users disagreeing with me to
control “their” data (hint: it’s not really and ultimately theirs, no
matter the law).

So anyway, please don’t try to find me on social media. If you’d like to
contact me, send me an email, gchat, or text.

Anger in Politics

Date: 20170615

Tags: politics

It saddened me deeply to learn about the (shooting at practice for the
Congressional Baseball Game
I sympathise with the frustration the shooter feels with our political
system, but I cannot condone, or even fathom, using violence in an
attempt to cause political change. Instances where domestic terrorism,
because even if the media won’t call it and him that he is a terrorist,
are actually productive are few and very far between.

Even the US Revolution, was essentially 13 (now independent) Countries
waging war against another country; which may have been the only thing
that kept us from plunging into dictatorship and further (immediate)
war. Even the first French Revolution left France with an Empire for
many decades. The Russian February Revolution led to the October
Revolution which led to Lenin and Stalin. We’re still seeing the
aftermath of Bolivar’s Revolutions in South America. Israel is hardly
peaceful. Africa is still rife with civil wars and mismanagement. The
Cultural Revolution in China brought neither communism nor democracy.
Violence in politics never begets peace. It’s those who are brave enough
to sit and govern fairly instead of trying to consolidate their own
power that end cycles of violence and usher in stability.

During the Occupy movement, especially as abuses of power
I began to seriously wonder if armed revolt was going to be necessary to
combat what felt like a willingness of the government to use force
against peaceful protest. While there was undoubtedly an overuse of
violence against protesters, answering with violence would have been the
worst possible decision. Violence further justifies violence.

It angers me beyond belief that calls for political action are being
associated with and put on par with putting bullseye on Democrat faces
ending with one of them being shot and 6 others killed. There are a lot
of angry people because Republicans are doing everything in their power
to make life more difficult for the poor, and easier for the rich. The
actions of one deranged man pain all this anger as inappropriate.

It is appropriate to be angry given everything the Republican base in
Congress is doing. We should be furious! We need to channel that anger
into political action: call and write your rep and senator; get involved
in campaigns (local, state, and federal); have civil, constructive
debates with people;VOTE.

Don’t let anyone tell you your anger is invalid or unfounded or
dangerous. Be angry! Be furious! You’re getting the short end of the
stick almost every time the Republican base votes. Use that anger
constructively, not destructively. Violence is an attack on the very
principles our republic is founded on. Non-violent political action is
a reƤffermation of those values.

Political Civility

Date: 20170507

Tags: political-theory

I drafted a letter to my representive Tim
about his support of the
While it wasn’t profanity ridden, it was lacking in civility. It ended

May you rot in hell, having to live and relive the suffering that the passage of this bill, if it passes the Senate, would inflict upon nearly all of your constituents. Over and over, reliving every mental health crisis, every overdose, every preventable death I wish to pass before you. You represent us, not the Republican party.

and contained lines like

Your lack of understanding of the ACA and the AHCA, your rush to push
this through without CBO scoring, and your complete apathy for your
constituents leave me disgusted with your spineless, wretched, soulless
being not worthy of the title `human’.

It wasn’t the most polite letter to say the least. It was a (partial)
release of the frustration I’ve been feeling with our system and the
and the obvious lack of care for what’s best for their

in order to bend to party politics. It’s beyond frustrating. Our
elected officals appear to have no oversight and no repurcusions.

We’re at a point where I didn’t care that the letter I drafted was
hateful and incindiary because I didn’t even believe it would be read
beyond the point they realized I was against Murphy’s actions. This
caused a rift in me and I sent it to my wife and some of my closest
friends. They all said the same thing, more-or-less, “While I agree
and understand, it’ll be discarded as hate mail with no thought given to
your message.” That’s true. And even if I believe it won’t be read, I
should still strive to uphold the virtues of civil discourse. (I did
draft and send a less incendiary letter which I’ll post soon.)

Letting our political discourse be driven by hate, anger, emotion, and
team-psychology is an enormous problem, especially right now. The
hatred that Republicans felt towards Obama resulted in his
Constitutional Right to nominate a SCOTUS Justice to not even be
entertained. (Hearings weren’t even held! The Republicans controlled the
Senate and could have simply voted No.) It led to the disastrous AHCA,
which isn’t even an attempt to fix the problem, only to strike back at
the Democrats and Obama. The hatred many liberals are espousing
currently isn’t the solution. It’s not the way forward.

If all we expect from our opposition is hate and rhetoric, we maintain
no reason to actually listen to what they’re saying. When we fail to
listen, empathize, and understand (note, I didn’t say agree with or
capitulate to) other points of view, society, as a whole, becomes the

I can pick many examples from the past few months. The day following
the election I couldn’t do anything. I was so wrapped up in the hate
that 46% of the country rallied behind a man who brags about sexual
assault, espouses racist views, and has no real understanding of policy
that my chest hurt and I was red and hot for days. I was mortified,
angry, and hateful.

Over the next few months, I’ve slowly learned a different understanding
of that 46% of America. While I still vehemently disagree that any reason
justifies supporting Donald Trump, maybe I have the privileged of the
moral high ground. I’m not destitute. I haven’t just lost my job of
decades. I haven’t had a huge amount of uncertainty injected into the
core of my life. Clinton told these people that she supports better
educational programs and other means of bringing them into different
industries and those industries to them. I can only image that all these
people heard was “we’ll try something, but you’re going to have to deal
with the uncertainty”. Trump offered a very clear “You’ve been wronged;
I will right it and get your old job back.” The emotional trauma of
having what have become the core of your life ripped out from you isn’t
something I’ve ever lived through. However, by trying to better
understand these people’s position, I can better understand the reality
of the problem faced. They aren’t a thought experiment or statistical
value — they’re people who are fighting for their lives and livelihood.

That said, for the most part these people are not who I get to interact
with often. (Not that I avoid them. I’m a techie, as are many of my
friends and most of civic groups I attend have like minded people. We
all have a natural, self-imposed filter bubble. It’s not ideal, but it’s
practically what happens, and we’re all best to break out of it. We,
including me, should break out much more often.) I do run the risk of
simply making things up and creating sob stories. I run the risk of just
being silly. I don’t think I am, at least entirely though, based upon
my interactions with and interviews I’ve read with this group of people.

Do I think they made the choice that was best for their self interest?
No, absolutely not. I think they made the worst choice they could have
possibly made. However, I’m now able to have a conversation that doesn’t
begin “How could you possibly have supported someone who brags about
sexual assault and is so stupid he doesn’t even see the racism and
violence he spouts?” Perhaps now we could discuss their job, their
prospects, their thoughts about moving forward. There will be
idealogical misses — I’ve met many coal miners that believe Obama’s EPA
put them out of a job, not cheap oil and natural gas — but a
conversation could still happen. And that is the important part: A
conversation happened.

Moreover, name calling isn’t useful. Not only does it create barriers
to having the other person talk with you, but it’s more-often-than-not
coming from a place of hate. When someone complains about “immigrants”
or “Mexicans” taking their job, they may not be racist, and it’s
counterproductive to call them one. They may have actually lost their
job to cheaper labour here in the States, or even in their hometown.
While they’re description of the problem may grate ears or sound like a
dog whistle for something it’s not, it might not be factually wrong.
However, maybe the conversation could be turned from immigration to if
the problem is really immigrants or if the miniscule labour protects and
weak unions had a role in their job loss. Cheap labour is the
immediate, visible cause — but not necessarily the way to a solution.
We need to move beyond our politically inspired hatred to understand
that your fellow citizen is actually facing a problem; writing them off
as racist or over-exaggeration is no way to come to understand the
problem, let along work towards a solution.

Our hate and anger build walls just as terrible as Trump’s proposed
border wall, and, honestly, at a much higher cost to our society.