About Me!

This blog is about my musings and thoughts. I hope you find it useful, at most, and entertaining, at least.

Résumé [PDF]

Other Pages

Quotes

Links

Oak Island

Items for Sale

Presence Elsewhere

jim@jimkeener.com

del.icio.us

Twitter

Facebook

LinkedIn

GitHub

BitBucket

Obergefell v. Hodges

Date: 2015-06-28

Tags: politics

No union is more profound than marriage, for it embodies the highest ideals of love, fidelity, devotion, sacrifice, and family. In forming a marital union, two people become something greater than once they were. As some of the petitioners in these cases demonstrate, marriage embodies a love that may endure even past death. It would misunderstand these men and women to say they disrespect the idea of marriage. Their plea is that they do respect it, respect it so deeply that they seek to find its fulfillment for themselves. Their hope is not to be condemned to live in loneliness, excluded from one of civilization’s oldest institutions. They ask for equal dignity in the eyes of the law. The Constitution grants them that right.

- Obergefell v Hodges, Opinion of the Court, p28

Surveillance

Date: 2015-03-21

Tags: surveillance civil-rights

“Here’s the problem with this kind of thing; for the people who don’t seem to get it.

It isn’t a matter of whether or not you’ve done anything wrong. It isn’t a matter of having anything to hide.

It’s a matter of peace of mind. It’s a matter of exercising your rights without having someone snooping into whatever you do. You think what you do is innocent? That’s great. But it’s not you that you have to worry about. It’s the interpretation of others that matter here. Others who may look at the books you read or the sites you visit or the causes you support or the things you say and read something sinister into it because of their own paranoia and prejudices.

Even if nothing comes of it, the knowledge you’re being watched is enough to put most people on edge. You may find yourself not reading that book or not visiting that site or supporting that cause or saying that thing … all because you worry about what someone else will think.

You shouldn’t have to. That’s the issue.” author unknown

It deeply saddens me when I encounter people who feel that blanket surveillance keeps us “safe”. I, and many many many others feel threatened and invariably less free because of surveillance. The fear of worrying about how other people perceive your actions is a terrible burden — so terrible we constantly tell our children and ourselves “not to worry what other people think”. However, as that becomes impossible, people will inevitably self-censor in order to not be noticed; which may be the worst possible thing that can happen in a “democracy” (represenative or otherwise).

“With the first link, the chain is forged. The first speech censured, the first thought forbidden, the first freedom denied, chains us all irrevocably.”

Adding NAIP (and MrSID format in general) to QGIS

Date: 2015-02-12

Tags: qgis gis orthoimagery

I happened across 1-meter National Agriculture Imagery Program (NAIP) imagery on PASDA. Excitedly I downloaded the zip file for my county and uncompressed it. Inside was a shp file and a very large sid file. The shp contains what appear to be the photography tracks and not very useful for my purposes (looking at pretty pictures!). I attempted to add the sid as a raster layer only to be greeted by an error telling me that it’s an unsupported filetype.

What is this format? file was no help, just telling me it’s “data” I hit Google to attempt to figure out what this mystery format was and found out it was called MrSID (Multiresolution Seamless Image Database) an image compression format developed at Los Alamos and patented by LizardTech. (Aside: Why a government agency is giving out data in a proprietary format when suitable open alternatives exist is beyond me.)

The friendly denizens of #qgis confirmed what I found and pointed me at a command line tool from LizardTech that allows one to convert MrSID files into a more open format. (While I generally don’t run proprietary code on my laptop, I sucked it up and did the conversion).

geoexpress_commandlineutils_linux/linux64/GeoExpressCLUtils-9.1.0.3982/bin/mrsidgeodecode -i ortho_1-1_1n_s_pa003_2013_1.sid -o ortho_1-1_1n_s_pa003_2013_1.tiff

BEWARE: This will increase the size of the file 15 to 20 times.

To get some idea of what’s stored in this file you can run

gdalinfo ortho_1-1_1n_s_pa003_2013_1.tiff

and see that there are 3 bands (Red, Green, and Blue) and actual size of the image.


Driver: GTiff/GeoTIFF
Files: ortho_1-1_1n_s_pa003_2013_1.tiff
Size is 59112, 56449
Coordinate System is `'
Metadata:
  TIFFTAG_XRESOLUTION=1
  TIFFTAG_YRESOLUTION=1
Image Structure Metadata:
  INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,56449.0)
Upper Right (59112.0,    0.0)
Lower Right (59112.0,56449.0)
Center      (29556.0,28224.5)
Band 1 Block=59112x1 Type=Byte, ColorInterp=Red
Band 2 Block=59112x1 Type=Byte, ColorInterp=Green
Band 3 Block=59112x1 Type=Byte, ColorInterp=Blue

This tiff, despite being 10GB loaded just peachy in QGIS. However, I wanted some of that disk space back, so I was told to convert it to a jpeg image. Additionally, the file does not contain it’s own projection (and probably is relying on the prj file in the same directory), so I decided to add that into the jpeg.

gdal_translate -co TILED=YES -co COMPRESS=JPEG -co PHOTOMETRIC=YCBCR -co JPEG_QUALITY=85 -a_srs ortho_1-1_1n_s_pa003_2013_1.prj ortho_1-1_1n_s_pa003_2013_1.tiff ortho_1-1_1n_s_pa003_2013_1.jpeg

gdalinfo shows that we converted it, it’s the same size, and it’s the correct projection.


Driver: GTiff/GeoTIFF
Files: ortho_1-1_1n_s_pa003_2013_1.jpeg
Size is 59112, 56449
Coordinate System is:
PROJCS["NAD_1983_UTM_Zone_17N",
    GEOGCS["GCS_North_American_1983",
        DATUM["D_North_American_1983",
            SPHEROID["GRS_1980",6378137,298.257222101]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",-81],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]]]
Metadata:
  AREA_OR_POINT=Area
  TIFFTAG_XRESOLUTION=1
  TIFFTAG_YRESOLUTION=1
Image Structure Metadata:
  COMPRESSION=YCbCr JPEG
  INTERLEAVE=PIXEL
  SOURCE_COLOR_SPACE=YCbCr
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,56449.0)
Upper Right (59112.0,    0.0)
Lower Right (59112.0,56449.0)
Center      (29556.0,28224.5)
Band 1 Block=256x256 Type=Byte, ColorInterp=Red
Band 2 Block=256x256 Type=Byte, ColorInterp=Green
Band 3 Block=256x256 Type=Byte, ColorInterp=Blue

In #qgis I was told that it’s usually prudent to precalulate overviews (which are similar to zoom-levels in a TMS). This calculates an image pyramid, but not exactly that generates images that can be used to quickly display the image when zoomed in and out without having to compute it from the base blocks every time.

gdaladdo -r gauss ortho_1-1_1n_s_pa003_2013_1.jpeg 2 4 8 16 32 64 128 256 512

Will compute overviews for half, quarter, eight, &c of the entire image (Note that these are powers of 2, as that works best because it becomes easy to calculate the size of the pyramid). gdalinfo shows that we’ve added overviews.


Driver: GTiff/GeoTIFF
Files: ortho_1-1_1n_s_pa003_2013_1.jpeg
Size is 59112, 56449
Coordinate System is:
PROJCS["NAD_1983_UTM_Zone_17N",
    GEOGCS["GCS_North_American_1983",
        DATUM["D_North_American_1983",
            SPHEROID["GRS_1980",6378137,298.257222101]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",-81],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]]]
Metadata:
  AREA_OR_POINT=Area
  TIFFTAG_XRESOLUTION=1
  TIFFTAG_YRESOLUTION=1
Image Structure Metadata:
  COMPRESSION=YCbCr JPEG
  INTERLEAVE=PIXEL
  SOURCE_COLOR_SPACE=YCbCr
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,56449.0)
Upper Right (59112.0,    0.0)
Lower Right (59112.0,56449.0)
Center      (29556.0,28224.5)
Band 1 Block=256x256 Type=Byte, ColorInterp=Red
  Overviews: 29556x28225, 14778x14113, 7389x7057, 3695x3529, 1848x1765, 924x883, 462x442, 231x221, 116x111
Band 2 Block=256x256 Type=Byte, ColorInterp=Green
  Overviews: 29556x28225, 14778x14113, 7389x7057, 3695x3529, 1848x1765, 924x883, 462x442, 231x221, 116x111
Band 3 Block=256x256 Type=Byte, ColorInterp=Blue
  Overviews: 29556x28225, 14778x14113, 7389x7057, 3695x3529, 1848x1765, 924x883, 462x442, 231x221, 116x111

Now, how does this compare with the original SRID in terms of size?

Format Size Relative Size
MrSID 637M 1
TIFF 9.4G 15
JPEG 708M 1.1
JPEG (with overviews) 1.6G 2.5

So, a little larger, but natively supported by F/OSS tools such as GDAL and QGIS.

In terms of quality, I haven’t noticed a difference.

Happy Orthoimagerying!

So You Want to Make a Map

Date: 2014-09-13

Tags: GIS QGIS Census TIGER

So you want to make a map but have no experience in doing so and don’t know where to start? Well you’ve come to the right place! In this tutorial I will show you where to find some basic spatial data, how to use it, and some tricks when rendering it.

Data Background

Firstly, we need data to make the map with. I’m going to be using the TIGER data from the US Census Bureau. They have vast troves of spatial data (among the even more vast troves of data that you normally think of when I say “census”). All of this data is organized by FIPS codes.

FIPS codes “combine” to produce longer codes for more specific things. For instance, Pennsylvania is 42 and Allegheny County is 003 in Pennsylvania, so the unique code for Allegheny County, PA is 42003. The following chart is the organization of all the data.

Taken from “TIGER Tchnical Documention, Chapter 3”:https://www.census.gov/geo/maps-data/data/pdfs/tiger/tgrshp2013/TGRSHP2013_TechDoc_Ch3.pdf

As you can see, States have counties, and counties have Census Tracts, which in turn have Block Groups. You’ll notice that School Districts, Congressional Districts, and State Legislature districts are contained within a State (which makes sense, e.g. a school district can span county and municipality lines). Note the “County Subdivisions” under Counties; this is also known as municipalities (e.g. towns, cities, townships, boroughs, &c). While Census Tracts are normally contained by municipalities, they are not always.

If codes are additive, how does one distinguish between a county subdivision and a census tract? Good question. The answer is that they’re different lengths:

  • Pennsylvania is 42 (2 characters)
  • Allegheny County is 42003 (5 characters)
  • Pittsburgh is 4200361000 (10 characters)
  • Tract drawn around Schenley Park is 42003980500 (11 characters)
  • The Block Group inside that tract is 420039805001 (12 characters)

So, what happens when new things are added? Great question! The Census Bureau keeps these lists in alphabetical order, so they can’t just (well, don’t want to) add it to the end of the list. What they do is skip codes between existing items. Search for “Pennsylvania” in the FIPS look up link below and you’ll notice:

County FIPS Name
42001 Adams County
42003 Allegheny County
42005 Armstrong County
….. …..
42099 Perry County
42101 Philadelphia County
42103 Pike County
42105 Potter County
….. …..

So if, say, Pittmesh County, it would become 42104. If Pennsylvania then added a “Pickle County”, well, it would have to be bumped to the bottom of the list. Sorry Pickle County.

Note that “ZIP Code Tabulation Areas” is off to the side under nation. ZIP codes are rarely something you want to be doing analysis with. ZIP Codes aren’t polygons (they aren’t defined by some space on the surface of the Earth), they are instead lines representing mail routes served by a Post Office. The ZCTA files (and others that you’ll find online) try to convert them to polygons the best they can, but it’s not always possible to do with 100% accuracy. They are also at the national level because they can and do span state boundaries.

You’ll note that things like roads and railroads aren’t listed. They are normally broken down by county for ease of use, but are more-or-less lines and don’t belong in this hierarchy (or could be stuffed in at the national level, I suppose). I will note, though, the road and railroad files aren’t always 100% accurate, though I find them good-enough for everyday usage. When they aren’t, I try to see if the municipality I’m working with has their own files or search for the state’s spatial repository (Pennsylvania’s is PASDA) every state has one.

OK, so now we know how this data is organized, let’s get files! First thing is to find the FIPS code for your county. The Census provides a nice tool to get that; just select your state and then look for your county name. (There are many other tools available everywhere as well, just search if you don’t like this one). For this tutorial I’ll be focusing on Allegheny county, but feel free to use your own county when attempting this.

Getting Data

Let’s use the 2014 TIGER/Line Shapefiles (go to that link and then click FTP site). The TIGER page also contains information and documentation about the files and their contents if you’d like to learn more.

First let’s grab the COUNTY file. There is only one because it’s small even thought it contains every county in the union.

Next, lets grab county subdivisions, COUSUB for the state of interest. For this tutorial we’ll use Pennsylvania’s, which is named tl_2014_42_cousub.zip because Pennsylvania is state 42.

Now let’s do ROADS. Again we’ll use Allegheny County’s, named tl_2014_42003_roads.zip because Allegheny County is 42003.

We’ll also grab the RAILS file because I like trains. (No really, I do.)

Once all of these have been downloaded, unzip them each into their own directory. The thing about shapefiles is that they’r not really a “file”, they’re at least 3 (geometry file, database file, and an index file. Usually there is a projection file too, because you need to know this to use it.) You need to keep all the files that appear when you move them around.

Shapefiles contain just that, shapes. They are a form of “vector” format, similar to SVG if you’re familiar with those, that lets you zoom in, rotate, and move the map without ever losing quality. How? Instead of storing a line as “There is a point at X. There is a point at X+(just a little bit). There is a point at X+(just a bit more yet)….” it stores “There is a line from X to Y” and the computer figures out how best to draw it. The other format (“There is a point…”) are known as “Raster” formats, like JPEG, GIF, PNG, and TIFF among many others are raster format’s you’ve probably used (there are jpgs and pngs on this site and don’t tell me you haven’t seen any cat gifs

(What are projections? Well, they’re how you convert points on the globe to points on a flat surface. There are many different ways to do this. WGS 84/EPSG 4326 is what’s commonly known as “GPS Coordinates”. The census data you just downloaded is in EPSG 4269 which is a more accurate version for use in the United States and Canada. These are all in degrees.

Some projections are in feet. For instance EPSG 2272 can be used in the lower half of Pennsylvania for very good accuracy. (Feet from what? The lower left corner of the state of course!) Also since it’s in feet it makes distance calculations easy!

For the rest of this tutorial you won’t need to care about any of this, QGIS will read the project file with the shape files and figure it all out. I just wanted to let you know there is a whole other (confusing) world out there.)

Viewing

Download and install QGIS, which is a Free software package that is similar to ArcGIS but free, as in beer and speech. If you ever need help, QGIS has many places where you can turn for community support which I’ve always found very helpful. (If you’re a company, Commercial Support for QGIS is also available, but for you and I IRC, the mailing list, or gis.stackoverflow are adequate (Trust me, I’ve learned a ton and have gotten help solving I can’t count how many issues from them).

Once installed, open it. You should see a screen similar to the one below.

We’ll be using QGIS 2.4.0 – Chigiak for this tutorial.

Counties

To add a shapefile layer to our canvas you can click on the little V-looking-with-a-green-plus button on the left (), or you can go to Layers > Add Vector Layer in the menu. Select the .shp (the file that ends in the shp extension) from where you unzipped the COUNTIES file above. Once loaded, you should see all the counties in the country.

Using the Zoom tool () draw rectangle around the area you’re interested in. Feel free to zoom in little by little. You don’t have to find it right away.

Until you’re mostly over the county you want.

To view information about a feature, use the “Identify Features” () button and then click on the county you’re interested in.

Now you can see all the information associated with this feature. Note the “STATEFP” is 42 and “COUNTYFP” is 003. Also, the “GEOID” is 42003, which is what we expected for Allegheny County. You can look up the meaning of some of the other fields in the TIGER documentation. (Click the “Hand” tool () to deselect everything and continue to pan the map as usual.)

Now, as an exercise in being able to query and filter your data, and to remove some clutter, right-click on your layer and go to properties.

Now, click “Query Builder”, and you should be presented with a screen like this.

You’ll note that it’s displaying all the fields on the left. Select one and click “Sample”. (That’s one way just to check if a field is what you think it is. Use the “Attribute Table” I’ll show you later to do a more in-depth looking-at of the data.)

We want the “GEOID” column that contains the FIPS code, and from the sample data, you can see that those look like county FIPS codes. The filter expression language is similar to SQL, but if you don’t know SQL, that’s OK. We want the county with a GEOID of 42003, so we enter COUNTY” = ‘42003’ (NB(Nota bene): Use double quotes for fields and single quotes for values. Failure to do so will result in errors.) You can click “Test” to see how many results your query finds and if it’s valid, useful for a quick, well, test, of your query.

In this case we want only a single result because that ID should be unique!

Click OK on the test popup and then on the main dialog to gt back to the layer properties. Then Click OK to go back to the main window.

Now, right click on the layer and go to “Zoom to Layer”.

Municipalities

Now, add the county subdivision you downloaded just as you did the county file originally.

So great, now our county is covered. Make sure your municipality layer is selected and then, identify a feature, just like you did for the county (“Identify Feature” only identifies features in the currently selected layer. For fun, select the county layer and click where your county should be; you’ll see it become highlighted).

If you remember, municipalities are part of counties. In that info box, you’ll notice that the “STATEFP” is 42 and “COUNTYFP” is 003. What we can do is filter the municipalities to show only the ones in our county! (With the query STATEFP” = ‘42’ ANDCOUNTYFP” = ‘003’

As you just saw, the order of the layers is the ordered rendered; those lower in the list are on the bottom. Drag the county layer above the municipality layer, and you’ll see all your municipalities be covered by the county.

What I want to do now is to make the county layer have no fill and a think blue border. Go back to the layer properties for the county, and click on the paint brush on the left side (), the second in the list, to get to the Style properties.

Now select “No Brush” for the fill. Note some of the other brush styles.

Once you’ve done that, click on the border color and change it to blue and the “Border Width” to 0.75.

Then click OK

Roads

Now add the ROADS shapefile that we downloaded.

Let’s zoom in to a place of interest; in this case, let’s do Downtown Pittsburgh.

Railroads

Let’s take a quick detour and show the railroads now. Load that file.

If your computer is like mine, it picked a terrible color for them. Let’s change it to, instead of being a line, being a line with ticks like on many maps. Go back to the layer style.

For the “Symbol layer type” use the drop down to select “Marker Line” and then select “Simple Marker” in the tree to the left.

Select the “+” marker. Then, in the tree to the left, select “Marker Line” and set the interval to be 2.

OK back to the main screen.

Better!

Back to roads

Select the road layer, and then use the “Identify Feature” tool to select an interstate highway.

Now select a local road.

You’ll notice the “RTTYP” is “I” for the interstate and “M” for the local, or municply-owned, road.

Go to the style dialog for the road layer. At the top where the drop down reads “Single Symbol”, click it and select “Categorized”. Then for the “Column” select “RTTYP”, followed by clicking the “Classify” button under the big white area.

You’ll see the different values

Value Owner/class
C County
I Interstate
M Manciple
O Other
S State
U Unknown

Right click on each value but “I” and change it back to blue. Also, delete the first entry, the one with no value.

Now change the color of the interstate to orange and the width to 0.7. Ok back to the main dialog.

You may notice, like you can on part of 279 (currently 376), that some interstates have blue on them. This is because there are multiple shapes in the same spot with different records; in this case it’s being listed as a few different roads with different names. Life is messy, unfortunately. There are ways to merge records, but I’m not going to get into that now.

One easy method of solving an issue like this is to “Duplicate” from the context menu for the layer and show all the records you want in one style in one and the other in the other (using the delete function like earlier); since layers are rendered in order place the interstate above the local. (NB: Duplicated layers aren’t shown by default; check the checkbox next to it to show it.)

Another advantage of duplicating the layer is that we can label one layer and another (Currently QGIS has no way of only labeling certain features and not others in a layer).

Go to the properties for the layer, and click the “Label” pane (). Check “Label this layer with” and then select the “FULLNAME” field. Next, click “Rendering” and then “Label Every part of mulit-part feature”. Click OK.

Sometimes the label-placement engine is a little finicky, especially with very long roads. Try moving the map around a little (by clicking and dragging with the hand tool) to see if that’ll place labels where you want.

Now for a little exercise.

  • Duplicate the road layer.
  • Filter the layer so that only roads with FULLNAME is “Forbes Ave” or FULLNAME is “5th Ave” are showing (Remember what I said about quotes in queries).
  • Make this layer’s style a “Single Symbol” of a blue-green line 0.5 in width
  • Label this layer with the full name

More Data!

Since our tutorial deals with Pennsylvania, we’re going to use PASDA to find some additional information. Every state has a spatial data repository. Find yours; find some data; make some cool maps!

At PASDA, do a search for Allegheny and download the Parks file (ftp download and then uncompress it into it’s own directory. Then, you guessed it, add it to our map.

Once Added, go to the Style dialog to a green; something that says “park”! (Also, while you’r mucking around with styles, go to the style dialog for the municipalities and make it “No Brush”, just like you did for the county.

The City of Pittsburgh’s GIS team in city planning has the best “water” file I’ve seen. I honestly find it very difficult to get good GIS info on rivers. If you know where to find it, please tell me! Anyway, download and add the water file as before; and make it a nice water-blue.

Go back to PASDA and download the “pools”: file for Allegheny County.

It’ll be a bit lack luster when we add it. Go to the style dialog and set a “Centroid Fill” for the “Symbol layer type”.

Then set the “Symbol layer type” to “SVG Fill”.

Finally, select the “Swimming” icon from “gpsicons” from the really nice library of icons QGIS comes with. In order to see the icon better, set it’s size to 8.

Click OK and admire your work!

Next time

In the next installment, I’ll show you how to use the print composer to make nice printouts of your maps, and go over some more data querying.

Analyzing Bus Routes to Find Transit Deserts

Date: 2014-09-11

Tags: transit transportation Pittsburgh PAT Allegheny_County GIS topology QGIS GRASS PostGIS

I have recently become involved with Pittsburghers for Public Transit where I’ve volunteered to help make maps and do spatial and socio-economic analysis to help them better understand the problems areas face. One of the first things I did, part as a demo and part to help provide a little more information for their Baldwin campaign. I will cover importing and using Census data and importing it and TIGER data in a later entry; here I want to focus on topographic analysis done with part of the TIGER data and PAT‘s GTFS data to identify area that are further than a given threshold from a public transit stop.

The first method I tried was to simply draw buffers, everything within a radius of a point, around each bus stop. Since buffers don’t consider anything but way-the-crow-flies (geodesic) distance they included parts of communities that can’t be accessed in anywhere near that distance. Take the following two maps (roads in orange, busstops as red dots); the first shows the geodesic distance from the closest busstop to a road. The second shows the shorted on-the-road (topographic) distance to it. An additional point, the space between the between the road and the busstop is not walkable (too steep), moreover, people should not have to walk through muck to get to their bus.

What I needed was a way to do Topographic or Network instead of straight geodetic analysis. I found that GRASS GIS is capable of doing the analysis I needed, and decided to give it a shot. I followed a tutorial, Basic Network Analysis With GRASS that was very close to what I wanted to be able to do. However, I kept getting nonsensical answers.

I eventually realized that my road map had very large segments for each road. In essence, the algorithm would think that the road it was on was very long, and all of the roads connected to it were too, so none of them were under the length limits I was looking for. To fix this issue, I split all of the roads into smaller parts. How I did this, and then how the analysis was performed is as follows.

I’m working on an Ubuntu 14.04 box, but the commands should be similar for most Unix-like OSs. I assume that PostgreSQL and PostGIS are installed.

Backstory

TIGER data is projected in EPSG 4269 (for some explanation as to why, see this comment on StackOverflow. This projection measures distance in units of arc, which isn’t the most intuitive unit to work in for the type of analysis we want to do (Is walking 13-arcseconds (0.0036 degrees) a short distance? (It’s a quarter mile on the equator)). In order to just ignore all of that, I decided to use the EPSG 2272 (Pennsylvania South (ftUS)) project.

This is known as a “State Plane” and is used by most state agencies in the southern part of Pennsylvania (where Allegheny County is). The project is in feet, therefore distances can be measured in a straight-forward manner.

Creat the database

createdb topotest
echo "CREATE EXTENSION postgis;" | psql topotest

Data Gathering

First let’s grab the TIGER road map for Allegheny County (FIPS code 42003 (42 is Pennsylvania, 003 is the county in the state.)) (Fun fact, FIPS codes are in alphabetical order and often skip codes so that new ones may be added without having to lose that order or re-arrange them).

mkdir geodata
cd geodata
wget ftp://ftp2.census.gov/geo/tiger/TIGER2014/ROADS/tl_2014_42003_roads.zip
mkdir tl_2014_42003_roads 
unzip -d tl_2014_42003_roads tl_2014_42003_roads.zip
shp2pgsql -s 4269:2272 -W LATIN1 -I tl_2014_42003_roads.shp > tl_2014_42003_roads.sql
cat tl_2014_42003_roads.sql | psql topotest

For shp2pgsql, the -s 4269:2272 tells it to convert from the input SRID to the output SRID (in this case the NAD-83 projection to Pennsylvania-South). -W LATIN1 defines the encoding that the text in the shapefile’s database is in, and -I tells it to create spatial indices.

Once imported, cd .. to get back to the geodata directory.

Download the zip from the Port Authority (click “Download Port Authority GTFS files “).

mkdir pat_gtfs
unzip -d ~/geodata/pat_gtfs ~/Downloads/general_transit_CleverTripID_1409.zip

To import this data into our database, we’re going to use my fork of gtfs_SQL_importer originally from cbick as his appears abandoned and some bugs needed fixed for newer versions of PostGIS. This, boys and girls, is one of the main reasons Free Software is great!



git clone https://github.com/jimktrains/gtfs_SQL_importer.git

cd gtfs_SQL_importer/src

cat gtfs_tables.sql | psql topotest

python import_gtfs_to_sql.py ~/geodata/pat_gtfs | psql topotest

cat gtfs_tables_makespatial.sql | psql topotest

cat gtfs_tables_makeindexes.sql | psql topotest

cat vacuumer.sql | psql topotest

Cleanup the data

Use psql topotest to login to your database and get a shell.



ALTER TABLE gtfs_stops ALTER COLUMN the_geom TYPE Geometry(Point, 2272) USING ST_Transform(the_geom, 2272);

ALTER TABLE gtfs_shape_geoms ALTER COLUMN the_geom TYPE Geometry(LineString, 2272) USING ST_Transform(the_geom, 2272);

Now to break apart the road segments into smaller segments; what the following query does is read each pair of successive segments and creates records with only that simple line and the reset of the road’s metadata.



CREATE TABLE roads AS ( SELECT gid, linearid, fullname, rttyp, mtfcc, geom, ST_Length(geom) as length FROM ( SELECT gid, linearid, fullname, rttyp, mtfcc, ST_MakeLine(sp, ep) AS geom FROM ( SELECT gid, linearid, fullname, rttyp, mtfcc, ST_PointN(geom, generate_series(1, ST_NPoints(geom)-1)) as sp, ST_PointN(geom, generate_series(2, ST_NPoints(geom) )) as ep FROM ( SELECT gid, linearid, fullname, rttyp, mtfcc, (ST_Dump(geom)).geom AS geom FROM tl_2014_42003_roads ) AS lines ) AS segments ) AS mostly_there

);

CREATE INDEX roads_geom_idx ON roads USING GIST (geom);

CREATE INDEX roads_gid_idx ON roads (gid);

CREATE INDEX roads_linearid_idx ON roads (linearid);

ALTER TABLE roads ADD COLUMN id BIGSERIAL NOT NULL PRIMARY KEY;

SELECT UpdateGeometrySRID(‘roads’, ‘geom’, 2272);

Network analysis with GRASS

GRASS can be controlled fully via a command line or via the menus. If you’re new to GRASS, I would suggesst hunting around in the menus to see what features it offers. In a very helpful gesture, each menu option provides the name of the command it’s an interface to.

First, we’ll import the vector data (File > Import Vector Data > Common Input Formats [v.in.ogr]) GRASS can connect directly to PostgreSQL databases, which makes it very easy to import the data already in our database.

v.in.ogr dsn=PG:dbname=topotest layer=roads output=roads

v.in.ogr dsn=PG:dbname=topotest layer=gtfs_stops output=stops

Once loaded, the busstop layer needs to be connected to the road, otherwise you can’t get to them via the road! (Vector > Network analysis > Network maintenence [v.net])

v.net input=roads points=stops output=network operation=connect thresh=100

Additionally, lets add the stop metadata as part of the network. (Database > Vector database connections > Set vector map – database connection [v.db.connect])

v.db.connect map=network table=stops layer=2

Now for the guts of the analysis. We need to categorize all of the roads according to how far away the nearest busstop is. These are called isolines. (Vector > Network analysis > Split net [v.net.iso])

v.net.iso --overwrite input=network output=network_isolines ccats=1-10 costs=1320,2640,3960,5280,6600,7920,9240,10560 afcolumn=LENGTH abcolumn=LENGTH

The astute reader would notice two things:

  1. Our cost thresholds are quarter-mile increments
  2. There are almost 7000 busstops (categories) that were loaded, and that we’re now using 10?

The reason we’re only using 10 busstops is because running the entire network takes roughly 6.5hrs on my laptop. For testing purposes, the more-or-less instant gratification that 10 stops will provide suffices.

Visualization

If we assign the iso-lines random colors with d.vect -c network_isolines, we can get a quick glimpse of the output. (In quarter mile increments: grey, light red, dark red, green, dark green, blue, yellow, dark yellow).

As you can see, the little area we were looking at above is dark-yellow, indicating that it is in the > 2mi isoline. Now, this isn’t a very pretty visualization. Let’s export it to a shapefile and load it into QGIS. Once loaded, we’ll categorize the layer with the `cat` field and use a color ramp to show differences in distance.

Zooming in we can look at the area we’ve been focusing on.

Zooming out we can look at the entire county.

It’s plain to see that the transit network covers only the most densely populated area, visually identifiable by a high concentration of roads. However, it does not service all of these area. Further comparison with Census data would yield better results.

Slightly more zoomed in, but very large, images can be found here: North part of the County and South part of the County

Future

There are many things I would like to do with this data, but one of the firsts would be to incorporate DEM into the model so-as to take grades (which we have a lot of over in Pittsburgh) into the cost calculation.

Also, I need to ensure that bridges don’t become intersections, as that introduces a source of error into the calculations.

Another piece of information is to use the GTFS data we’ve already loaded to come up with frequencies. That would enable us to look for areas that are, for instances, not within walking distance of a bus that comes less frequently than every half-an-hour.