About Me!

This blog is about my musings and thoughts. I hope you find it useful, at most, and entertaining, at least.

Résumé [PDF]

Other Pages

Quotes

Links

Presence Elsewhere

jim@jimkeener.com

GitHub

BitBucket

Stop Spacing, Part 1

Date: 2019-11-12
Tags: postgis postgres gis gtfs transit

For reasons I'll get into in another post, I wanted to have a rough idea of the distribution of the distance between stops. To do this I used the GTFS import for the Port Authority of Allegheny County GTFS that I did earlier this year when calculating the service area.

A quick note about projections. Since I had created a table with a geometry in the Pennsylvania South (ftUS) state plane( EPGS 3365), any distance calculations I do will be done in feet.

A quick, rough and dirty experiment went like this:

create table stop_dist_line as 
select stop.stop_id,
       stop.stop_name,
       prev.stop_id as prev_stop_id,
       prev.stop_name as prev_stop_name,
       -- Just using euclidean distance right now.
       -- This could be improved by using the on-the-street distance.
       stop.the_geom <-> prev.the_geom as dist,
       st_setsrid(st_makeline(stop.the_geom, prev.the_geom), 3365) as the_geom
from (
  -- This gets unique pairs of stops, regardless of ordering.
  select distinct least(stop_id, prev_stop_id) as prev_stop_id,
                  greatest(stop_id, prev_stop_id) as stop_id
  -- This gets all pairs of stops. In theory there could be duplicate pairs
  -- based on direction.
  from (select stop_id,
               lag(stop_id, 1) over stop_order prev_stop_id
        from gtfs.stop_times
        join gtfs.trips using (trip_id)
        window stop_order as (partition by trip_id, direction_id order by stop_sequence)
  ) unordered_pairs
) pairs
join gtfs.stops_3365 stop on stop.stop_id = pairs.stop_id
join gtfs.stops_3365 prev on prev.stop_id = pairs.prev_stop_id;

-- Remove segments that don't have a line because there is no
-- previous stop to have a line or have a distance from.
delete from stop_dist_line where prev_stop_id is null;

-- Register the new geometry column
select Populate_Geometry_Columns();
-- Add a PK to play more nicely with QGIS
alter table stop_dist_line add id serial primary key;
-- Add an index on the geometry.
create index on stop_dist_line using gist(the_geom);

We can then get some idea of the distribution by bucketing by 100s.

select dist_group,
       ttlsf,
       ttl,
       round(ttlsf/ttl, 2) as perc,
       c
from
  (select dist_group,
          sum(c) over (rows between unbounded preceding and current row) as ttlsf,
          sum(c) over () as ttl,
          c
   from (
      select 100*floor(dist / 100) as dist_group,
             count(*) as c
      from stop_dist_line
      group by dist_group
      order by dist_group) x)y;

Giving

 dist_group | ttlsf | ttl  | perc |  c
------------+-------+------+------+------
          0 |   149 | 7732 | 0.02 |  149
        100 |   223 | 7732 | 0.03 |   74
        200 |   681 | 7732 | 0.09 |  458
        300 |  1568 | 7732 | 0.20 |  887
        400 |  2528 | 7732 | 0.33 |  960
        500 |  3579 | 7732 | 0.46 | 1051
        600 |  4462 | 7732 | 0.58 |  883
        700 |  5135 | 7732 | 0.66 |  673
        800 |  5589 | 7732 | 0.72 |  454
        900 |  5946 | 7732 | 0.77 |  357
       1000 |  6234 | 7732 | 0.81 |  288
       1100 |  6456 | 7732 | 0.83 |  222
       1200 |  6628 | 7732 | 0.86 |  172
       1300 |  6766 | 7732 | 0.88 |  138
       1400 |  6852 | 7732 | 0.89 |   86
       1500 |  6921 | 7732 | 0.90 |   69
       ...

However, because PostgreSQL is awesome, we can also just compute percentiles directly.

select percentile_cont(0.25) within group (order by dist) as p_25,
       percentile_cont(0.35) within group (order by dist) as p_35,
       percentile_cont(0.45) within group (order by dist) as p_45,
       percentile_cont(0.50) within group (order by dist) as p_50
from stop_dist_line;

Giving

        p_25       |       p_35       |       p_45       |       p_50
------------------+------------------+------------------+------------------
 438.775035648918 | 518.085656645151 | 590.352075165022 | 631.007847560984

Next steps would be to correct many of the little corners cut, e.g. not using on-the-street distance, and rerunning the calculations.