This blog is about my musings and thoughts. I hope you find it useful, at most, and entertaining, at least.

Quotes

#### Presence Elsewhere

jim@jimkeener.com

GitHub

BitBucket

# Stop Spacing, Part 1

Date: 2019-11-12
Tags: postgis postgres gis gtfs transit

For reasons I'll get into in another post, I wanted to have a rough idea of the distribution of the distance between stops. To do this I used the GTFS import for the Port Authority of Allegheny County GTFS that I did earlier this year when calculating the service area.

A quick note about projections. Since I had created a table with a geometry in the Pennsylvania South (ftUS) state plane( EPGS 3365), any distance calculations I do will be done in feet.

A quick, rough and dirty experiment went like this:

create table stop_dist_line as
select stop.stop_id,
stop.stop_name,
prev.stop_id as prev_stop_id,
prev.stop_name as prev_stop_name,
-- Just using euclidean distance right now.
-- This could be improved by using the on-the-street distance.
stop.the_geom <-> prev.the_geom as dist,
st_setsrid(st_makeline(stop.the_geom, prev.the_geom), 3365) as the_geom
from (
-- This gets unique pairs of stops, regardless of ordering.
select distinct least(stop_id, prev_stop_id) as prev_stop_id,
greatest(stop_id, prev_stop_id) as stop_id
-- This gets all pairs of stops. In theory there could be duplicate pairs
-- based on direction.
from (select stop_id,
lag(stop_id, 1) over stop_order prev_stop_id
from gtfs.stop_times
join gtfs.trips using (trip_id)
window stop_order as (partition by trip_id, direction_id order by stop_sequence)
) unordered_pairs
) pairs
join gtfs.stops_3365 stop on stop.stop_id = pairs.stop_id
join gtfs.stops_3365 prev on prev.stop_id = pairs.prev_stop_id;

-- Remove segments that don't have a line because there is no
-- previous stop to have a line or have a distance from.
delete from stop_dist_line where prev_stop_id is null;

-- Register the new geometry column
select Populate_Geometry_Columns();
-- Add a PK to play more nicely with QGIS
alter table stop_dist_line add id serial primary key;
-- Add an index on the geometry.
create index on stop_dist_line using gist(the_geom);


We can then get some idea of the distribution by bucketing by 100s.

select dist_group,
ttlsf,
ttl,
round(ttlsf/ttl, 2) as perc,
c
from
(select dist_group,
sum(c) over (rows between unbounded preceding and current row) as ttlsf,
sum(c) over () as ttl,
c
from (
select 100*floor(dist / 100) as dist_group,
count(*) as c
from stop_dist_line
group by dist_group
order by dist_group) x)y;


Giving

 dist_group | ttlsf | ttl  | perc |  c
------------+-------+------+------+------
0 |   149 | 7732 | 0.02 |  149
100 |   223 | 7732 | 0.03 |   74
200 |   681 | 7732 | 0.09 |  458
300 |  1568 | 7732 | 0.20 |  887
400 |  2528 | 7732 | 0.33 |  960
500 |  3579 | 7732 | 0.46 | 1051
600 |  4462 | 7732 | 0.58 |  883
700 |  5135 | 7732 | 0.66 |  673
800 |  5589 | 7732 | 0.72 |  454
900 |  5946 | 7732 | 0.77 |  357
1000 |  6234 | 7732 | 0.81 |  288
1100 |  6456 | 7732 | 0.83 |  222
1200 |  6628 | 7732 | 0.86 |  172
1300 |  6766 | 7732 | 0.88 |  138
1400 |  6852 | 7732 | 0.89 |   86
1500 |  6921 | 7732 | 0.90 |   69
...


However, because PostgreSQL is awesome, we can also just compute percentiles directly.

select percentile_cont(0.25) within group (order by dist) as p_25,
percentile_cont(0.35) within group (order by dist) as p_35,
percentile_cont(0.45) within group (order by dist) as p_45,
percentile_cont(0.50) within group (order by dist) as p_50
from stop_dist_line;


Giving

        p_25       |       p_35       |       p_45       |       p_50
------------------+------------------+------------------+------------------
438.775035648918 | 518.085656645151 | 590.352075165022 | 631.007847560984


Next steps would be to correct many of the little corners cut, e.g. not using on-the-street distance, and rerunning the calculations.