Interpolated segments for rt_segment_speeds #1084
Yeah, I think all those suggestions look good and fit well with column naming in `gtfs-segments`. Overall, I think there's a good outline of the pieces needed for standardizing the pipeline. Thanks for checking it out!
- I like the rough check of just <1,000 m and filtering those to go through more postprocessing.
- When you say:
  > column changes: stop_sequence increment proportional to segment distance within arbitrary stop sequence increment

  ...this would show like `stop_sequence=1.75` situated between `stop_sequence`s 1 and 2, but maintain a numeric column?
  - follow-up: this is still a TODO, with some notes in `31_interpolated_segments_with_new_stops.ipynb`.
    - I think my idea of a numeric `stop_sequence1` might be unnecessarily complicated.
    - I'm not sure if just counting the segment `0`, `1`, `2` will be sufficient downstream if you want to aggregate across trips.
    - Having `stop_sequence=2` and `next_stop_sequence=2` might not be enough either to distinguish between the segments, and is the same as the above option.
- Can I play with how to expand the rows with the `gdf.explode()`? So far the functions don't look like they could use this create segments function and this explode_segments function, but maybe it lends itself to it if I can see the df itself and see the groupings. At any rate, no `dask` needed, since the segments are cut with `pandas + gtfs_segments`.
  - follow-up finding: yes, the 2 functions in `geography_utils` get the same results, statewide, in under 3 min. It does the `gdf.explode` as you want. I can clean up a notebook and check it in next week. It's like a toy example of what the dfs should look like at each stage, and I used it for road segments to see if it'll create errors in future steps like `nearest_neighbor`, `stop_arrivals`, and `speeds`.
  - follow-up finding: In `32_nearest_neighbor_setup.ipynb`, I tested the immediate connection to nearest neighbor. The subsequent steps are less likely to error once the setup is the same. It all works as of now, so fingers crossed for the remaining steps.
Thanks for the review and the examples! Busy with SB125 today/tomorrow, but I'll dig into it more later this week.
Short answer: yes. Long answer: there are two things going on here.

First, by going to a float we keep the column sortable.

By proportional, I thought it would be cool if, within that increment, stop_sequence increased by an amount proportional to distance (the number of 1000m interpolated segments). So if stop_sequence originally goes from 1 to 2, and there's a 10km gap between stops (10 interpolated segments), the new stop_sequence would be something like 1, 1.1, 1.2, 1.3, 1.4, ..., 2 (adding 1/10 each time). If stop_sequence originally goes from 24 to 67 and there's, say, a 3km gap between stops, the new stop_sequence would be something like 24, 38.33, 52.67, 67 (adding 43/3 each time).

That way you can not only sort in the correct order, but also roughly understand how far between the two stops an interpolated segment is, regardless of the actual stop_sequence increment or distance.
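To make the arithmetic concrete, here's a minimal sketch of the proportional increment in plain Python. The function name `interpolate_stop_sequence` and the rounding to 2 decimals are assumptions for illustration only, not what the notebook actually does.

```python
def interpolate_stop_sequence(start_seq, end_seq, n_segments):
    """Return n_segments + 1 evenly spaced values from start_seq to end_seq.

    Hypothetical helper: n_segments is the number of interpolated
    segments between the two original stops.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    # Each interpolated segment advances by an equal share of the
    # original stop_sequence gap, whatever that gap happens to be.
    step = (end_seq - start_seq) / n_segments
    # Rounding to 2 decimals is an assumption, just for readability.
    return [round(start_seq + i * step, 2) for i in range(n_segments + 1)]


# 10 km gap between stop_sequence 1 and 2: adds 1/10 each time
print(interpolate_stop_sequence(1, 2, 10))

# 3 km gap between stop_sequence 24 and 67: adds 43/3 each time
print(interpolate_stop_sequence(24, 67, 3))
```

The last value in each list is the original `stop_sequence` of the next stop, so sorting on the float column keeps interpolated segments between their two real stops.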
@edasmalchi: Ok! Taking everything into consideration:
I think we can merge both PRs in, and you can find the files I left for doing a full run for Mar 2024 for all operators.
Thanks so much @tiffanychu90!! Merging both, and I'll look at the files from your run later this week 🙂
Adds interpolated segments (between stops where stop spacing >1km) to rt_segment_speeds pipeline.
Currently in draft form for review before moving to scripts.
Proposed Upstream Changes:
Should probably happen upstream in `cut_stop_segments.py`, related scripts...
- `length`: float, `geometry.length`
- `next_stop_sequence`: lead of `stop_sequence`, should include final stop seq (final stop seq unavailable here since shifting from existing df...)
- `stop_sequence` -> `stop_sequence1` and add `stop_sequence2` (consistent with existing `stop_id1` and `stop_id2`)

@tiffanychu90 let me know if those look doable and if you'd like to add those or have me give it a try.
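For reference, a rough sketch of what "lead of `stop_sequence`" could look like in pandas. The toy dataframe and the `trip_id` grouping column are assumptions for illustration; the real grouping keys in `cut_stop_segments.py` may differ. It also shows why the final stop sequence is missing when shifting from an existing df.

```python
import pandas as pd

# Toy data (assumed shape): one row per stop segment, keyed by trip.
segments = pd.DataFrame(
    {
        "trip_id": ["a", "a", "a", "b", "b"],
        "stop_sequence": [1, 2, 5, 1, 3],
    }
)

# Sort so that the shift below really picks up the following stop.
segments = segments.sort_values(["trip_id", "stop_sequence"]).reset_index(drop=True)

# Lead of stop_sequence within each trip. The last row of every trip
# has nothing after it, so it comes back as NaN.
segments["next_stop_sequence"] = segments.groupby("trip_id")["stop_sequence"].shift(-1)

print(segments)
```

The NaN on each trip's last row is the gap noted above: the final stop sequence would have to come from the upstream data, since it can't be recovered by shifting.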
This script (to be based on notebook)
- `split_distance`, `process_exploded`, `store_new_geoms`, `lookup_geom`
- column changes:
  - `stop_sequence` increment proportional to segment distance within arbitrary stop sequence increment
  - `segment_id` postfix _(int) per segment to maintain uniqueness
- `gdf.explode()` but post-processing requires that the gdf remains in order and seemed hard to do with Dask

@tiffanychu90 curious to know your general thoughts/feedback. Also happy to find time to pair next week!
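As a toy illustration of the `segment_id` postfix idea, here's a sketch that cuts a segment length into pieces of at most 1,000 m and numbers them. Both helper names are hypothetical, and two choices are assumptions rather than what the notebook does: the last piece simply takes the remainder, and the postfix starts at 0.

```python
import math


def split_intervals(total_length, max_length=1000.0):
    """Return (start, end) distances along a segment, each at most max_length.

    Hypothetical helper; the final piece holds whatever length is left over.
    """
    if total_length <= 0 or max_length <= 0:
        raise ValueError("lengths must be positive")
    n_pieces = math.ceil(total_length / max_length)
    return [
        (i * max_length, min((i + 1) * max_length, total_length))
        for i in range(n_pieces)
    ]


def postfix_segment_ids(segment_id, n_pieces):
    """Append _(int) to the original segment_id so each piece stays unique."""
    return [f"{segment_id}_{i}" for i in range(n_pieces)]


# A 2.5 km stop segment becomes three interpolated pieces.
pieces = split_intervals(2500.0)
ids = postfix_segment_ids("abc-123", len(pieces))
for seg_id, (start, end) in zip(ids, pieces):
    print(seg_id, start, end)
```

Because the pieces are generated in order, the postfix also preserves the ordering that the post-processing relies on.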
If this integrates well with the rest of the pipeline, the goal would be to replace the current stop segments product with this version adding interpolated segments, allowing us to retire much of the old speedmaps `rt_delay` code and fully align speedmaps + open data...