Skip to content

Latest commit

 

History

History
44 lines (32 loc) · 1.17 KB

README.md

File metadata and controls

44 lines (32 loc) · 1.17 KB

To generate proto bindings:

wget https://raw.githubusercontent.com/google/transit/refs/heads/master/gtfs-realtime/proto/gtfs-realtime.proto

protoc --python_out=. gtfs-realtime.proto

To run:

pipenv run python gtfs-realtime-to-parquet.py https://cdn.mbta.com/realtime/VehiclePositions.pb
pipenv run python gtfs-realtime-to-parquet.py https://cdn.mbta.com/realtime/TripUpdates.pb gtfs_realtime_data.parquet
pipenv run python gtfs-realtime-to-parquet.py https://cdn.mbta.com/realtime/Alerts.pb gtfs_realtime_data.parquet

To view output:

pipenv run parquet-tools show mbta-example/gtfs_realtime_data.parquet | code -

or using DuckDB:

pipenv run duckdb -c "COPY (SELECT * FROM 'mbta-example/gtfs_realtime_data.parquet') TO stdout (FORMAT 'csv', HEADER TRUE);" | code -

Comparing size on disk:

➜  du -ch mbta-example/*.pb
116K    mbta-example/Alerts.pb
824K    mbta-example/TripUpdates.pb
 72K    mbta-example/VehiclePositions.pb
1.0M    total

tar -czvf mbta-example-pbs.tar.gz *.pb
du -h mbta-example/*.gz
356K	mbta-example/mbta-example-pbs.tar.gz

➜  du -h mbta-example/*.geoparquet 
 36K    mbta-example/gtfs_realtime_data.geoparquet