This container image provides a bridge between GTFS and GTFS-RT APIs and Apache Kafka, Azure Event Hubs, and Fabric Event Streams. The bridge reads data from GTFS and GTFS-RT APIs and writes data to a Kafka topic.
GTFS stands for General Transit Feed Specification. GTFS is a set of open data standards for public transportation schedules and associated geographic information. GTFS-RT is a real-time extension to GTFS that allows public transportation agencies to provide real-time updates about their fleet. Over 2000 transit agencies worldwide provide GTFS and GTFS-RT data.
The Mobility Database provides a comprehensive list of GTFS and GTFS-RT feeds from around the world.
The bridge fetches GTFS and GTFS-RT data from the supplied URLs and writes the data to a Kafka topic as CloudEvents in a JSON format that is documented in EVENTS.md. One bridge instance can handle multiple GTFS and GTFS-RT feeds, but they are managed under a single agency identifier that must be supplied in the configuration.
Install from the command line as
$ docker pull ghcr.io/clemensv/real-time-sources-gtfs:latest
Use as base image in Dockerfile:
FROM ghcr.io/clemensv/real-time-sources-gtfs:latest
If you want to build a full data pipeline with all events ingested into database, the integration with Fabric Eventhouse and Azure Data Explorer is described in DATABASE.md.
The container image defines a single command that starts the bridge. The bridge reads data from GTFS and GTFS-RT APIs and writes data to a Kafka topic or Azure Event Hubs or Fabric Event Streams.
The image assumes that you have a Kafka broker that is configured with TLS and SASL PLAIN authentication. You can start the container locally on docker with the following command:
$ docker run --rm \
-e KAFKA_BOOTSTRAP_SERVERS='<kafka-bootstrap-servers>' \
-e KAFKA_TOPIC='<kafka-topic>' \
-e SASL_USERNAME='<sasl-username>' \
-e SASL_PASSWORD='<sasl-password>' \
-e GTFS_RT_URLS='<gtfs-rt-urls>' \
-e GTFS_URLS='<gtfs-urls>' \
-e AGENCY='<agency-id>' \
ghcr.io/clemensv/real-time-sources-gtfs:latest
With Azure Event Hubs or Fabric Event Streams, you can use the connection string to establish a connection to the service. You get the connection string from the Azure portal, from the Azure CLI, or from the "custom endpoint" of a Fabric Event Stream.
$ docker run --rm \
-e CONNECTION_STRING='<connection-string>' \
-e GTFS_RT_URLS='<gtfs-rt-urls>' \
-e GTFS_URLS='<gtfs-urls>' \
-e AGENCY='<agency-id>' \
ghcr.io/clemensv/real-time-sources-gtfs:latest
If you want to preserve state between restarts to avoid sending that you have
already sent, you can mount a volume to the container and refer to it in the
SCHEDULE_CACHE_DIR
environment variable:
$ docker run --rm \
-v /path/to/state:/var/lib/real-time-sources-gtfs \
... other args... \
-e SCHEDULE_CACHE_DIR='/var/lib/real-time-sources-gtfs' \
ghcr.io/clemensv/real-time-sources-gtfs:latest
Azure Event Hubs-style connection string is used to establish a connection to
Azure Event Hubs or Fabric Event Streams. This connection string contains the
necessary credentials and configuration settings required for secure and
efficient data transmission. The connection string is used in place of
KAFKA_BOOTSTRAP_SERVERS
and SASL_USERNAME
and SASL_PASSWORD
.
Address of the Kafka broker. This is a comma-separated list of host and port
pairs that are the addresses of the Kafka brokers in a "bootstrap" Kafka cluster
that a Kafka client connects to initially to bootstrap itself. For example,
broker1:9092,broker2:9092
. The client assumes communication with TLS enabled
Kafka brokers.
Kafka topic to produce messages to.
Username for SASL PLAIN authentication. The client assumes that the Kafka brokers have been configured to support SASL PLAIN authentication.
Password for SASL PLAIN authentication. The client assumes that the Kafka brokers have been configured to support SASL PLAIN authentication.
Comma-separated list of GTFS-RT URLs for real-time vehicle positions and/or trip updates and/or alerts. The supplied endpoints may support any combination of the GTFS-RT feeds. The client fetches GTFS-RT data from all of these URLs.
HTTP headers to pass to the GTFS-RT endpoint. The headers are passed as
key=value pairs separated by spaces. For example, key1=value1 key2=value2
.
Values may be quoted to preserve spaces inside values. For example,
key1="value with spaces"
. The client passes these headers to the GTFS-RT
endpoint when fetching GTFS-RT data.
Comma-separated list of GTFS URLs for schedule and other reference data. The client fetches GTFS data from all of these URLs.
HTTP headers to pass to the schedule endpoint. The headers are passed as
key=value pairs separated by spaces. For example, key1=value1 key2=value2
.
Values may be quoted to preserve spaces inside values. For example,
key1="value with spaces"
. The client passes these headers to the schedule
endpoint when fetching GTFS data.
Agency identifier. The client uses this identifier to group GTFS and GTFS-RT
data under a single agency. The client includes this identifier in the CloudEvent
source
field.
Route filter. Reserved.
Schedule file cache directory. The client caches GTFS schedule files in this directory and keeps track of the last modification time of the files. This assures that data is not acquired and sent to the Kafka broker if the schedule files have not changed since the last fetch.
You can deploy the RSS/Atom bridge as a container directly to Azure Container Instances providing the information explained above. Just click the button below and go.
The following example shows how to start the bridge with a set of feeds for the Seattle metropolitan area, sending to an Event Hub.
The corresponding entries in the mobility database are:
- Puget Sound Consolidated GTFS
- King County Metro GTFS-RT Trip Updates
- King County Metro GTFS-RT Vehicle Positions
- King County Metro GTFS-RT Alerts
$ docker run --rm \
-e CONNECTION_STRING='<connection-string>' \
-e GTFS_RT_URLS='https://api-endpoint.com/gtfs-rt' \
-e GTFS_URLS='https://gtfs.sound.obaweb.org/prod/1_gtfs.zip' \
-e GTFS_RT_URLS='https://s3.amazonaws.com/kcm-alerts-realtime-prod/tripupdates.pb,https://s3.amazonaws.com/kcm-alerts-realtime-prod/alerts.pb,https://s3.amazonaws.com/kcm-alerts-realtime-prod/vehiclepositions.pb' \
-e AGENCY='KingCountyMetro' \
ghcr.io/clemensv/real-time-sources-gtfs:latest