This project implements a complete ETL (Extract, Transform, Load) pipeline and interactive dashboard using the City of Cambridge’s SeeClickFix (Commonwealth Connect) data. It gives users — both residents and city staff — insight into the most common service requests submitted and provides predictive estimates of how long requests might take to resolve.
The system automatically pulls data daily, processes it, trains a model, and updates a live Streamlit dashboard.
Residents report issues like potholes, graffiti, or missed trash pickup through the SeeClickFix app. This tool:
- Highlights the most common issues being reported.
- Displays request counts filtered by status and time range.
- Predicts the estimated time-to-close for new requests based on issue type, day, month, and time.
It helps city staff understand service delivery and gives the public visibility into government responsiveness using open data.
- Filter requests by date and status (see the sketch after this list)
- Visualize the top 20 issue types by volume
- Predict resolution time based on:
  - Issue category
  - Weekday
  - Month
  - Hour of day
- Map of reported issues with hover details
- Auto-scheduled daily updates using Prefect
- Lightweight, containerized app (runs locally via Docker)
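A minimal sketch of how the date/status filters, top-issues chart, and map might be wired up in `app.py`. The table name comes from the transform step described later; the column names (`status`, `issue_type`, `lat`, `lon`, `ticket_created_date_time`) are assumptions for illustration, not confirmed from the project source:

```python
import sqlite3

import pandas as pd
import streamlit as st

# Load the transformed data (table and column names are assumptions)
with sqlite3.connect("service_requests.db") as conn:
    df = pd.read_sql("SELECT * FROM service_requests_transformed", conn)
df["created"] = pd.to_datetime(df["ticket_created_date_time"])

# Sidebar filters: status and date range
statuses = st.sidebar.multiselect(
    "Status", sorted(df["status"].dropna().unique())
)
start, end = st.sidebar.date_input(
    "Date range", (df["created"].min().date(), df["created"].max().date())
)

mask = df["created"].dt.date.between(start, end)
if statuses:
    mask &= df["status"].isin(statuses)
filtered = df[mask]

# Top 20 issue types by volume
st.bar_chart(filtered["issue_type"].value_counts().head(20))

# Map of reported issues (st.map reads lat/lon columns by default)
st.map(filtered[["lat", "lon"]].dropna())
```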
- Dataset: Commonwealth Connect Service Requests
- API: Socrata Open Data API (SODA)
- CSV Endpoint: https://data.cambridgema.gov/resource/2z9k-mv9g.csv
- Documentation: Socrata SODA API Docs
- Access: an optional (but recommended) app token stored in `.env` (see the fetch sketch after this list)
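A minimal sketch of what the extract step can look like against this endpoint. The URL and the `X-App-Token` header come from the SODA API; the `$limit` value, the python-dotenv loading, and the variable names are illustrative assumptions:

```python
# Sketch of an extract step against the SODA CSV endpoint.
import io
import os

import pandas as pd
import requests
from dotenv import load_dotenv

load_dotenv()  # reads SODA_APP_TOKEN from .env

URL = "https://data.cambridgema.gov/resource/2z9k-mv9g.csv"
params = {"$limit": 50000}  # SODA returns only 1,000 rows by default

headers = {}
token = os.getenv("SODA_APP_TOKEN")
if token:
    headers["X-App-Token"] = token  # optional, but lifts rate limits

resp = requests.get(URL, params=params, headers=headers, timeout=60)
resp.raise_for_status()
df = pd.read_csv(io.StringIO(resp.text))
print(f"Fetched {len(df):,} rows")
```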
The `transform.py` script processes and augments the raw service request data with the following steps (a code sketch follows the list):
- Datetime Parsing: Converts datetime columns (e.g., ticket creation and closure times) into proper datetime objects.
- Feature Engineering: Extracts additional features from datetime columns, such as:
- Year, month, day, weekday, and hour of ticket creation and closure.
- String Formatting: Creates a human-readable string version of the ticket creation timestamp for tooltips.
- Time-to-Close Calculation: Computes the time taken to close each ticket (in hours) for requests with both creation and closure timestamps.
- Database Storage: Saves the transformed data into the `service_requests_transformed` table in the SQLite database.
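A compact sketch of those steps. The raw-file name, the datetime column names, and the database path are assumptions for illustration, not confirmed from the project source:

```python
import sqlite3

import pandas as pd

df = pd.read_csv("raw_service_requests.csv")  # hypothetical extract output

# Datetime parsing (column names assumed)
for col in ("ticket_created_date_time", "ticket_closed_date_time"):
    df[col] = pd.to_datetime(df[col], errors="coerce")

created = df["ticket_created_date_time"]

# Feature engineering: calendar features from the creation timestamp
# (the closure timestamp gets the same treatment in the real script)
df["created_year"] = created.dt.year
df["created_month"] = created.dt.month
df["created_day"] = created.dt.day
df["created_weekday"] = created.dt.day_name()
df["created_hour"] = created.dt.hour

# Human-readable creation time for dashboard tooltips
df["created_label"] = created.dt.strftime("%b %d, %Y %I:%M %p")

# Time-to-close in hours, defined only where both timestamps exist
delta = df["ticket_closed_date_time"] - created
df["time_to_close_hours"] = delta.dt.total_seconds() / 3600

# Store the result in SQLite
with sqlite3.connect("service_requests.db") as conn:
    df.to_sql("service_requests_transformed", conn,
              if_exists="replace", index=False)
```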
The `train_model.py` script trains a machine learning model to predict the estimated time to close service requests. Below are the details of the model (a training sketch follows the list):
- Model Type:
  - The model is a Random Forest Regressor.
- Features Used:
  - Categorical features:
    - Issue category (e.g., "Missed Recycling Pickup").
    - Weekday of ticket creation (e.g., "Monday").
  - Numerical features:
    - Month of ticket creation (e.g., January = 1).
    - Hour of ticket creation (e.g., 14 for 2 PM).
- Target Variable:
  - The model predicts the time to close a service request in hours, calculated as the difference between ticket closure and creation timestamps.
- Training Process:
  - The data is split into training and testing sets (80% training, 20% testing).
  - The model is trained on the training set and evaluated using the R² score on the test set.
- Model Storage:
  - The trained model is saved as a `.joblib` file (`time_to_close_model.joblib`) for use in the Streamlit dashboard.
- Usage in Dashboard:
  - Users can input features (e.g., issue category, weekday, month, hour) in the dashboard to get a prediction of the estimated time to close a new service request.
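A sketch of how such a script can be put together with scikit-learn. The feature column names mirror the transform sketch above and are assumptions; the pipeline structure and hyperparameters are illustrative choices, not the project's exact configuration:

```python
import sqlite3

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Load the transformed table (names assumed from the transform sketch)
with sqlite3.connect("service_requests.db") as conn:
    df = pd.read_sql("SELECT * FROM service_requests_transformed", conn)

CATEGORICAL = ["issue_category", "created_weekday"]
NUMERICAL = ["created_month", "created_hour"]
TARGET = "time_to_close_hours"

data = df.dropna(subset=CATEGORICAL + NUMERICAL + [TARGET])
X, y = data[CATEGORICAL + NUMERICAL], data[TARGET]

# One-hot encode the categoricals; numeric columns pass through as-is
preprocess = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL)],
    remainder="passthrough",
)
model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=42)),
])

# 80/20 train/test split, then evaluate with R² on the held-out set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model.fit(X_train, y_train)
print("Test R²:", round(r2_score(y_test, model.predict(X_test)), 3))

# Persist for the dashboard
joblib.dump(model, "time_to_close_model.joblib")

# In the dashboard, the saved pipeline is reloaded to score one request
loaded = joblib.load("time_to_close_model.joblib")
example = pd.DataFrame([{
    "issue_category": "Missed Recycling Pickup",
    "created_weekday": "Monday",
    "created_month": 1,
    "created_hour": 14,
}])
print("Predicted hours to close:", loaded.predict(example)[0])
```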
```
your-project-folder/
├── .env                 # stores API token securely
├── Dockerfile           # builds app container
├── docker-compose.yml   # launches Prefect and Streamlit services
├── extract.py           # downloads latest data from Socrata API
├── transform.py         # processes and augments data
├── train_model.py       # trains a predictive ML model
├── prefect_flow.py      # defines and schedules the ETL pipeline
├── app.py               # Streamlit app for UI & analysis
└── requirements.txt     # Python package dependencies
```
```bash
git clone https://github.com/leegianetti/dacss690a_finalproject.git
cd dacss690a_finalproject
```
Add a Socrata app token (get one here) to `.env`:

```
SODA_APP_TOKEN=your_token_here
```
If not already installed:
- Docker: https://docs.docker.com/get-docker/
- Docker Compose: https://docs.docker.com/compose/install/
```bash
docker-compose up --build
```
- This will launch:
  - `prefect_etl`: runs the ETL + model training scripts
  - `streamlit_app`: serves the dashboard at http://localhost:8501
- The ETL pipeline is orchestrated using Prefect 2.x
- Defined in `prefect_flow.py` with a daily schedule
- Automatically:
  - Extracts the latest data
  - Transforms it
  - Trains a machine learning model
- Scheduling is handled via Prefect's `IntervalSchedule` (a flow sketch follows)
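A minimal sketch of what a daily-scheduled Prefect 2.x flow can look like. The task bodies, flow name, and deployment name are placeholders standing in for the real scripts; the `IntervalSchedule` import path shown here is one of several used across Prefect 2.x versions, so treat it as an assumption:

```python
from datetime import timedelta

from prefect import flow, task
from prefect.deployments import Deployment
from prefect.server.schemas.schedules import IntervalSchedule

@task
def extract_data():
    ...  # calls the logic in extract.py

@task
def transform_data():
    ...  # calls the logic in transform.py

@task
def train_model():
    ...  # calls the logic in train_model.py

@flow(name="seeclickfix-etl")
def etl_flow():
    extract_data()
    transform_data()
    train_model()

if __name__ == "__main__":
    # Register a deployment that reruns the flow every 24 hours
    Deployment.build_from_flow(
        flow=etl_flow,
        name="daily-etl",
        schedule=IntervalSchedule(interval=timedelta(days=1)),
    ).apply()
```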
- Add filters for geographic area (e.g., neighborhood or zip code)
- Connect to a scalable database like PostgreSQL
- Enhance the model using more predictive features (e.g., service department)
- Track year-over-year trends or seasonality
- Include detailed service performance reports for city leadership
- Export insights as PDF or CSV for public use
- Deploy the app to the cloud (e.g., AWS, Prefect Cloud)
MIT License
Lee Gianetti