Table of Contents
- 1. Overview
- 2. Design & Development
- 3. Deployment
- 4. Challenges
- 5. Future Enhancements
- 6. Project Structure
- 7. References
- 8. Acknowledgements
San Francisco is a vibrant and diverse city, but like any major urban area, it faces ongoing challenges related to public safety and crime. In recent years, it has drawn attention for its homelessness and drug problems, which have been widely covered in the mainstream media. For someone planning to move to the city, this can be intimidating and even frightening.
The San Francisco Crime Stats data pipeline and dashboard addresses this problem by ingesting, transforming, and visualizing the San Francisco Police Department's (SFPD) incident reports data set. The SFPD publishes detailed, regularly updated incident data through the city's open data portal (DataSF). The dashboard lets a prospective resident dig into the data and decide for themselves whether the city's public safety problems are understated or exaggerated by the media.
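As a rough illustration of the ingestion step, the sketch below builds paged request URLs for the incident reports data set using the Socrata (SODA) query parameters `$where`, `$order`, `$limit`, and `$offset`. The dataset ID `wg3w-h783`, the `incident_datetime` field, and the helper names `build_incident_url` and `page_urls` are assumptions made for illustration and are not taken from this project's pipeline code, so check them against DataSF before relying on them.

```python
from urllib.parse import urlencode

# Assumed values: verify the dataset ID and field names on DataSF.
BASE_URL = "https://data.sfgov.org/resource/wg3w-h783.json"


def build_incident_url(since: str, limit: int = 1000, offset: int = 0) -> str:
    """Return a SODA query URL for incidents on or after `since` (ISO date)."""
    params = {
        "$where": f"incident_datetime >= '{since}'",
        "$order": "incident_datetime",  # stable ordering so paging is consistent
        "$limit": limit,
        "$offset": offset,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def page_urls(since: str, pages: int, page_size: int = 1000) -> list[str]:
    """Build URLs for the first `pages` pages of results."""
    return [
        build_incident_url(since, limit=page_size, offset=i * page_size)
        for i in range(pages)
    ]
```

Each URL could then be fetched by a data loader and the pages concatenated before being written to Parquet. The sketch only constructs URLs and makes no network calls.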
- Storage: GCP Buckets, Parquet
- Data Processing: BigQuery, Python, Polars, dbt Core
- Data Visualization: Preset Cloud/Superset
- Orchestration: Mage AI
- DevOps: Terraform, Docker, GitHub, GitHub Actions
Dashboard demo (video): dashboard-demo.mp4
- All infrastructure is deployed via Terraform scripts
- A custom Mage AI Docker image is pulled from Docker Hub and deployed to GCP Cloud Run via Terraform scripts
- Docker image: kdayno/sf-crime-stats
- Terraform Deployment Permissions: Identifying the roles the service account needs to create all infrastructure and deploy the Docker image to Cloud Run was difficult. The Mage AI documentation defines them only partially, so resolving the missing roles required extensive troubleshooting.
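One way to shorten this kind of troubleshooting is to compare the roles the service account already holds with the roles the deployment is expected to need. The sketch below does this against an IAM policy in the JSON shape returned by `gcloud projects get-iam-policy PROJECT_ID --format=json`. The `REQUIRED_ROLES` set and the service account email are hypothetical placeholders, not the list this project ended up with.

```python
import json

# Hypothetical list for illustration; the roles a real deployment needs
# depend on the resources Terraform creates.
REQUIRED_ROLES = {
    "roles/run.admin",
    "roles/iam.serviceAccountUser",
    "roles/storage.admin",
    "roles/bigquery.admin",
}


def granted_roles(policy: dict, member: str) -> set[str]:
    """Collect every role whose binding lists `member`."""
    return {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }


def missing_roles(policy: dict, member: str, required: set[str]) -> set[str]:
    """Return the required roles that `member` has not been granted."""
    return required - granted_roles(policy, member)


if __name__ == "__main__":
    # Example policy with the same structure as the gcloud JSON output.
    member = "serviceAccount:deployer@example-project.iam.gserviceaccount.com"
    policy = json.loads(
        '{"bindings": ['
        '{"role": "roles/run.admin", "members": ["%s"]},'
        '{"role": "roles/storage.admin", "members": ["%s"]}'
        "]}" % (member, member)
    )
    for role in sorted(missing_roles(policy, member, REQUIRED_ROLES)):
        print(f"missing: {role}")
```

This only inspects bindings in the project-level policy it is given, so roles granted at the organization, folder, or individual resource level would not show up.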
- Additional Data Sources: The open data portal (DataSF) offers many other datasets that could support deeper analysis. For example, the Registered Business Locations - San Francisco dataset could be integrated into the solution to analyze how crime has affected businesses in the city over time.
sf-crime-stats
|
├── docs
│   ├── deployment
│   └── images                   -> Images for README
├── .github
│   └── workflows                -> GitHub Actions
├── Dockerfile                   -> Builds custom docker mage-ai image for GCP deployment
├── docker-compose.yml           -> Used for local development
├── README.md
├── mage
│   └── sf-crime-stats-mage      -> Mage project
│       ├── data_exporters
│       ├── data_loaders
│       ├── dbts
│       ├── markdowns
│       ├── pipelines
│       └── transformers
├── dbt
│   └── sf_crime_stats           -> dbt project
│       ├── dbt_project.yml
│       ├── macros
│       └── models
│           ├── core
│           ├── marts
│           └── staging
└── terraform                    -> Terraform project
    ├── envs
    │   ├── dev
    │   │   ├── keys
    │   │   ├── main.tf
    │   │   ├── terraform.tfvars
    │   │   └── variables.tf
    │   └── prod
    │       ├── keys
    │       ├── main.tf
    │       ├── terraform.tfvars
    │       └── variables.tf
    └── modules
        ├── dbtcloud
        │   ├── main.tf
        │   └── variables.tf
        └── gcp
            ├── db.tf
            ├── fs.tf
            ├── load_balancer.tf
            ├── main.tf
            └── variables.tf
Mage AI Docs:
Preset BI Docs:
This project was built as the capstone for the Data Engineering Zoomcamp - 2024 Cohort.