- author(s): Aykut Onat
Project Objective: Follow the Kimball Lifecycle to design and develop a public, cloud-based Data Warehouse with a functioning BI application
Project Tools: The tools used to build this Data Warehouse were:
- For data integration - Python
- For data warehousing - Google BigQuery
- For Business Intelligence - Tableau
Motivation for project:
The National Highway Traffic Safety Administration recently released preliminary estimates of traffic fatalities in the US. The early estimates indicate that 38,680 people died in motor vehicle collisions in 2020, the highest number since 2007. A recent New York Times article also points to a spike in fatalities in New York City in 2020. Motor vehicle collisions are among the leading causes of major injuries and accidental death. Preventing accidents and mitigating their consequences requires more in-depth analysis. The project explores the chosen datasets to identify trends in traffic accidents and the factors that contribute to them.
Description of the issues or opportunities the project will address:
Using three datasets from NYC Open Data, an online repository of data published by NYC agencies, the project aims to answer questions such as:
- Where are the crash hotspots?
- What causes accidents the most?
- What time of the day is the safest time to be on the road?
- What type of vehicle causes accidents the most?
- What are the leading factors behind collisions?
- Which seat position is more prone to fatalities in accidents?
- What is the age distribution of people involved in accidents?
- What are the most common damage patterns in collisions?
Project Business or Organization Value:
The project explores related datasets to identify trends in, and contributing factors to, traffic accidents in NYC.
Based on the trends and contributing factors identified, countermeasures can be suggested to prevent traffic accidents and lower death tolls more effectively.
Data Sources:
- NYC Open Data Motor Vehicle Collisions - Crashes
- NYC Open Data Motor Vehicle Collisions - Vehicles
- NYC Open Data Motor Vehicle Collisions - Person
List of Data Warehouse KPIs:
- Number of Accidents by Year per Borough
- Seasonality of Accidents
- Effects of Seat Positions and Vehicle Types on Injuries and Fatalities
- Contributing Factors to an Accident
- Number of Accidents of Drivers Falling in Each Age Group
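As a rough illustration of how the first KPI above could be derived during the Python data integration step, the sketch below counts accidents per year and borough. It is not the project's actual pipeline: the column names `CRASH DATE` and `BOROUGH`, the `MM/DD/YYYY` date format, the sample rows, and the function name `accidents_by_year_and_borough` are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Illustrative rows only (not real data). Column names and the date
# format are assumptions about the Crashes export.
sample_rows = [
    {"CRASH DATE": "01/15/2019", "BOROUGH": "BROOKLYN"},
    {"CRASH DATE": "03/02/2019", "BOROUGH": "BROOKLYN"},
    {"CRASH DATE": "07/21/2020", "BOROUGH": "QUEENS"},
    {"CRASH DATE": "11/05/2020", "BOROUGH": ""},
]


def accidents_by_year_and_borough(rows):
    """Count crashes per (year, borough); blank boroughs become 'UNKNOWN'."""
    counts = Counter()
    for row in rows:
        year = datetime.strptime(row["CRASH DATE"], "%m/%d/%Y").year
        borough = row["BOROUGH"].strip() or "UNKNOWN"
        counts[(year, borough)] += 1
    return counts


if __name__ == "__main__":
    totals = accidents_by_year_and_borough(sample_rows)
    for (year, borough), n in sorted(totals.items()):
        print(year, borough, n)
```

Keeping rows with a blank borough under an explicit `UNKNOWN` label, rather than dropping them, keeps the yearly totals consistent across views.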
This project's Dimensional Model consists of (x) Facts and (y) Dimensions
List of Visualizations for each KPI:
- Number of Accidents by Year per Borough
  - Grouped Bar Chart of Number of Accidents per Year by Borough
  - Hot Spots for Crashes Map
- Seasonality of Accidents
  - Line Graph of Accidents per Month
  - Line Graph of Accidents per Week
  - Line Graph of Accidents per Hourly Interval
- Effects of Seat Positions and Vehicle Types on Injuries and Fatalities
  - Table of Seat Positions Leading to Injuries and Fatalities
  - Table of Injuries and Fatalities per Person Type
  - Stacked Bar Chart of Vehicle Types Leading to Fatalities
- Contributing Factors to an Accident
  - Packed Bubble Chart of Contributing Factors to an Accident
- Number of Accidents of Drivers Falling in Each Age Group
  - Tree Map of Number of Accidents of Drivers Falling in Each Age Group
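The hourly-interval and age-group views listed above both depend on bucketing raw values before they are charted. The sketch below shows one possible way to do that in Python. The age bands, the `H:MM` time format, and the helper names `age_group` and `hour_bucket` are hypothetical; the source does not state which groupings were actually used.

```python
# Hypothetical age bands (inclusive bounds); the project's real groups may differ.
AGE_BANDS = [
    (0, 17, "0-17"),
    (18, 24, "18-24"),
    (25, 44, "25-44"),
    (45, 64, "45-64"),
    (65, 120, "65+"),
]


def age_group(age):
    """Return the label of the band containing age, or 'UNKNOWN'."""
    if age is None:
        return "UNKNOWN"
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    # Negative or implausibly large ages are treated as data errors.
    return "UNKNOWN"


def hour_bucket(crash_time):
    """Map a time string such as '8:05' or '23:59' to its hour (0-23)."""
    return int(crash_time.split(":")[0])


if __name__ == "__main__":
    print(age_group(30), hour_bucket("8:05"))
```

Sending out-of-range ages to `UNKNOWN` keeps obviously invalid values from distorting the age distribution.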
The project was deployed on Tableau Public: https://public.tableau.com/app/profile/aykut5620