This repo contains Python notebooks and code to analyze NYPD records of bike crashes. The goals were to do some basic exploratory data analysis (EDA) to understand when, where, and why crashes occur, and then to build a predictive model for bike crashes.
A brief report summarizing the findings of the analysis can be found in markdown format here.
- Clone the repo: `git clone https://github.com/mhalvers/nyc_bike_crash_analysis.git`
- Run `make` to download the most recent crash data. Here is what `make` will do:
  - Run `retrieve_nyc_crashes_soda.py` to download all available data into a csv file using sodapy, a Python client for the Socrata Open Data API.
  - (Optional) Downloading works best when the data request is made with a user-specific token, which removes the strict throttling. A token can be obtained by registering here with Socrata: https://data.cityofnewyork.us/signup.
  - (Optional) Store the token in an environment variable called `SODAPY_APPTOKEN` with the following command: `export SODAPY_APPTOKEN=<token>`. Better yet, place this into your `.bash_profile` or `.bashrc`.
- Or, simply run `retrieve_nyc_crashes_soda.py` directly. Command-line options are required; type `./retrieve_nyc_crashes_soda.py --help` for help.
- Open `NYC_bike_crash_summary_stats.ipynb` to read the data from the csv output and explore.
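
For readers curious what the download step involves, here is a minimal sketch of paging through a Socrata dataset with sodapy. It is not the contents of `retrieve_nyc_crashes_soda.py`. The dataset ID `h9gi-nx95`, the page size, the output filename `crashes.csv`, and the helper names `get_app_token`, `fetch_all`, and `main` are all assumptions made for illustration.

```python
import csv
import os

# Assumption: "h9gi-nx95" is the ID of "Motor Vehicle Collisions - Crashes".
DOMAIN = "data.cityofnewyork.us"
DATASET_ID = "h9gi-nx95"


def get_app_token():
    """Return the token stored in SODAPY_APPTOKEN, or None if it is unset.

    Without a token the requests still work but are throttled more strictly.
    """
    return os.environ.get("SODAPY_APPTOKEN") or None


def fetch_all(client, dataset_id, page_size=50000):
    """Collect every row by requesting pages until a short page comes back."""
    rows = []
    offset = 0
    while True:
        # Ordering by the internal row id keeps the paging stable.
        page = client.get(dataset_id, limit=page_size, offset=offset, order=":id")
        rows.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return rows


def main(out_path="crashes.csv"):
    """Download the whole dataset and write it to a csv file (hypothetical name)."""
    from sodapy import Socrata  # imported here so the helpers work without sodapy

    client = Socrata(DOMAIN, get_app_token())
    rows = fetch_all(client, DATASET_ID)
    # Rows may not all carry the same fields, so take the union of the keys.
    fieldnames = sorted({key for row in rows for key in row})
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `main()` requires sodapy and network access, and it holds the full dataset in memory before writing, which may be too much for a small machine. Writing each page to disk as it arrives would avoid that.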
The data were obtained from the NYC OpenData service. The primary data set analyzed here was filtered from the Motor Vehicle Collisions - Crashes page. I filtered the data with the condition that "bike" or "bicycle" must appear somewhere in each row.
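
The row filter described above can be sketched locally with the standard library. This is an illustration under assumptions, not necessarily how the filtering was actually done: the match is taken to be a case-insensitive substring test, and the column names in the sample are made up for the example.

```python
import csv
import io

KEYWORDS = ("bike", "bicycle")


def mentions_bike(row):
    """Return True if any field in the row contains one of the keywords."""
    return any(
        keyword in str(value).lower()
        for value in row.values()
        if value is not None
        for keyword in KEYWORDS
    )


def filter_bike_rows(csv_file):
    """Read a csv file object and keep only the rows that mention a bike."""
    return [row for row in csv.DictReader(csv_file) if mentions_bike(row)]


# Tiny in-memory sample with illustrative column names.
sample = io.StringIO(
    "collision_id,vehicle_type_code1,vehicle_type_code2\n"
    "1,Sedan,Bike\n"
    "2,Sedan,Taxi\n"
    "3,BICYCLE,\n"
)
bike_rows = filter_bike_rows(sample)
print([row["collision_id"] for row in bike_rows])  # ['1', '3']
```

A substring test like this also keeps rows containing words such as "E-Bike" or "Motorbike", so the result may need a second look depending on which vehicles should count.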
Each row records one crash event. The NYPD must file a police report if there is a death, an injury, or significant property damage (>$1000). The fields in each row are described in this spreadsheet.