As one of the most populous cities in the United States, New York City sees millions of taxi trips every month. This project conducts a quantitative analysis of the New York City Taxi and Limousine Commission (TLC) trip record data to better understand it. It also aims to provide recommendations that might improve taxi drivers' income.
- Language: Python 3.8.8
- Python Packages / Libraries: pandas, geopandas, numpy, matplotlib, seaborn, scipy, sklearn, statsmodels, contextily
- NYC TLC Dataset (Jan, Feb, Jul, Aug of 2018)
- NYC Taxi Zone Shapefile
- NYC Central Park Weather Record (2018)
- To download the NYC TLC datasets, use `download.ipynb`, included in `code`. The Taxi Zone Shapefile and weather dataset are included in `raw_data`.
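As a rough illustration of what the download step involves, the sketch below builds file URLs for the four months used in this project and saves them to `raw_data`. It is not the code in `download.ipynb`: the base URL, the `yellow` taxi type, the Parquet file naming, and the helper names `trip_data_url` and `download_all` are all assumptions and may not match the notebook or the TLC's current hosting.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: TLC trip files are served from this base URL with this naming scheme.
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"
TAXI_TYPE = "yellow"  # assumption: the README does not say which taxi type is used
MONTHS_2018 = [1, 2, 7, 8]  # Jan, Feb, Jul, Aug of 2018


def trip_data_url(year: int, month: int, taxi_type: str = TAXI_TYPE) -> str:
    """Build the (assumed) URL of one monthly trip record file."""
    return f"{BASE_URL}/{taxi_type}_tripdata_{year}-{month:02d}.parquet"


def download_all(dest: str = "raw_data", year: int = 2018) -> list:
    """Download each month's file into `dest`, skipping files already present."""
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for month in MONTHS_2018:
        url = trip_data_url(year, month)
        target = out_dir / url.rsplit("/", 1)[-1]
        if not target.exists():
            # Network call; fails if the assumed URL scheme is wrong or has changed.
            urlretrieve(url, target)
        saved.append(target)
    return saved
```

Calling `download_all()` would fetch the four files into `raw_data`; files that already exist are left untouched, so the call can be repeated safely.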
- `raw_data`: Contains all the raw data files. Added to `.gitignore`.
- `preprocessed_data`: Contains all the preprocessed data files. Added to `.gitignore`.
- `plots`: Contains all visualisation plots for the project.
- `deprecated`: Contains all the old code that I don't use anymore.
- `code`: Contains notebooks for Preprocessing, Visualisation, and Modelling.
  - `download.ipynb` for "Downloading" trip record datasets.
  - `preprocessing.ipynb` for "Preprocessing" and "Exploratory Data Analysis".
  - `visualisation.ipynb` for "Analysis and Visualisation".
  - `modelling.ipynb` for "Statistical Modelling".
- To reproduce the results, download all the datasets and run each notebook.
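One way to run the notebooks non-interactively is through `jupyter nbconvert --execute`. The sketch below assumes the notebooks live in `code` and should run in the order download, preprocessing, visualisation, modelling. That order and the helper names `nbconvert_command` and `run_all` are assumptions, not something the project states.

```python
import subprocess
from pathlib import Path

# Assumption: the notebooks are meant to run in this order.
NOTEBOOKS = [
    "download.ipynb",
    "preprocessing.ipynb",
    "visualisation.ipynb",
    "modelling.ipynb",
]


def nbconvert_command(notebook: Path) -> list:
    """Build the command that executes one notebook and saves it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", "--inplace",
        str(notebook),
    ]


def run_all(code_dir: str = "code") -> None:
    """Execute each notebook in turn, stopping at the first failure."""
    for name in NOTEBOOKS:
        subprocess.run(nbconvert_command(Path(code_dir) / name), check=True)
```

`run_all()` requires Jupyter and the packages listed above to be installed. Opening and running each notebook manually works just as well.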