GitHub - ian-cokehyeng/SparkML_on_AWS: Utilizing AWS EMR and S3, leveraged Apache Spark to train numerous ML models in parallel to predict NYC taxi demand, 7 days ahead, for a given zone and hour of the day.

Overview

This project forecasts aggregate hourly taxi demand for New York City pick-up locations, seven days in advance. Unlike many projects built on the NYC Taxi Trip dataset, it forecasts city-wide demand across all taxi vendors, using data from 2015 through the first half of 2023. The central challenge was processing the 1.9 billion rows of the raw dataset to develop a predictive model.
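As a rough illustration of the forecasting target, the sketch below turns individual trip records into hourly pick-up counts per zone. It uses only the Python standard library on a handful of made-up trips; the field layout `(zone_id, pickup_datetime)` and the zone IDs are assumptions for the example, not taken from the project's notebooks, which run on Spark at a far larger scale.

```python
from collections import Counter
from datetime import datetime


def hourly_demand(trips):
    """Count pick-ups per (zone, hour).

    `trips` is an iterable of (zone_id, pickup_datetime) pairs.
    Both field names are illustrative assumptions.
    """
    counts = Counter()
    for zone_id, pickup in trips:
        # Truncate the timestamp to the start of its hour.
        hour_start = pickup.replace(minute=0, second=0, microsecond=0)
        counts[(zone_id, hour_start)] += 1
    return counts


# Made-up trips: three in one zone, one in another.
trips = [
    (132, datetime(2023, 1, 2, 10, 5)),
    (132, datetime(2023, 1, 2, 10, 40)),
    (132, datetime(2023, 1, 2, 11, 15)),
    (161, datetime(2023, 1, 2, 10, 30)),
]
demand = hourly_demand(trips)

for (zone_id, hour_start), n in sorted(demand.items()):
    print(zone_id, hour_start.isoformat(), n)
```

Each `(zone, hour)` count is one observation of the quantity being forecast seven days ahead.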

Using AWS EMR and S3, this project leveraged Apache Spark to train many ML models in parallel, each on a specific group of the data, rather than a single model on the entire dataset. Together, these granular models form a single forecasting system for all of NYC. Each model was trained to predict demand for one combination of pick-up location, hour of the day, and day of the week, so the system can predict demand for each node (e.g., JFK Airport, 10 AM, Monday). Compared to the naive baseline, test MAE was reduced by 15%.
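The one-model-per-group idea can be sketched on a single machine as follows. This is a toy under stated assumptions, not the project's implementation: the "model" is simply the mean of each group's training demand, the naive baseline is assumed to be the demand observed seven days earlier (the source does not define its baseline), and a thread pool stands in for Spark's distributed execution. The data values and zone IDs are invented.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from statistics import mean


def group_key(zone_id, hour_start):
    # One model per (pick-up zone, hour of day, day of week).
    return (zone_id, hour_start.hour, hour_start.weekday())


def fit_group(item):
    # Stand-in "model": the mean of the group's training demand.
    key, values = item
    return key, mean(values)


def mae(actual, predicted):
    # Mean absolute error between two equal-length sequences.
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Invented hourly demand for two zones on four consecutive Mondays at 10:00.
start = datetime(2023, 1, 2, 10)  # a Monday
series_by_zone = {132: [100, 110, 90, 105], 161: [40, 44, 48, 50]}
rows = [
    (zone_id, start + timedelta(weeks=week), demand)
    for zone_id, series in series_by_zone.items()
    for week, demand in enumerate(series)
]

# Train on the first three weeks, test on the fourth.
cutoff = start + timedelta(weeks=3)
train = defaultdict(list)
for zone_id, hour_start, demand in rows:
    if hour_start < cutoff:
        train[group_key(zone_id, hour_start)].append(demand)

# Groups are independent, so they can be fitted concurrently.
with ThreadPoolExecutor() as pool:
    models = dict(pool.map(fit_group, train.items()))

observed = {(z, t): d for z, t, d in rows}
test_rows = [r for r in rows if r[1] >= cutoff]
actual = [d for _, _, d in test_rows]
model_pred = [models[group_key(z, t)] for z, t, _ in test_rows]
# Assumed naive baseline: repeat the value seen seven days earlier.
naive_pred = [observed[(z, t - timedelta(days=7))] for z, t, _ in test_rows]

model_mae = mae(actual, model_pred)
naive_mae = mae(actual, naive_pred)
print(f"model MAE={model_mae}, naive MAE={naive_mae}")
```

Because each group is fitted independently, the same per-group function could in principle be distributed with PySpark's `groupBy(...).applyInPandas(...)`; whether the notebooks use that particular API is not stated in the source.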

Notes

This project was developed in AWS EMR Studio and uses a private personal S3 bucket. The notebooks are therefore for preview purposes only and will not run without modification.

References

AWS Marketplace: New York City Taxi and Limousine Commission (TLC) Trip Record Data (accessed June 1, 2024).

Folders and files

notebooks
0_Data_Preprocessing.html
1_Commute_Compute.html
LICENSE
README.md
banner.png
methodology.png
