This repo contains the Jupyter notebooks and code I submitted to a hackathon run for an undisclosed NFL team.
Hackathon Goal: Use the NFL's next-gen player tracking data (in-game player xyz coordinates sampled every 0.1 s) to develop insights the team could use in coaching or player scouting.
Personal Motivation: I participated in this hackathon to get hands-on experience working with massive, highly complex data in the AWS cloud environment, and to sharpen my business and analytical skills by developing a valuable use case for the data.
Technical Environment: Python (PySpark, pandas, NumPy, Matplotlib), AWS S3, SageMaker, Athena
Use Case: Develop a holistic measure of offensive lineman performance based on how often each lineman allowed his mark (the defender he was blocking) to beat him and pressure the QB.
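As a rough illustration of the use case above, the sketch below computes a per-lineman "pressure allowed" rate with pandas. It is not the notebook code: the table layout, the column names (`lineman`, `pressure_allowed`), and the sample values are all hypothetical, and it assumes each row is one pass-blocking snap already labeled with whether the lineman's mark pressured the QB.

```python
import pandas as pd

# Hypothetical data: one row per pass-blocking snap for a lineman.
# pressure_allowed is 1 if his mark beat him and pressured the QB, else 0.
snaps = pd.DataFrame({
    "lineman": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "pressure_allowed": [0, 1, 0, 0, 1, 1, 0, 0],
})


def pressure_rate(df: pd.DataFrame) -> pd.DataFrame:
    """Return snaps, pressures, and pressure rate per lineman (lower is better)."""
    return (
        df.groupby("lineman")["pressure_allowed"]
        .agg(snaps="count", pressures="sum")
        .assign(pressure_rate=lambda t: t["pressures"] / t["snaps"])
        .sort_values("pressure_rate")
    )


rates = pressure_rate(snaps)
print(rates)
```

Sorting by the rate puts the linemen who allowed pressure least often at the top. A real measure would likely also need a minimum snap count before comparing players.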
Challenges:
- Highly denormalized and complex data
- Total lack of football knowledge
- Time needed to become familiar with the AWS environment
How to Navigate Repo:
- The solution is detailed in the Jupyter notebooks in the "Code" folder
- Snippets of the original NFL tracking data are available in the "SampleInputData" folder
- The code includes an ETL script that transforms the raw NFL tracking data into a form usable for analysis; its output is in the "ETLOutput" folder
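To give a sense of the kind of transformation an ETL step over tracking data can perform, here is a minimal pandas sketch that turns long-format position rows into a per-frame rusher-to-QB distance. It is an assumption-laden example, not the repo's ETL script: the schema (`play_id`, `frame`, `role`, `x`, `y`), the sample coordinates, and the `PRESSURE_RADIUS` threshold are all hypothetical, and the z coordinate is left out for brevity.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format tracking rows: one row per player per 0.1 s frame.
tracking = pd.DataFrame({
    "play_id": [1] * 6,
    "frame": [1, 1, 2, 2, 3, 3],
    "role": ["QB", "rusher"] * 3,
    "x": [10.0, 16.0, 10.0, 14.0, 10.0, 11.0],
    "y": [20.0, 28.0, 20.0, 23.0, 20.0, 20.0],
})

# Hypothetical threshold: treat the QB as pressured when the rusher gets this close.
PRESSURE_RADIUS = 2.0


def rusher_distance(df: pd.DataFrame) -> pd.DataFrame:
    """Join rusher and QB positions per frame and compute their separation."""
    cols = ["play_id", "frame", "x", "y"]
    qb = df.loc[df["role"] == "QB", cols]
    rusher = df.loc[df["role"] == "rusher", cols]
    merged = rusher.merge(qb, on=["play_id", "frame"], suffixes=("_rusher", "_qb"))
    merged["dist_to_qb"] = np.hypot(
        merged["x_rusher"] - merged["x_qb"],
        merged["y_rusher"] - merged["y_qb"],
    )
    return merged[["play_id", "frame", "dist_to_qb"]]


distances = rusher_distance(tracking)
# Flag each play where the rusher came within the threshold at any frame.
pressured = (distances["dist_to_qb"] <= PRESSURE_RADIUS).groupby(distances["play_id"]).any()
print(distances)
print(pressured)
```

On the full dataset the same join-and-distance pattern would more plausibly run in PySpark or Athena, given the data volume described above.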