thesis code
EAS 499, Senior Capstone Thesis
Author: Andrew Cui
Advisor: Dr. Shane T. Jensen
This project applies classification models to a predictive analysis of Major League Baseball games. We extract data from Retrosheet logs, then perform extensive data wrangling, preprocessing, and feature engineering to identify informative covariates. The target is binary: whether or not the home team wins the game.
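As a rough illustration of the binary target described above, the sketch below derives a home-win label from per-game scores. The records and field names (`home_score`, `visitor_score`) are hypothetical placeholders, not the actual Retrosheet column layout or the thesis code.

```python
# Hypothetical, simplified game records. The field names are illustrative
# assumptions and do not reflect the real Retrosheet log format.
games = [
    {"home": "PHI", "visitor": "NYN", "home_score": 5, "visitor_score": 3},
    {"home": "BOS", "visitor": "NYA", "home_score": 2, "visitor_score": 6},
    {"home": "CHN", "visitor": "SLN", "home_score": 4, "visitor_score": 1},
]


def home_win_label(game):
    """Return 1 if the home team outscored the visitor, else 0."""
    return int(game["home_score"] > game["visitor_score"])


# One binary label per game: this is the classification target.
labels = [home_win_label(g) for g in games]

# Share of games won by the home team in this toy sample.
home_win_rate = sum(labels) / len(labels)
```

The home-win rate doubles as the accuracy of an "always pick the home team" rule, which is one simple reference point for any fitted model.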
Overall, the logit elastic net model achieved an accuracy of 61.77%, exceeding our naive classifiers and many results reported in the literature. This repository contains the code used in the analysis, along with the relevant charts and graphics.
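For readers unfamiliar with the term, a logit elastic net is logistic regression with a combined L1 and L2 penalty on the coefficients. The sketch below is a minimal pure-Python version fitted by proximal gradient descent on toy data. It is not the thesis implementation: the solver, the function names, the parameters `lam` (penalty strength) and `alpha` (L1/L2 mix), and the data are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_logit_elastic_net(X, y, lam=0.01, alpha=0.5, lr=0.1, n_iter=2000):
    """Fit penalized logistic regression by proximal gradient descent.

    Penalty (assumed form): lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2).
    The intercept is left unpenalized.
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            score = intercept + sum(b * x for b, x in zip(beta, xi))
            err = sigmoid(score) - yi  # derivative of the log-loss w.r.t. score
            grad0 += err / n
            for j in range(p):
                grad[j] += err * xi[j] / n
        intercept -= lr * grad0
        for j in range(p):
            # Gradient step on the smooth part (log-loss plus ridge term),
            # followed by soft-thresholding for the L1 term.
            step = beta[j] - lr * (grad[j] + lam * (1 - alpha) * beta[j])
            beta[j] = soft_threshold(step, lr * lam * alpha)
    return intercept, beta


def predict(intercept, beta, X):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return [
        int(sigmoid(intercept + sum(b * x for b, x in zip(beta, xi))) >= 0.5)
        for xi in X
    ]


# Toy data: the first feature separates the classes, the second is near-noise.
X = [[-2.0, 0.1], [-1.0, -0.1], [-1.5, 0.0], [1.0, 0.1], [2.0, -0.1], [1.5, 0.0]]
y = [0, 0, 0, 1, 1, 1]

intercept, beta = fit_logit_elastic_net(X, y)
preds = predict(intercept, beta, X)
accuracy = sum(int(p == t) for p, t in zip(preds, y)) / len(y)
```

Setting `alpha` closer to 1 pushes more coefficients to exactly zero, while values closer to 0 behave more like ridge regression. The penalty settings actually used in the analysis are described in the paper.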
Further detail about the analytical approach can be found in the paper itself. Please direct questions to Andrew Cui (andrewc@seas.upenn.edu).