predicting major league baseball games using logit regression - eas 499, sp2020

danielkmetz/mlb-game-prediction

 
 

mlb-game-prediction

thesis code

University of Pennsylvania

EAS 499, Senior Capstone Thesis

Andrew Cui

Advisor: Dr. Shane T. Jensen


We use logit regression models in a predictive analysis of Major League Baseball games. We extract data from Retrosheet logs, then perform extensive data wrangling, preprocessing, and feature engineering to identify informative covariates. The target is a binary classification: whether or not the home team wins a given game.
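
As a minimal sketch of that setup, the snippet below builds the binary target and one lagged covariate from a handful of game records. The records, the field names (`home`, `visitor`, `home_score`, `visitor_score`), and the `win_pct_diff` feature are hypothetical illustrations, not the actual Retrosheet schema or the covariates used in the thesis.

```python
from collections import defaultdict

# Hypothetical, hand-made game records in chronological order.
# These are NOT real Retrosheet rows; field names are assumptions.
games = [
    {"home": "PHI", "visitor": "NYN", "home_score": 5, "visitor_score": 3},
    {"home": "NYN", "visitor": "PHI", "home_score": 2, "visitor_score": 4},
    {"home": "PHI", "visitor": "ATL", "home_score": 1, "visitor_score": 6},
    {"home": "ATL", "visitor": "NYN", "home_score": 3, "visitor_score": 2},
]


def build_rows(games):
    """Return one modelling row per game, using only pre-game information."""
    wins = defaultdict(int)
    played = defaultdict(int)

    def win_pct(team):
        # Teams with no games yet get a neutral prior of 0.5 (an assumption).
        return wins[team] / played[team] if played[team] else 0.5

    rows = []
    for g in games:
        home, visitor = g["home"], g["visitor"]
        # Binary target: 1 if the home team won, 0 otherwise.
        home_win = int(g["home_score"] > g["visitor_score"])
        rows.append({
            "home": home,
            "visitor": visitor,
            # Feature is computed BEFORE the records are updated,
            # so the current game's result never leaks into it.
            "win_pct_diff": win_pct(home) - win_pct(visitor),
            "home_win": home_win,
        })
        wins[home] += home_win
        wins[visitor] += 1 - home_win
        played[home] += 1
        played[visitor] += 1
    return rows


rows = build_rows(games)
for row in rows:
    print(row)
```

Updating the running records only after each row is emitted keeps every covariate strictly pre-game, which is the property any feature in this kind of model needs.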

Overall, the logit elastic net model achieved an accuracy of 61.77%, exceeding our naive classifiers and many examples from the literature. This repository contains the code used in the analysis, including the relevant charts and graphics.
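
To illustrate what an elastic net logit and a naive baseline comparison look like, here is a small pure-Python sketch fitted by proximal gradient descent. Everything in it is an assumption made for illustration: the data are synthetic, the hyperparameters (`lam`, `l1_ratio`, `lr`, `epochs`) are arbitrary, and the "always pick the home team" baseline is just one possible naive classifier. It is not the thesis implementation, and it does not reproduce the reported accuracy.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def fit_elastic_net_logit(X, y, lam=0.01, l1_ratio=0.5, lr=0.5, epochs=300):
    """Minimise mean log-loss + lam * (l1_ratio * |w|_1 + (1 - l1_ratio) / 2 * |w|^2)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is left unpenalised
        for j in range(p):
            # Gradient step on log-loss plus the L2 part, then the L1 proximal step.
            step = w[j] - lr * (grad_w[j] + lam * (1 - l1_ratio) * w[j])
            w[j] = soft_threshold(step, lr * lam * l1_ratio)
    return w, b


def predict(w, b, x):
    """Predict 1 (home win) when the fitted probability is at least 0.5."""
    return int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5)


def accuracy(y_true, y_pred):
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in data: only the first of three features is informative.
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    x = [rng.gauss(0, 1) for _ in range(3)]
    X.append(x)
    y.append(int(rng.random() < sigmoid(0.2 + 2.0 * x[0])))

X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

w, b = fit_elastic_net_logit(X_train, y_train)
model_acc = accuracy(y_test, [predict(w, b, x) for x in X_test])
naive_acc = accuracy(y_test, [1] * len(y_test))  # naive: always pick the home team

print("weights:", [round(wj, 3) for wj in w], "intercept:", round(b, 3))
print("model accuracy:", model_acc, "naive accuracy:", naive_acc)
```

The L1 part of the penalty can push the weights of uninformative features to exactly zero, while the L2 part stabilises correlated features; comparing against a fixed-guess baseline on held-out games shows how much of the accuracy comes from the covariates rather than from home-field advantage alone.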

Further detail about the analytical approach can be found in the paper itself. Please direct questions to Andrew Cui (andrewc@seas.upenn.edu).

Languages

  • Jupyter Notebook 99.6%
  • Other 0.4%