GitHub - lealcastillo1996/Baseball-Win-Predictor-Classificator-: Will try to make a baseball win predictor

The objective of this project is to build a machine learning classification model, as accurate as possible, that predicts whether a baseball team will win a game when playing at home (as the local team).

Overview

- Initial EDA:

A dataset gathered from a famous sports website will be used to gain initial insights for this project.

Missing values handling
Temporal relationship
Numerical variables distribution
Categorical variables
Dependent variable relationships
Correlations
Conclusions from data
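
The checks listed above can be sketched with pandas as follows. This is only an illustration on a tiny made-up table: the column names (`home_runs`, `away_runs`, `home_win`) and the values are assumptions, not the project's actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
games = pd.DataFrame({
    "date": pd.to_datetime(["2021-04-01", "2021-04-02", "2021-04-03", "2021-04-04"]),
    "home_runs": [5, 2, np.nan, 7],
    "away_runs": [3, 4, 1, 6],
    "home_win": [1, 0, 0, 1],
})

# Missing values handling: count missing entries per column.
missing_counts = games.isna().sum()

# Numerical variables distribution: summary statistics per numeric column.
numeric_summary = games[["home_runs", "away_runs"]].describe()

# Correlations: Pearson correlation of each numeric column with the target.
target_corr = (
    games[["home_runs", "away_runs", "home_win"]]
    .corr()["home_win"]
    .drop("home_win")
)

print(missing_counts)
print(numeric_summary)
print(target_corr)
```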

- Data transformation:

In this step, new features summarizing performance in previous games will be generated.
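
One possible way to build such features is a per-team rolling average that only looks at earlier games. The sketch below is an assumption about how this could be done, not the notebook's actual code: the columns (`team`, `runs_scored`, `won`), the window of 3 games, and the helper `prior_rolling_mean` are all hypothetical.

```python
import pandas as pd

# Hypothetical game log; one row per team per game.
games = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime([
        "2021-04-01", "2021-04-02", "2021-04-03", "2021-04-04",
        "2021-04-01", "2021-04-02", "2021-04-03",
    ]),
    "runs_scored": [3, 5, 1, 4, 2, 2, 6],
    "won": [0, 1, 0, 1, 1, 0, 1],
})
games = games.sort_values(["team", "date"]).reset_index(drop=True)


def prior_rolling_mean(series, window=3):
    """Mean over up to `window` previous games, excluding the current one."""
    # shift(1) drops the current game so the feature never sees its own outcome.
    return series.shift(1).rolling(window, min_periods=1).mean()


games["runs_avg_prev"] = games.groupby("team")["runs_scored"].transform(prior_rolling_mean)
games["win_rate_prev"] = games.groupby("team")["won"].transform(prior_rolling_mean)

print(games)
```

The first game of each team has no history, so its features are missing and would need to be dropped or imputed in a later cleaning step.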

- Advanced EDA:

In this step, a new analysis will be performed using the variables generated in the previous step.

The newly created features correlate better with the outcome than the original ones, but the correlation remains weak. A classification model will be built using these features as predictors. Because the correlation is not strong, the variance in the data is large, so a low-flexibility model (such as a linear model) is preferred in order to avoid overfitting.

- Feature Engineering:

In this step, the created dataset will be cleaned. Then a train set and a test set will be generated. Finally, the numerical variables of both sets will be normalized with a standard scaler (scaling by mean and standard deviation).

Two scaled sets were generated and are ready for machine learning models to be applied (train set: 8479 rows, test set: 2120 rows).
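
A minimal sketch of this step with scikit-learn is shown below. The reported row counts are consistent with an 80/20 split of 10599 rows, although the README does not state the ratio; the random shuffled split, the `random_state`, and the synthetic feature matrix are assumptions made only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned dataset (10599 rows, 4 numeric features).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(10599, 4))
y = rng.integers(0, 2, size=10599)

# Assumed 80/20 split; the actual split strategy is not documented.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the train set only, then apply it to both sets,
# so that no information from the test set leaks into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```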

- Naive Bayes Classifier

The Naive Bayes classifier was chosen because it is among the best low-flexibility classifiers when large variance is observed in the data.
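
The README does not say which Naive Bayes variant was used. Assuming standardized numerical features, scikit-learn's `GaussianNB` is a natural candidate, and a minimal sketch on synthetic data could look like this (the data and variable names are illustrative only):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic, weakly informative features: class 1 is shifted slightly upward.
rng = np.random.default_rng(0)
n_samples = 400
y_train = rng.integers(0, 2, size=n_samples)
X_train = rng.normal(size=(n_samples, 3)) + 0.5 * y_train[:, None]

# GaussianNB models each feature as an independent Gaussian per class.
model = GaussianNB()
model.fit(X_train, y_train)

# Predict labels and home-win probabilities for a few new games.
X_new = rng.normal(size=(5, 3))
predicted_label = model.predict(X_new)
win_probability = model.predict_proba(X_new)[:, 1]

print(predicted_label)
print(win_probability)
```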

- Results summary

Even though the predictors did not correlate strongly with the output, it was possible to create a classifier with reasonable accuracy, especially for true predictions of a home (local) team win (63%).
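
It is not stated whether the 63% figure refers to precision or recall for the home-win class. The sketch below shows how both could be computed with scikit-learn; the labels are made up and do not reproduce the project's results.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Made-up labels: 1 = home team wins, 0 = home team loses.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() yields counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)
# Precision: of the games predicted as home wins, how many were home wins.
precision_home_win = precision_score(y_true, y_pred, pos_label=1)
# Recall: of the actual home wins, how many were predicted.
recall_home_win = recall_score(y_true, y_pred, pos_label=1)

print(tn, fp, fn, tp)
print(accuracy, precision_home_win, recall_home_win)
```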

The use of this classifier for sports betting should be analyzed carefully: sportsbook odds are tricky, and a profit may not be possible at this accuracy without a sound betting strategy.
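
To make this point concrete, the sketch below computes the break-even decimal odds and the expected profit per unit staked. It assumes, purely for illustration, that 63% can be read as the probability that a predicted home win is correct and that every bet is placed at the same decimal odds; both function names are hypothetical.

```python
def break_even_decimal_odds(win_probability: float) -> float:
    """Decimal odds at which the expected profit of a bet is exactly zero."""
    return 1.0 / win_probability


def expected_profit_per_unit(win_probability: float, decimal_odds: float) -> float:
    """Expected profit for a stake of 1 unit at the given decimal odds."""
    # Win: gain (odds - 1). Lose: lose the 1-unit stake.
    return win_probability * (decimal_odds - 1.0) - (1.0 - win_probability)


p = 0.63  # assumed probability that a predicted home win is correct

print(break_even_decimal_odds(p))
print(expected_profit_per_unit(p, 1.50))
print(expected_profit_per_unit(p, 1.70))
```

Under these assumptions a bet is profitable on average only when the offered odds exceed `1 / p`, which is why the odds actually offered for home favorites matter as much as the classifier's accuracy.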

Folders and files

.ipynb_checkpoints
sets
01_Initial_EDA.ipynb
02_Data transformation.ipynb
03_Advanced_EDA.ipynb
04_Feature_Engineering.ipynb
05_Naive-Bayes_evaluator.ipynb
README.md
set_acum_clean.csv
