Predicting NBA game outcomes and extracting win probabilities for each game based on schedule, travel and game density metrics.
The model outputs a prediction for each game, along with the associated win and loss probabilities. However, my main goal was to understand how games may be affected by schedule-related metrics such as mileage, rest, density, and time zone shifts. Teams could potentially use this information to optimize travel plans and manage schedule indicators during the season.
- I built an R package ({airball}) to scrape the data. {airball} provides functions to extract schedule-related metrics from public box score information.
- Data preparation. My code to clean the data and prepare it for modeling is available here.
Once the data is ready:
- To train the model I used 20 seasons of NBA data (2000-19).
- I also ran the model on 2021 season data to check its performance given some of the differences in schedule related to COVID.
- Game Outcome: the model target.
- Distance Travelled: distance travelled over "X" time windows for both teams in a game.
- Time Zone Shifts: number of time zone shifts over "X" time windows for both teams in a game.
- Games Played: games played over "X" time windows for both teams.
- Rest Days: number of rest days prior to a game for both teams.
- Location: Home or Away.
- Streak: consecutive Ws or Ls for both teams.
- Win %: winning % for each team.
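To make the feature definitions above concrete, here is a minimal Python sketch of how such schedule metrics could be derived for one team from a list of past games. It is an illustration only: the project itself computes these in R with {airball}, and the `Game` class, the `schedule_features` function, the 7-day window, the coordinates, and the sample schedule are all hypothetical. Distance is approximated with the haversine great-circle formula, and only legs between games inside the window are counted.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from math import asin, cos, radians, sin, sqrt
from typing import Optional

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass
class Game:
    day: date
    lat: float          # arena latitude in degrees
    lon: float          # arena longitude in degrees
    utc_offset: int     # local time zone as hours from UTC
    won: Optional[bool] = None  # None for a game not yet played


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def schedule_features(history, upcoming, window_days=7):
    """Schedule metrics for one team before `upcoming`.

    `history` holds the team's past games, oldest first.
    """
    window_start = upcoming.day - timedelta(days=window_days)
    recent = [g for g in history if window_start <= g.day < upcoming.day]

    # Travel legs between consecutive stops in the window, ending at the upcoming game.
    stops = recent + [upcoming]
    legs = list(zip(stops, stops[1:]))
    miles = sum(haversine_miles(a.lat, a.lon, b.lat, b.lon) for a, b in legs)
    tz_shifts = sum(1 for a, b in legs if a.utc_offset != b.utc_offset)

    # Full days off between the previous game and the upcoming one.
    rest_days = (upcoming.day - history[-1].day).days - 1 if history else None

    # Positive streak = consecutive wins, negative = consecutive losses.
    streak = 0
    for g in reversed(history):
        if g.won != history[-1].won:
            break
        streak += 1 if g.won else -1

    win_pct = sum(g.won for g in history) / len(history) if history else None
    return {
        "games_played": len(recent),
        "miles": miles,
        "tz_shifts": tz_shifts,
        "rest_days": rest_days,
        "streak": streak,
        "win_pct": win_pct,
    }


# Hypothetical three-game stretch (approximate city coordinates).
history = [
    Game(date(2019, 1, 1), 42.37, -71.06, -5, True),    # Boston
    Game(date(2019, 1, 3), 40.75, -73.99, -5, True),    # New York
    Game(date(2019, 1, 4), 34.04, -118.27, -8, False),  # Los Angeles
]
upcoming = Game(date(2019, 1, 6), 34.04, -118.27, -8)
features = schedule_features(history, upcoming)
print(features)
```

In the model, the same metrics would be computed for both teams in a game, over whichever "X" windows are of interest.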
- This is a supervised learning problem, modeled with an XGBoost classifier. I used the {h2o} package in R to build, train, and evaluate the model.
- Current model performance:
MSE: 0.1803835
RMSE: 0.4247158
LogLoss: 0.5333432
Mean Per-Class Error: 0.2758835
AUC: 0.8041708
AUCPR: 0.8072966
Gini: 0.6083416
R^2: 0.2784605
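Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:

| Actual | Predicted 0 | Predicted 1 | Error | Rate |
|---|---|---|---|---|
| 0 | 2495 | 1335 | 0.348564 | = 1335/3830 |
| 1 | 774 | 3035 | 0.203203 | = 774/3809 |
| Totals | 3269 | 4370 | 0.276083 | = 2109/7639 |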
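Several of the reported numbers can be re-derived from the confusion matrix and from each other, which is a useful sanity check. The Python sketch below is illustrative only (the model was evaluated with {h2o} in R). The counts, AUC, and MSE are copied from the output above. Accuracy, precision, recall, and F1 are not in the original report and are computed here for class 1 at the same threshold.

```python
from math import sqrt

# Counts copied from the confusion matrix (rows: actual, columns: predicted).
tn, fp = 2495, 1335   # actual 0
fn, tp = 774, 3035    # actual 1
total = tn + fp + fn + tp

# Per-class error rates, as shown in the "Error" column.
error_class0 = fp / (tn + fp)
error_class1 = fn / (fn + tp)
mean_per_class_error = (error_class0 + error_class1) / 2
overall_error = (fp + fn) / total

# Additional metrics for class 1 (not part of the reported output).
accuracy = 1 - overall_error
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Relationships between reported metrics.
auc = 0.8041708
mse = 0.1803835
gini = 2 * auc - 1   # Gini coefficient from AUC
rmse = sqrt(mse)

print(f"mean per-class error: {mean_per_class_error:.7f}")
print(f"overall error:        {overall_error:.6f}")
print(f"accuracy:             {accuracy:.4f}")
print(f"precision / recall:   {precision:.4f} / {recall:.4f}")
print(f"F1:                   {f1:.4f}")
print(f"Gini from AUC:        {gini:.7f}")
print(f"RMSE from MSE:        {rmse:.7f}")
```

The derived mean per-class error, Gini, and RMSE agree with the reported values up to rounding.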
- I used SHAP values to identify feature importance, as well as to explain how different features contribute to model predictions and outcome probabilities for each observation.
- Below is an example image of how the model makes a decision for one game:
For more information on the science behind SHAP values, watch this video.
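As a rough illustration of how per-feature SHAP contributions relate to a single predicted probability, here is a small Python sketch. It assumes the contributions are expressed in log-odds, as is typical for tree-based SHAP on a binary classifier, so that the bias term plus all contributions equals the raw model output before the logistic transform. The feature names and values are hypothetical and are not taken from the model.

```python
from math import exp


def sigmoid(x):
    """Logistic function: maps log-odds to a probability."""
    return 1 / (1 + exp(-x))


# Hypothetical contributions for one game, in log-odds units.
bias = 0.0  # baseline log-odds (hypothetical)
contributions = {
    "home": 0.35,
    "win_pct_diff": 0.60,
    "rest_days_diff": 0.10,
    "miles_last_7d": -0.25,
    "tz_shifts_last_7d": -0.05,
}

# SHAP values are additive: baseline plus contributions gives the raw score.
margin = bias + sum(contributions.values())
win_prob = sigmoid(margin)

# Rank features by the size of their effect on this one prediction.
ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)

print(f"raw score (log-odds): {margin:.2f}")
print(f"win probability:      {win_prob:.3f}")
for name, value in ranked:
    direction = "raises" if value > 0 else "lowers"
    print(f"{name:>18}: {value:+.2f} ({direction} win probability)")
```

Positive contributions push the predicted win probability up and negative ones pull it down, which is what a per-game SHAP plot visualizes.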
- A static copy of the notebook is available here
- For access to the interactive notebook, visit this link to open Google Colab.
- Continue improving model performance.
- Identify other potential relevant features.
- Deploy the model in a Shiny app to provide user-friendly access to predictions.