This AI system uses machine learning to predict NBA game outcomes: which team will win (moneyline) and whether the total score will finish over or under the posted line. It analyzes data for every NBA team going back to the 2011-12 season and combines it with the odds for current games to generate predictions with a neural network model. The system reaches roughly 57% accuracy on moneyline bets and roughly 49% on over/under bets. Beyond picking winners and losers, it also gives insight into the expected value of a moneyline bet on each team.
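To make the expected value idea concrete, here is a minimal sketch of how American moneyline odds translate into a break-even probability and an expected profit per unit staked. The conversion formulas are the standard ones for American odds; the function names and the win probability `p_home` are hypothetical placeholders, and this is not necessarily how the notebooks compute expected value.

```python
def implied_probability(moneyline: int) -> float:
    """Break-even win probability implied by American moneyline odds."""
    if moneyline < 0:
        # Favorite: risk |moneyline| to win 100.
        return -moneyline / (-moneyline + 100)
    # Underdog: risk 100 to win moneyline.
    return 100 / (moneyline + 100)


def profit_per_unit(moneyline: int) -> float:
    """Profit returned on a winning 1-unit stake."""
    if moneyline < 0:
        return 100 / -moneyline
    return moneyline / 100


def expected_value(win_probability: float, moneyline: int) -> float:
    """Expected profit per 1 unit staked, given an estimated win probability."""
    lose_probability = 1 - win_probability
    return win_probability * profit_per_unit(moneyline) - lose_probability


home_moneyline = -150   # hypothetical odds for the home team
p_home = 0.65           # hypothetical win probability estimate

print(round(implied_probability(home_moneyline), 3))     # 0.6
print(round(expected_value(p_home, home_moneyline), 3))  # 0.083
```

A bet has positive expected value only when the estimated win probability is higher than the probability implied by the odds.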
- Home Team ["Home_Team"]
  - The team hosting the game at its home arena.
- Away Team ["Away_Team"]
  - The team traveling to the home team's arena to play the game.
- Game Total Points ["Game_Total_Points"]
  - The total points scored in a game: the home team's points plus the away team's points.
- Over/Under ["Over_Under"]
  - Before each game is played, oddsmakers at a sportsbook set the most likely number of game total points.
  - A bettor uses this line to wager on whether the actual game total points will finish over or under that number.
- Spread ["Spread"]
  - In every game, one team is favored to win. The spread is the number of points by which the favored team is predicted to beat the other team.
- Moneyline ["Moneyline_Home_Team", "Moneyline_Away_Team"]
  - The same oddsmakers price each team to win the game outright. A negative moneyline means the team is favored to win that game, and a positive moneyline means the team is NOT predicted to win.
- Win Margin ["Win_Margin"]
  - The number of points by which the home team won or lost the game. This is actual data recorded after the game ends, not a prediction.
  - A positive number means the home team won by that many points and a negative number means the home team lost by that many points.
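The definitions above can be sketched as a few small functions. This is only an illustration: the function names and the sample numbers are made up, and the repository's own column calculations may differ. The spread is left out because its sign convention is not stated here.

```python
def game_total_points(home_points, away_points):
    """Game_Total_Points: home team's points plus away team's points."""
    return home_points + away_points


def win_margin(home_points, away_points):
    """Win_Margin: positive if the home team won, negative if it lost."""
    return home_points - away_points


def over_under_result(total_points, over_under_line):
    """Compare the actual total with the Over_Under line."""
    if total_points > over_under_line:
        return "over"
    if total_points < over_under_line:
        return "under"
    return "push"  # the total landed exactly on the line


def favorite(moneyline_home, moneyline_away):
    """The team with the lower (more negative) moneyline is the favorite."""
    if moneyline_home < moneyline_away:
        return "Home_Team"
    if moneyline_away < moneyline_home:
        return "Away_Team"
    return None  # identical prices, no favorite


# Hypothetical final score and odds for a single game.
total = game_total_points(112, 105)
print(total)                            # 217
print(win_margin(112, 105))             # 7
print(over_under_result(total, 220.5))  # under
print(favorite(-180, 155))              # Home_Team
```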
- NBA Sports Betting ML
- 10 seasons of data from 2011 to 2021
- Hidden Layers: 3
- Hidden Layer Activation: LeakyReLU
- Epochs: 80
- Optimizer: Adam
- Outer Layer Activation: Linear
- Loss: Mean Squared Error (MSE)
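As a rough sketch, a configuration like the one listed above could be expressed in Keras as follows. The number of units per layer, the input width, and the `build_model` helper are assumptions made for illustration; the notebooks' actual layer sizes and preprocessing are not described here.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(n_features, n_hidden_layers=3, units=32):
    """Feed-forward regressor: LeakyReLU hidden layers, linear output, MSE loss.

    `units` is a placeholder value, not taken from the project.
    """
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_features,)))
    for _ in range(n_hidden_layers):
        model.add(layers.Dense(units))
        model.add(layers.LeakyReLU())
    # A single linear output, e.g. a predicted win margin or point total.
    model.add(layers.Dense(1, activation="linear"))
    model.compile(optimizer="adam", loss="mse")
    return model


model = build_model(n_features=8, n_hidden_layers=3)
model.summary()

# Training would then look roughly like this (the split value is hypothetical):
# model.fit(X_train, y_train, validation_split=0.2, epochs=80)
```

Changing `n_hidden_layers` to 4 and adjusting `epochs` gives the shape of the other configurations listed in this document.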
The 'accuracy vs loss' graph plots accuracy in blue and loss in orange, and both behave as expected: the accuracy line levels out toward 1, while the loss line levels out toward 0. A few spikes in the graph are normal.
The second graph is 'accuracy vs validation accuracy'. Accuracy refers to the model's performance on the training dataset, the data used to fit the model, so it measures how well the model fits data it has already seen.
Validation accuracy, on the other hand, measures the model's performance on data it did not see during training. It shows how well the model generalizes to new data and is the main signal for spotting overfitting or underfitting.
As the graph shows, accuracy and validation accuracy follow each other fairly closely, which suggests the model does a reasonably good job of predicting who will win future games.
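The comparison described above can also be checked numerically instead of by eye. The sketch below assumes a dictionary shaped like the `history` attribute that Keras returns from `fit` when accuracy is tracked with validation data (keys `accuracy` and `val_accuracy`); the numbers, the `tolerance`, and the helper names are hypothetical.

```python
def accuracy_gap(history):
    """Per-epoch gap: training accuracy minus validation accuracy."""
    return [t - v for t, v in zip(history["accuracy"], history["val_accuracy"])]


def describe_fit(history, tolerance=0.05, last_n=3):
    """Summarize the average gap over the last few epochs."""
    gaps = accuracy_gap(history)[-last_n:]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap > tolerance:
        return "training accuracy is ahead of validation accuracy (possible overfitting)"
    return "training and validation accuracy track each other closely"


# Made-up curves that stay about one point apart.
history = {
    "accuracy":     [0.40, 0.45, 0.47, 0.48, 0.48],
    "val_accuracy": [0.39, 0.44, 0.46, 0.47, 0.47],
}
print(describe_fit(history))
```

A small, stable gap matches the "follow each other closely" pattern, while a gap that keeps growing over the epochs is the usual sign of overfitting.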
- 5 seasons of data from 2016-2021
- Hidden Layers: 3
- Hidden Layer Activation: LeakyReLU
- Epochs: 130
- Optimizer: Adam
- Outer Layer Activation: Linear
- Loss: MSE
It's not surprising that the second model, which uses less data, doesn't perform as well as the first. More data usually lets a model learn more patterns and generalize better to new data.
A model trained on a smaller dataset is more likely to overfit: it may perform well on the training data but generalize poorly to new, unseen data. This is likely why the second model shows lower accuracy and a weaker relationship between the loss and validation accuracy.
Although its performance isn't as good as the first model's, this model still works well with the data it has.
- 10 seasons of data from 2011 to 2021
- Hidden Layers: 4
- Hidden Layer Activation: LeakyReLU
- Epochs: 100
- Optimizer: Adam
- Outer Layer Activation: Linear
- Loss: MSE
The third model's results are very similar to the first model's. The first model had a loss of 0.0796 and an accuracy of 0.4771, while the third model has a loss of 0.0990 and an accuracy of 0.4787.
Accuracy and validation accuracy stay close to each other, which indicates that the model is not overfitting to the training data and is able to generalize to new data. The loss value is also reasonable as a measure of how well the model fits the data.
Overall, the third model shows promising results. It may be worth testing it further on new data or tuning some of its parameters to see whether it can be improved.
- 5 seasons of data from 2016 to 2021
- Hidden Layers: 4
- Hidden Layer Activation: LeakyReLU
- Epochs: 60
- Optimizer: Adam
- Outer Layer Activation: Linear
- Loss: MSE
While model 4 did not perform as well as models 1 and 3 in terms of loss, its loss is still quite low and may be considered acceptable for this task. A model with a slightly higher loss can sometimes perform better on unseen data if it has learned to generalize better.
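One reason loss and betting results can disagree is that they measure different things. The sketch below assumes, for illustration only, that a moneyline pick is taken from the sign of the predicted win margin (positive means a home win). Under that assumption, mean squared error scores how far off the margin is, while pick accuracy only scores which side of zero it lands on. All numbers are made up.

```python
def mse(predicted, actual):
    """Mean squared error between predicted and actual win margins."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def pick_accuracy(predicted, actual):
    """Share of games where the predicted winner matches the actual winner."""
    hits = sum((p > 0) == (a > 0) for p, a in zip(predicted, actual))
    return hits / len(actual)


actual_margins = [5, -3, 10, -8]   # hypothetical Win_Margin values

model_a = [1, -1, 1, -1]           # right side every time, margins far off
model_b = [6, 2, 9, -7]            # margins close, one wrong winner

print(mse(model_a, actual_margins), pick_accuracy(model_a, actual_margins))  # 37.5 1.0
print(mse(model_b, actual_margins), pick_accuracy(model_b, actual_margins))  # 7.0 0.75
```

In this toy case the model with the higher loss picks more winners, so loss alone does not settle which model is better for betting.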
- Plot 1 (model 1): 10 seasons of data from 2011 to 2021, 3 hidden layers, 80 epochs
- Plot 2 (model 2): 5 seasons of data from 2016 to 2021, 3 hidden layers, 130 epochs
- Plot 3 (model 3): 10 seasons of data from 2011 to 2021, 4 hidden layers, 100 epochs
- Plot 4 (model 4): 5 seasons of data from 2016 to 2021, 4 hidden layers, 60 epochs
- All four models: LeakyReLU hidden layer activation, Adam optimizer, Linear outer layer activation, Mean Squared Error (MSE) loss
Based on the accuracy versus validation accuracy plots, plot 1 performs best: its validation accuracy stays closely aligned with its accuracy line. Plot 2 shows some initial overfitting but recovers well. Plot 3 overfits at first, recovers, and then starts to underfit. Plot 4 also shows some initial overfitting and recovers well. Overall, plot 1 is the strongest of the four.
To clone the repository, run "git clone https://github.com/kheller18/nba-sports-betting-ml.git" in the terminal. Then, with the conda dev environment activated, install the following packages and dependencies before running the NBA Machine Learning application. To understand how to install these, refer to the Usage section.
- csv - Used to store all of our data.
- Jupyter Lab - version 3.4.4 - Used to create and share documents that contain live code, equations, visualizations and narrative text.
- pandas - Used for the analysis of NBA data.
- pathlib - version 1.0.1 - Used to locate directories and file paths.
- TensorFlow 2.0 - An end-to-end machine learning platform.
- Keras - A popular deep learning framework that serves as a high-level API for TensorFlow.
- Scikit-Learn - Tools for data predictions and analysis.
- NumPy - Package for scientific computing.
After cloning the repository locally, you'll need the packages listed in Installation installed on your machine. To install them, activate your conda dev environment and run the following commands:
```
pip install pandas
pip install jupyterlab
pip install --upgrade tensorflow
pip install -U scikit-learn
pip install numpy
```
After all of these are installed, please refer to the Deployment section for instructions on how to view or edit the notebook.
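Before opening the notebooks, it can help to confirm that the packages are visible in the active environment. The snippet below is an optional sketch, not part of the repository; it uses the standard library to check each import name (note that scikit-learn is imported as `sklearn`), and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Import names for the packages installed with the pip commands above.
REQUIRED = ["pandas", "jupyterlab", "tensorflow", "sklearn", "numpy"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```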
MIT License
Copyright (c) 2023 Keenan Heller | Olga Ortega | Audell Sabeti | Ariana Moreno | Rachel Hodson
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
- There are currently no tests associated with this project.
- There is currently no live deployment of this notebook on a common server, but the user can run the notebooks via:
  - Jupyter Lab: Navigate to the root directory and type "jupyter lab NN_Win_Margin.ipynb" for the Moneyline models and "jupyter lab NN_Over_Under.ipynb" for the Over/Under models.
  - Google Colab: For Moneyline models, open "GC_NN_Win_Margin.ipynb" and click "Open in Colab". For Over/Under models, open "GC_NN_Over_Under.ipynb" and click "Open in Colab".