Open Python, IPython, or a Jupyter notebook from the root directory of this project, then run:
> from nnnba import *
> nnnba = NNNBA()
There are 8 models available:
The default model (which can be set via nnnba.default_model_type) is lasso.
To find the most valuable players, run
> nnnba.getMostValuablePlayers()
where model_type is one of the models described above.
To find undervalued players, run
> nnnba.getUndervalued()
where model_type is one of the models described above.
To find a player's value in each model, run
> nnnba.getPlayerValue(player_name)
where model_type is one of the models described above, and player_name is the player's name. For example:
> nnnba.getPlayerValue("Giannis Antetokounmpo")
The data is gathered from three sources: NBA.com Stats, Basketball Reference, and hoopshype.
NBA.com was scraped using nba_py to gather player statistics (including advanced stats, misc stats, etc.). Basketball Reference was scraped using basketballcrawler to gather player age, current salary, etc., and hoopshype was scraped to gather players' future salaries. Because Basketball Reference does not tolerate frequent scraping, its information is saved in players.json. That file is read when prepare_data is run and combined with the nba_py and hoopshype data, and the result is stored in raw_data.json.
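The merge step described above can be sketched as follows. This is a minimal illustration, not the project's prepare_data: the per-source dicts, their field names (GP, PTS, AGE, SALARY, FUTURE_SALARY), the helper names merge_sources and save_raw_data, and the rule of keeping only players found in every source are all assumptions made for the example.

```python
import json

# Hypothetical per-source records keyed by player name; the field names and
# values are made up for illustration and are not the project's actual schema.
nba_stats = {"Player A": {"GP": 70, "PTS": 18.0}, "Player B": {"GP": 60, "PTS": 9.5}}
bbref = {"Player A": {"AGE": 24, "SALARY": 5_000_000}, "Player B": {"AGE": 31, "SALARY": 2_000_000}}
hoopshype = {"Player A": {"FUTURE_SALARY": 6_000_000}}


def merge_sources(*sources):
    """Combine per-player dicts from several sources into one record per player.

    Only players present in every source are kept (an assumption), and keys
    from later sources overwrite identical keys from earlier ones.
    """
    common = set(sources[0])
    for src in sources[1:]:
        common &= set(src)
    merged = {}
    for name in sorted(common):
        record = {"name": name}
        for src in sources:
            record.update(src[name])
        merged[name] = record
    return merged


def save_raw_data(records, path="raw_data.json"):
    """Write the merged records to disk as JSON."""
    with open(path, "w") as f:
        json.dump(records, f, indent=2)


merged = merge_sources(nba_stats, bbref, hoopshype)
print(json.dumps(merged, indent=2))  # Player B is dropped: no hoopshype entry
```

Caching the Basketball Reference portion in a file such as players.json means only the nba_py and hoopshype parts need to be refetched on each run.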
As it turns out, many of the data columns had to be removed. Players who have played fewer than 15 games are also removed, because their inflated small-sample stats skewed the models.
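A sketch of this cleanup step, assuming each player is a dict with a games-played key named GP (a hypothetical name) and that the unwanted columns are passed in as a set. The 15-game threshold is the one stated above; the helper name filter_players is illustrative only.

```python
MIN_GAMES = 15  # threshold stated in this README


def filter_players(players, min_games=MIN_GAMES, drop_columns=()):
    """Drop players with fewer than min_games games and strip unwanted columns.

    Assumes each player is a dict with a hypothetical 'GP' (games played) key.
    """
    kept = []
    for p in players:
        if p["GP"] < min_games:
            continue  # small sample: per-game stats are unreliable
        kept.append({k: v for k, v in p.items() if k not in drop_columns})
    return kept


players = [
    {"name": "Player A", "GP": 70, "PTS": 18.0, "PLAYER_ID": 1},
    {"name": "Player B", "GP": 9, "PTS": 30.0, "PLAYER_ID": 2},
]
print(filter_players(players, drop_columns={"PLAYER_ID"}))
# [{'name': 'Player A', 'GP': 70, 'PTS': 18.0}]
```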
Each model takes a player's stats as input and salary as output. The idea is to fit the model to all players' stats and then predict each player's value. The output is scaled from the minimum to the maximum contract price for the 2017-18 season. An average is also computed (and treated as a separate model), which weights the Bayesian Ridge, Lasso, and ElasticNet outputs equally.
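One way to read the scaling and averaging described above is sketched below: each model's raw predictions are linearly mapped so the lowest becomes the minimum contract and the highest becomes the maximum, and then the three scaled outputs are averaged with equal weights. The contract bounds are placeholders, not the real 2017-18 figures, and scaling each model before averaging (rather than after) is an assumption; the raw prediction lists are made up.

```python
# Placeholder bounds; substitute the actual 2017-18 minimum and maximum contracts.
MIN_CONTRACT = 1_000_000
MAX_CONTRACT = 30_000_000


def scale_to_contract_range(raw, lo=MIN_CONTRACT, hi=MAX_CONTRACT):
    """Linearly map raw outputs so the smallest becomes lo and the largest hi."""
    r_min, r_max = min(raw), max(raw)
    if r_max == r_min:
        return [float(lo)] * len(raw)  # degenerate case: all predictions equal
    return [lo + (x - r_min) * (hi - lo) / (r_max - r_min) for x in raw]


def average_models(*predictions):
    """Equal-weight average of several models' per-player predictions."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]


# Made-up raw outputs for three players from each model.
bayes_ridge = scale_to_contract_range([0.2, 0.5, 0.9])
lasso = scale_to_contract_range([0.1, 0.6, 0.8])
elastic_net = scale_to_contract_range([0.3, 0.4, 1.0])
print(average_models(bayes_ridge, lasso, elastic_net))
```

Because every model is mapped onto the same salary range, their outputs are directly comparable and can be averaged without one model dominating.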
Some of the methods come with coefficients that explain how the model works and why specific players rank ahead of others.
For example, Ridge seems to favor Personal Fouls Drawn and Free Throws Made, so players who draw fouls and make a lot of free throws are deemed to be worth more. Thus, under Ridge, DeRozan (for all his ability to draw fouls and get to the line) is considered the most valuable.
Linear Regression seems to value FG3M (three-pointers made) and FGM (field goals made), so high-volume scorers are considered more valuable.
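For a linear model, the prediction is the intercept plus the sum of each coefficient times the corresponding stat, so breaking that sum into per-feature terms shows which stats drive a player's value. The sketch below assumes hypothetical coefficients and per-game stats keyed by NBA.com-style column names (PFD, FTM, AST); the helper names are illustrative.

```python
def predict(intercept, coefs, stats):
    """Linear prediction: intercept + sum(coef * stat) over the model's features."""
    return intercept + sum(c * stats.get(name, 0.0) for name, c in coefs.items())


def feature_contributions(coefs, stats):
    """Each feature's term (coef * stat), largest first."""
    contribs = {name: c * stats.get(name, 0.0) for name, c in coefs.items()}
    return sorted(contribs.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical Ridge-like coefficients and one player's stats, for illustration only.
ridge_coefs = {"PFD": 2.0, "FTM": 1.5, "AST": 0.5}
player = {"PFD": 7.0, "FTM": 7.0, "AST": 5.0}
print(feature_contributions(ridge_coefs, player))
# [('PFD', 14.0), ('FTM', 10.5), ('AST', 2.5)]
print(predict(1.0, ridge_coefs, player))
# 28.0
```

Raw coefficients are only comparable across features when the inputs are on a common scale (for example, standardized), so the per-player contributions are usually a more reliable explanation than the coefficients alone.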
By comparing each player's calculated worth to their future salary, it is also possible to find undervalued players. Because the most valuable players depend on the model, so do the undervalued players.
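A minimal sketch of that comparison, assuming "undervalued" means predicted worth minus future salary is positive and players are ranked by that surplus (a ratio would be another reasonable choice). The inputs are hypothetical dicts keyed by player name, and find_undervalued is an illustrative name, not the project's getUndervalued.

```python
def find_undervalued(predicted, future_salary, top_n=5):
    """Return up to top_n players whose predicted worth exceeds their future salary.

    Both arguments are dicts keyed by player name; players without a known
    future salary are skipped. Results are sorted by surplus, largest first.
    """
    surplus = {
        name: value - future_salary[name]
        for name, value in predicted.items()
        if name in future_salary
    }
    ranked = sorted(surplus.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, diff) for name, diff in ranked[:top_n] if diff > 0]


predicted = {"Player A": 20_000_000, "Player B": 8_000_000, "Player C": 15_000_000}
future = {"Player A": 12_000_000, "Player B": 10_000_000, "Player C": 14_000_000}
print(find_undervalued(predicted, future))
# [('Player A', 8000000), ('Player C', 1000000)]
```

Since predicted comes from a single model, running this with each model's predictions gives a different list, which is why the undervalued players depend on the model.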
It is important to note that this only analyzes stats from the past season's performance, which is a very isolated view. It does not take into account team strength (though many models take wins into account) or potential. For example, Curry and Durant would have better stats on separate teams; although their stats are still very impressive, the models don't account for the fact that they are lower than they could have been. Therefore, their salary value is lowered.
If you want to contribute, please see CONTRIBUTING guidelines.