This project aims to deliver a series of sports analytics notebooks on data from Novo Basquete Brasil (NBB), Brazil's main professional basketball league. To date, very few data science and analytics projects on Brazilian basketball are publicly available, despite the rise of machine learning and big data tools across all economic sectors and in the world's most important sports leagues. Hence, the notebooks in this repository attempt to showcase the value that such techniques could add to decision making at the NBB in areas such as roster formation, practice design, and performance evaluation, among others.
The dataset used in the notebooks was built from data scraped from the NBB's website. To date, NBB data isn't easily accessible as a csv table or public database. The statistics, however, can be found on the league's website in a set of tables for each season. The scraper.py file at /scraper scrapes each of these tables and exports it to a csv file in the /scraper/output folder. The cleaner.py file at /cleaner, in turn, unifies all of them into a single table, cleans it, and exports it to a .csv and a .xlsx file. Currently, this ETL process is only available for the 2020-2021 season, but we expect to expand it to all available seasons (2008-2021).
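The unification step described above can be sketched roughly as follows. This is an illustration only, not the repository's cleaner.py: the function name `unify_tables`, the `source_table` column, and the cleaning rules (trimming whitespace, dropping empty rows) are all assumptions. It relies only on Python's standard library, so the .xlsx export is left out, since that would need a third-party package.

```python
import csv
from pathlib import Path


def unify_tables(input_dir, output_file):
    """Merge every CSV in input_dir into one cleaned CSV at output_file.

    Hypothetical helper: returns the number of rows written.
    """
    rows = []
    columns = []  # union of column names, in first-seen order

    for path in sorted(Path(input_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Assumed cleaning rule: trim whitespace in headers and values.
                cleaned = {
                    key.strip(): (value or "").strip()
                    for key, value in row.items()
                    if key  # ignore overflow fields that have no header
                }
                # Assumed cleaning rule: skip rows that are entirely empty.
                if not any(cleaned.values()):
                    continue
                # Assumed extra column recording which table the row came from.
                cleaned["source_table"] = path.stem
                for name in cleaned:
                    if name not in columns:
                        columns.append(name)
                rows.append(cleaned)

    # Columns missing from a given table are written as empty strings.
    with Path(output_file).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)

    return len(rows)
```

A call such as `unify_tables("scraper/output", "cleaner/nbb_unified.csv")` would then produce the single table, where the output file name is made up for this example. Writing the result outside the input folder keeps a second run from picking up the unified file as if it were one of the scraped tables.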