BaskRef is a tool to scrape basketball data from the web.
The goal of this project is to provide a data collection utility for NBA basketball data. The data is scraped from https://www.basketball-reference.com and can then be saved to a CSV file for use by other tools.
The following datasets are supported:
- games & game stats (in-depth stats of the games)
- player game stats
- player metadata (not implemented)
- game logs (not implemented)
All implemented datasets can be collected:
- by day (all games in one day)
- by whole season (regular season + playoffs)
- by playoffs
pip install baskref
# Optional: set the logging level (default: INFO)
export LOG_LEVEL=DEBUG # INFO, DEBUG, ERROR
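A minimal sketch of how a Python program can honour a `LOG_LEVEL` variable like the one above, falling back to `INFO`. It uses the standard `logging` module and is an illustration only, not necessarily how baskref reads the variable internally; the helper name `resolve_log_level` is hypothetical.

```python
import logging
import os

# Values this README lists for LOG_LEVEL.
ALLOWED_LEVELS = {"INFO", "DEBUG", "ERROR"}


def resolve_log_level(default: str = "INFO") -> int:
    """Hypothetical helper: map the LOG_LEVEL env var to a logging constant."""
    name = os.environ.get("LOG_LEVEL", default).upper()
    if name not in ALLOWED_LEVELS:
        name = default  # ignore unknown values instead of failing
    return getattr(logging, name)


logging.basicConfig(level=resolve_log_level())
logging.getLogger(__name__).debug("only shown when LOG_LEVEL=DEBUG")
```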
Scrape all games for the 7th of January 2022.
baskref -t g -d 2022-01-07 -fp datasets
# python -c "from baskref import run_baskref; run_baskref()" -t g -d 2022-01-07 -fp datasets
Scrape all games for the 2006 NBA season (regular season + playoffs).
baskref -t gs -y 2006 -fp datasets
# python -c "from baskref import run_baskref; run_baskref()" -t gs -y 2006 -fp datasets
Scrape all games for the 2006 NBA playoffs.
baskref -t gp -y 2006 -fp datasets
# if you haven't installed the package
# python -c "from baskref import run_baskref; run_baskref()" -t gp -y 2006 -fp datasets
Scrape only the game URLs for the 7th of January 2022.
# append "u" to any of the three scraping types:
# g -> gu, gs -> gsu, gp -> gpu
baskref -t gu -d 2022-01-07 -fp datasets
Scrape player stats for the 7th of January 2022.
# append "pl" to any of the three scraping types:
# g -> gpl, gs -> gspl, gp -> gppl
baskref -t gpl -d 2022-01-07 -fp datasets
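The `-t` values above follow a simple pattern: a base type (`g` for a single day, `gs` for a whole season, `gp` for the playoffs) plus an optional suffix (`u`, which judging by the URL scraper appears to select game URLs only, or `pl` for player stats). The short sketch below enumerates every code this convention produces. It is derived from the examples in this README, not from baskref's argument parser, so treat the list as an assumption.

```python
from itertools import product

# Base scraping types shown in this README and the scope each one covers.
BASE_TYPES = {"g": "single day", "gs": "whole season", "gp": "playoffs"}
# Optional suffixes: "" (game data), "u" (assumed: game URLs), "pl" (player stats).
SUFFIXES = ("", "u", "pl")


def all_type_codes() -> list[str]:
    """Every -t value implied by the base-type + suffix convention."""
    return [base + suffix for base, suffix in product(BASE_TYPES, SUFFIXES)]


print(all_type_codes())
```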
Use a proxy for scraping.
baskref -t g -d 2022-01-07 -fp datasets -p http://someproxy.com
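When scripting many runs, it can help to assemble the command line programmatically. The sketch below uses only the flags shown in this README (`-t`, `-d`, `-y`, `-fp`, `-p`); the helper `build_baskref_command` is hypothetical, and the rule that exactly one of `-d` or `-y` is passed is an assumption drawn from the examples rather than from baskref's own validation.

```python
from datetime import date
from typing import Optional


def build_baskref_command(
    scrape_type: str,
    file_path: str,
    day: Optional[date] = None,
    year: Optional[int] = None,
    proxy: Optional[str] = None,
) -> list[str]:
    """Hypothetical helper: assemble a baskref CLI call from the README flags."""
    # Assumption: day-based types use -d, season/playoff types use -y.
    if (day is None) == (year is None):
        raise ValueError("pass exactly one of day (for -d) or year (for -y)")
    cmd = ["baskref", "-t", scrape_type]
    if day is not None:
        cmd += ["-d", day.isoformat()]  # YYYY-MM-DD, as in the examples
    else:
        cmd += ["-y", str(year)]
    cmd += ["-fp", file_path]
    if proxy:
        cmd += ["-p", proxy]
    return cmd


print(" ".join(build_baskref_command("g", "datasets", day=date(2022, 1, 7))))
```

To actually run a command, pass the list to `subprocess.run(cmd, check=True)`; this requires baskref to be installed and on your `PATH`.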
Install requirements
pip install -r requirements.txt
This section covers the scraping functionality.
For any collection mode, first import and initialize the following classes.
from baskref.data_collection import (
    BaskRefUrlScraper,
    BaskRefDataScraper,
)
url_scraper = BaskRefUrlScraper()
data_scraper = BaskRefDataScraper()
# Optionally, you can set a proxy
proxy_url_scraper = BaskRefUrlScraper("http://someproxy.com")
proxy_data_scraper = BaskRefDataScraper("http://someproxy.com")
The BaskRefDataScraper.get_games_data method returns a list of dictionaries.
Collect games for a specific day
from datetime import date
game_urls = url_scraper.get_game_urls_day(date(2022, 1, 7))
game_data = data_scraper.get_games_data(game_urls)
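To cover several consecutive days, one option is to call `get_game_urls_day` once per day and concatenate the results. This is a minimal sketch, assuming `get_game_urls_day` returns a list of URL strings; the helpers `days_between` and `get_game_urls_range` are hypothetical and not part of baskref.

```python
from datetime import date, timedelta
from typing import Iterator, Protocol


class DayUrlScraper(Protocol):
    """Anything with a get_game_urls_day(date) method, e.g. BaskRefUrlScraper."""

    def get_game_urls_day(self, day: date) -> list[str]: ...


def days_between(start: date, end: date) -> Iterator[date]:
    """Yield every date from start to end, both inclusive."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def get_game_urls_range(scraper: DayUrlScraper, start: date, end: date) -> list[str]:
    """Hypothetical helper: collect game URLs for a range of days, one call per day."""
    urls: list[str] = []
    for day in days_between(start, end):
        urls.extend(scraper.get_game_urls_day(day))
    return urls
```

Usage, with the scrapers initialized earlier: `game_urls = get_game_urls_range(url_scraper, date(2022, 1, 1), date(2022, 1, 7))`, then `data_scraper.get_games_data(game_urls)`. Each day is a separate request, so long ranges will take correspondingly longer.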
Collect games for a specific season (regular + playoffs)
game_urls = url_scraper.get_game_urls_year(2006)
game_data = data_scraper.get_games_data(game_urls)
Collect games for a specific postseason
game_urls = url_scraper.get_game_urls_playoffs(2006)
game_data = data_scraper.get_games_data(game_urls)
Collect player stats for a specific day
from datetime import date
game_urls = url_scraper.get_game_urls_day(date(2022, 1, 7))
pl_stats_data = data_scraper.get_player_stats_data(game_urls)
This section covers saving the data.
Save a list of dictionaries to a CSV file.
import os
from baskref.data_saving.file_saver import save_file_from_list
save_path = os.path.join('datasets', 'file_name.csv')
save_file_from_list(game_data, save_path)
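`save_file_from_list` handles this for you. To illustrate what writing a list of dictionaries to CSV involves (or to do it without baskref), here is a minimal standard-library sketch using `csv.DictWriter`. It is not baskref's implementation; the helper name `save_dicts_to_csv` is hypothetical, and it assumes rows may not all share the same keys, so the header is the union of keys in first-seen order.

```python
import csv
import os
from typing import Any


def save_dicts_to_csv(rows: list[dict[str, Any]], path: str) -> None:
    """Hypothetical helper: write a list of dicts to CSV with a header row."""
    # Build the header from the union of keys, keeping first-seen order.
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)  # e.g. create "datasets/"
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # restval fills in columns that a given row does not have.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```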
Run all tests with Pytest
pytest
Run all tests with coverage
coverage run --source=baskref -m pytest
coverage report --omit="*/test*" -m --skip-empty
The code base uses black for automatic formatting. The configuration for black is stored in the pyproject.toml file.
# run black over the entire code base
black .
The code base uses pylint and mypy for code linting.
The configuration for pylint is stored in the .pylintrc file.
# run pylint over the entire code base
pylint --recursive=y ./
The configuration for mypy is stored in the pyproject.toml file.
# run mypy over the entire code base
mypy baskref
- Create a virtual environment
- You might want to use a virtual environment for running the project.
- This step is optional (if you skip it, go straight to installing the dev requirements)
Create a new virtual environment
python -m venv venv # the last argument is the path of the virtual environment
Activate the new virtual environment
# Windows
.\venv\Scripts\activate
# Unix
source venv/bin/activate
Leaving the virtual environment
deactivate
- Install all the dev requirements
pip install -r requirements_dev.txt
# uninstall all packages (Windows)
pip freeze > unins && pip uninstall -y -r unins && del unins
# uninstall all packages (Linux)
pip freeze | xargs pip uninstall -y
- Install the pre-commit hook
pre-commit install
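To confirm that the virtual environment is actually active before installing anything, you can compare `sys.prefix` with `sys.base_prefix`: inside an environment created with `python -m venv`, the two differ. The function name `in_virtualenv` is just an illustrative choice.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv created with `python -m venv`."""
    # In a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```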
This section describes some of the steps when preparing a new baskref version.
- empty the dist folder
rm -rf dist/*
- adjust the pyproject.toml file:
  - version
  - dependencies
- build the project, install it locally and test it
python -m build
pip install .
- install twine
pip install --upgrade twine
- publish the project to TestPyPI (optional)
twine upload --repository testpypi dist/*
# install from test.pypi
pip install --index-url https://test.pypi.org/simple/ baskref
- publish a new version
twine upload dist/*