A production-grade Formula 1 prediction system designed for accuracy and ease of use, featuring a modular architecture, advanced machine learning, and a streamlined workflow.
Here is the complete workflow for fetching data, training models, and making predictions.
Download the latest F1 data, including race results, qualifying, and weather information. By default, only new or missing data is fetched.
python scripts/predict.py fetch-data
To force a complete refresh of all data (recommended for the first run), use the --force flag:
python scripts/predict.py fetch-data --force
Train the qualifying and race prediction models using the latest data. This step only needs to be run after you have fetched new historical data.
python scripts/predict.py train
The CLI is designed for ergonomics and reproducibility.
# Qualifying or Race (single event)
python scripts/predict.py predict --year 2024 --race "Italian Grand Prix" --session qualifying
python scripts/predict.py predict --year 2024 --race "Italian Grand Prix" --session race
# Scenarios for race predictions via --mode
# - pre-weekend: no qualifying used
# - pre-quali : use predicted qualifying only
# - post-quali : require actual qualifying
# - live : auto-refresh until actual qualifying appears (then exit)
# - auto : use actual when available, else predicted (default)
python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode auto
python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode pre-weekend
python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode post-quali
# Next event shortcut and current season default
python scripts/predict.py predict --race next --session race --mode auto --season-current
# Batch runs (entire season or specific rounds)
python scripts/predict.py predict --year 2024 --season all --session race
python scripts/predict.py predict --year 2024 --session race --rounds 1-10
python scripts/predict.py predict --year 2024 --session race --rounds 1,3,5-7
# Watch mode (auto-refresh every N seconds until quali is published)
python scripts/predict.py predict watch --year 2025 --race "Italian Grand Prix" --interval 300
# Live mode via --mode (single event)
python scripts/predict.py predict --year 2025 --race "Italian Grand Prix" --session race --mode live --interval 300
# Simulations (future seasons / what-ifs)
# Provide an optional custom lineup CSV with at least columns: Driver, Team
python scripts/predict.py predict simulate --year 2026 --race "Italian Grand Prix" --session race --lineup path/to/lineup.csv
- Qualifying only
  - Use when you are tuning qualifying rankers (faster iteration) or you changed quali-specific config.
  - Command:
    python scripts/predict.py train --model-type qualifying
- Race only
  - Use when you are iterating on race model logic (e.g., DNF integration, race features) or changed race config.
  - Command:
    python scripts/predict.py train --model-type race
- All (qualifying + race)
  - Use after significant changes to features/config, after fetching a lot of new data, or before a new season.
  - Command:
    python scripts/predict.py train # same as --model-type all
When to retrain:
- After fetch-data pulls new seasons/rounds you want reflected in training
- After changing feature engineering or config defaults (e.g., ensemble metrics, ranker overrides)
- After upgrading dependencies or model versions
Optional hyperparameter tuning and ensembles:
- In configs/default.yaml:
  - hyperparameter_optimization.enabled: true
  - hyperparameter_optimization.objective_metric: spearman # good for ranking
  - hyperparameter_optimization.n_trials: 100 (increase if time allows)
  - models.ensemble_config.target_metric: spearman (or ndcg)
Notes:
- HPO runs for non-ranker models by default. To benefit from it for qualifying, consider adding non-ranker base models or manually overriding ranker params under models.qualifying.overrides.
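For intuition about the spearman objective mentioned above, here is a minimal pure-Python sketch of Spearman rank correlation between a predicted and an actual finishing order. The function names are hypothetical and this is not taken from the project's code, which may well rely on a library implementation instead.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(predicted, actual):
    """Pearson correlation of the two rank vectors (Spearman's rho)."""
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rp, ra = average_ranks(predicted), average_ranks(actual)
    n = len(rp)
    mean_p, mean_a = sum(rp) / n, sum(ra) / n
    cov = sum((x - mean_p) * (y - mean_a) for x, y in zip(rp, ra))
    var_p = sum((x - mean_p) ** 2 for x in rp)
    var_a = sum((y - mean_a) ** 2 for y in ra)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_a)


# Predicted vs. actual positions for five drivers (illustrative numbers only).
rho = spearman([1, 2, 3, 4, 5], [2, 1, 3, 5, 4])
```

A value near 1 means the predicted order closely matches the actual order, which is why a rank-based metric suits position prediction better than an error metric on raw values.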
- pre-weekend (race)
  - No actual quali available. Use it to get an early race outlook from historical/contextual data.
    python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode pre-weekend
- pre-quali (race)
  - Right before qualifying; still no actual quali. Similar to pre-weekend but closer in time.
    python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode pre-quali
- post-quali (race)
  - After qualifying; uses actual quali if available to refine race positions.
    python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode post-quali
- live
  - Auto-refreshes predictions periodically until actual qualifying is detected, then exits.
    python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode live --interval 300
- auto (default)
  - Uses actual qualifying when present; otherwise falls back to predicted quali.
    python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode auto
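The scenario rules can be summarized as a small decision function. The sketch below follows the --mode comments in the CLI examples (pre-weekend uses no qualifying, pre-quali uses predicted qualifying only, post-quali requires actual qualifying). The function name and return shape are hypothetical, and treating each refresh of live like auto is an assumption, not confirmed project behavior.

```python
from typing import Optional, Sequence, Tuple

MODES = ("pre-weekend", "pre-quali", "post-quali", "live", "auto")


def select_qualifying_input(
    mode: str,
    actual: Optional[Sequence[str]],
    predicted: Optional[Sequence[str]],
) -> Tuple[str, Optional[Sequence[str]]]:
    """Pick which qualifying order feeds the race model.

    Returns (source, grid), where source is "none", "predicted", or "actual".
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "pre-weekend":
        return "none", None
    if mode == "pre-quali":
        return "predicted", predicted
    if mode == "post-quali":
        if actual is None:
            raise LookupError("post-quali requires actual qualifying results")
        return "actual", actual
    # "auto", and (by assumption) each refresh of "live":
    # prefer actual qualifying, otherwise fall back to predicted.
    if actual is not None:
        return "actual", actual
    return "predicted", predicted
```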
- Fresh setup (or start of a season):
  python scripts/predict.py fetch-data --force
  python scripts/predict.py train # trains qualifying + race
  python scripts/predict.py predict list-schedule --year 2025 # copy exact EventName
  python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode auto
- Race week – pre-quali planning:
  python scripts/predict.py fetch-data
  # retrain only if you changed features/config or fetched a lot of new historical data
  # python scripts/predict.py train --model-type qualifying
  python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session qualifying
  python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode pre-quali
- After qualifying – finalize race outlook:
  python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode post-quali
- Model tuning loop (faster iterations):
  # Edit configs/default.yaml
  # - Set hyperparameter_optimization.enabled: true (optional)
  # - Set objective_metric / ensemble target_metric to spearman or ndcg
  # - Override ranker params under models.<target>.overrides as needed
  python scripts/predict.py train --model-type qualifying
  python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session qualifying
  # Repeat for race when satisfied
  python scripts/predict.py train --model-type race
  python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode auto
Utilities:
- List official event names for a season:
  python scripts/predict.py predict list-schedule --year 2025
- Simulate future seasons or custom lineups:
  python scripts/predict.py predict simulate --year 2026 --race "Italian Grand Prix" --session race --lineup path/to/lineup.csv
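Before passing a custom lineup to predict simulate, it can help to check that the CSV has the required Driver and Team columns. The following is a minimal sketch using only the standard library. The load_lineup helper is hypothetical, the sample rows are placeholders, and the project may apply additional checks of its own.

```python
import csv
import io

REQUIRED_COLUMNS = ("Driver", "Team")


def load_lineup(source):
    """Read a lineup CSV from a file-like object and validate required columns.

    Extra columns are ignored; rows with an empty Driver or Team are rejected.
    """
    reader = csv.DictReader(source)
    fieldnames = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in fieldnames]
    if missing:
        raise ValueError(f"lineup CSV is missing columns: {', '.join(missing)}")
    lineup = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        driver = (row["Driver"] or "").strip()
        team = (row["Team"] or "").strip()
        if not driver or not team:
            raise ValueError(f"line {line_no}: Driver and Team must be non-empty")
        lineup.append({"Driver": driver, "Team": team})
    return lineup


# Placeholder data; a real file would be opened with open(path, newline="").
sample = io.StringIO("Driver,Team\nDriver A,Team X\nDriver B,Team X\n")
lineup = load_lineup(sample)
```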
- Modular Workflow - Separate, independent commands for fetching data, training, and predicting.
- Prediction Scenarios - --mode supports pre-weekend, pre-quali, post-quali, live, and auto.
- Next Event Shortcuts - Use --race next and --season-current to target the upcoming race this year.
- Batch Runs - Run entire seasons or ranges of rounds in a single command.
- Watch Mode - predict watch auto-refreshes until actual qualifying appears.
- Simulations - Future-season what-ifs with predict simulate and custom lineups.
- Intelligent Predictions - Automatically uses post-qualifying data (auto mode) when available.
- Advanced Feature Engineering - Creates over 100 F1-specific features, including weather and strategy factors.
- Ensemble Models - Combines LightGBM and XGBoost (including ranking variants) for robust predictions.
- Configuration-Driven - All settings are managed in a central configs/default.yaml file.
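The batch-run examples pass round specifications such as 1-10 and 1,3,5-7 to --rounds. As an illustration of how such a spec could be expanded into round numbers, here is a small sketch. The parse_rounds function is hypothetical and not necessarily how the CLI parses the flag.

```python
def parse_rounds(spec: str) -> list[int]:
    """Expand a spec like "1,3,5-7" into a sorted list of unique round numbers."""
    rounds = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in rounds spec: {spec!r}")
        if "-" in part:
            # Inclusive range, e.g. "5-7" -> 5, 6, 7
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            rounds.update(range(start, end + 1))
        else:
            rounds.add(int(part))
    if any(r < 1 for r in rounds):
        raise ValueError("round numbers start at 1")
    return sorted(rounds)
```

For example, parse_rounds("1,3,5-7") yields [1, 3, 5, 6, 7].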
f1_prediction_project/
├── src/
│   └── f1_predictor/                        # Core prediction package
│       ├── data_loader.py
│       ├── feature_engineering_pipeline.py
│       ├── model_training.py
│       ├── prediction.py
│       └── ...
├── scripts/
│   └── predict.py                           # Main entry point for all commands
├── configs/
│   ├── default.yaml                         # Central configuration file
│   └── local.yaml.example
├── artifacts/
│   ├── models/                              # Trained model artifacts (.pkl)
│   ├── predictions/                         # Prediction outputs (.csv)
│   ├── evaluation/
│   ├── reports/
│   └── visualizations/
├── data/
│   ├── raw/                                 # Raw F1 data
│   └── cache/                               # FastF1 and features caches
├── logs/                                    # System logs
# 1. Clone the repository
git clone https://github.com/yourname/f1_prediction_project.git
cd f1_prediction_project
# 2. Create a virtual environment and install dependencies
python -m venv .venv
# On Windows: .\.venv\Scripts\Activate.ps1
# On macOS/Linux: source .venv/bin/activate
pip install -r requirements.txt
Python 3.10+ is recommended.
The entire system is controlled via configs/default.yaml (single source of truth). You can provide a gitignored configs/local.yaml for environment-specific overrides and secrets. Values support environment variable interpolation using ${VAR} or ${VAR:-default}.
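To make the ${VAR} and ${VAR:-default} syntax concrete, here is a minimal sketch of shell-style interpolation over a single string value. The interpolate function is hypothetical. Raising an error when a variable is unset and has no default is an assumption; the project may instead substitute an empty string.

```python
import os
import re

# Matches ${NAME} and ${NAME:-default}; the default may be empty.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def interpolate(value, env=None):
    """Replace ${VAR} / ${VAR:-default} placeholders using env (os.environ by default)."""
    env = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if default is None:
            # Plain ${VAR}: the variable must exist (assumed behavior).
            if current is None:
                raise KeyError(f"environment variable {name} is not set")
            return current
        # ${VAR:-default}: like the shell, fall back when unset or empty.
        return current if current else default

    return _PATTERN.sub(replace, value)
```

With an empty environment, interpolate("${OPENWEATHER_API_KEY:-}", {}) returns an empty string, while interpolate("${OPENWEATHER_API_KEY}", {}) raises KeyError in this sketch.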
Examples:
data_collection:
  external_apis:
    openweathermap_api_key: "${OPENWEATHER_API_KEY:-}"
Create configs/local.yaml to override any setting locally (do not commit):
data_collection:
  external_apis:
    openweathermap_api_key: "${OPENWEATHER_API_KEY}"
Strict feature enforcement at inference is enabled by default via general.strict_feature_enforcement: true. Models persist their training feature list, imputation values, config hash, model_version, and git commit in artifacts/models/*_model.metadata.json.
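One plausible reading of strict feature enforcement is sketched below: at inference, the input is aligned to the persisted training feature list, and a missing feature is an error in strict mode but is filled from the persisted imputation values otherwise. The align_features function and these exact semantics are assumptions made for illustration, not the project's confirmed behavior.

```python
def align_features(row, training_features, imputation_values, strict=True):
    """Return feature values ordered exactly as they were during training.

    row: mapping of feature name -> value available at inference time.
    training_features: ordered feature names saved with the model.
    imputation_values: fallback values saved with the model.
    """
    missing = [name for name in training_features if name not in row]
    if missing and strict:
        raise ValueError(f"missing features at inference: {missing}")
    aligned = []
    for name in training_features:
        if name in row:
            aligned.append(row[name])
        else:
            # Non-strict mode: fall back to the saved imputation value.
            # A KeyError here means no fallback was persisted for this feature.
            aligned.append(imputation_values[name])
    # Columns in row that were not seen in training are dropped.
    return aligned
```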
Every prediction writes two files into artifacts/predictions/:
- <type>_predictions_<year>_<race>_<timestamp>.csv
- <type>_predictions_<year>_<race>_<timestamp>.meta.json
The .meta.json includes:
- model_type, year, race_name, generated_at
- model_version (from configs/default.yaml)
- scenario (e.g., pre-weekend, pre-quali, post-quali, live, auto, or simulate)
- features_used (training-time feature list, when known)
- config_hash (stable hash of the merged configuration for full traceability)
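A stable configuration hash is typically obtained by serializing the merged configuration in a canonical form and hashing the result. The sketch below assumes SHA-256 over JSON with sorted keys; the actual algorithm, the function names, and the sample values are assumptions for illustration only.

```python
import hashlib
import json
from datetime import datetime, timezone


def config_hash(config):
    """Hash a config dict so that key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_meta(model_type, year, race_name, scenario, config, features_used=None):
    """Assemble the fields listed for the .meta.json sidecar file."""
    return {
        "model_type": model_type,
        "year": year,
        "race_name": race_name,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        # Assumes the version lives at the top level of the config.
        "model_version": config.get("model_version"),
        "scenario": scenario,
        "features_used": features_used,
        "config_hash": config_hash(config),
    }


# Illustrative values only.
meta = build_meta("race", 2025, "Bahrain Grand Prix", "auto", {"model_version": "x"})
```

Because keys are sorted before hashing, two configurations that differ only in key order produce the same hash, while any change to a value produces a different one.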