F1 Prediction System

A production-grade Formula 1 prediction system designed for accuracy and ease of use, featuring a modular architecture, advanced machine learning, and a streamlined workflow.

Usage

Here is the complete workflow for fetching data, training models, and making predictions.

Step 1: Fetch Data

Download the latest F1 data, including race results, qualifying, and weather information. By default, only new or missing data is fetched.

python scripts/predict.py fetch-data

To force a complete refresh of all data (recommended for the first run), use the --force flag:

python scripts/predict.py fetch-data --force
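
The "new or missing only" behavior can be pictured as a set difference between the events that should exist and those already cached. This is a minimal sketch under that assumption; `missing_events` and the `(year, round)` keys are hypothetical names for illustration, not the project's actual data loader.

```python
def missing_events(wanted, cached, force=False):
    """Return the (year, round) pairs that still need to be downloaded.

    With force=True every wanted event is returned, mirroring --force.
    """
    if force:
        return sorted(wanted)
    return sorted(set(wanted) - set(cached))


wanted = [(2024, 1), (2024, 2), (2024, 3)]
cached = [(2024, 1), (2024, 2)]
print(missing_events(wanted, cached))              # [(2024, 3)]
print(missing_events(wanted, cached, force=True))  # all three events
```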

Step 2: Train Models

Train the qualifying and race prediction models on the latest data. Rerun this step after fetching new historical data or after changing features or configuration.

python scripts/predict.py train

Step 3: Make Predictions

The predict command covers single events, race scenarios via --mode, batch runs, watch and live modes, and simulations.

```
# Qualifying or Race (single event)
python scripts/predict.py predict --year 2024 --race "Italian Grand Prix" --session qualifying
python scripts/predict.py predict --year 2024 --race "Italian Grand Prix" --session race

# Scenarios for race predictions via --mode
#   - pre-weekend: no qualifying used
#   - pre-quali  : use predicted qualifying only
#   - post-quali : require actual qualifying
#   - live       : auto-refresh until actual qualifying appears (then exit)
#   - auto       : use actual when available, else predicted (default)
python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode auto
python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode pre-weekend
python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode post-quali

# Next event shortcut and current season default
python scripts/predict.py predict --race next --session race --mode auto --season-current

# Batch runs (entire season or specific rounds)
python scripts/predict.py predict --year 2024 --season all --session race
python scripts/predict.py predict --year 2024 --session race --rounds 1-10
python scripts/predict.py predict --year 2024 --session race --rounds 1,3,5-7

# Watch mode (auto-refresh every N seconds until quali is published)
python scripts/predict.py predict watch --year 2025 --race "Italian Grand Prix" --interval 300

# Live mode via --mode (single event)
python scripts/predict.py predict --year 2025 --race "Italian Grand Prix" --session race --mode live --interval 300

# Simulations (future seasons / what-ifs)
# Provide an optional custom lineup CSV with at least columns: Driver, Team
python scripts/predict.py predict simulate --year 2026 --race "Italian Grand Prix" --session race --lineup path/to/lineup.csv
```
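
The --rounds values shown above (1-10 and 1,3,5-7) mix single rounds and inclusive ranges. A spec like that could be expanded roughly as follows; `parse_rounds` is a hypothetical helper written for illustration, and the real CLI may parse or validate the flag differently.

```python
def parse_rounds(spec):
    """Expand a rounds spec such as "1,3,5-7" into a sorted list of ints."""
    rounds = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            rounds.update(range(start, end + 1))  # ranges are inclusive
        else:
            rounds.add(int(part))
    return sorted(rounds)


print(parse_rounds("1,3,5-7"))  # [1, 3, 5, 6, 7]
```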

Training strategies and scenarios

Train separately vs train all

Qualifying only
- Use when you are tuning qualifying rankers (faster iteration) or you changed quali-specific config.
- Command:
```
python scripts/predict.py train --model-type qualifying
```
Race only
- Use when you are iterating on race model logic (e.g., DNF integration, race features) or changed race config.
- Command:
```
python scripts/predict.py train --model-type race
```
All (qualifying + race)
- Use after significant changes to features/config, after fetching a lot of new data, or before a new season.
- Command:
```
python scripts/predict.py train  # same as --model-type all
```

When to retrain:

- After fetch-data pulls new seasons/rounds you want reflected in training
- After changing feature engineering or config defaults (e.g., ensemble metrics, ranker overrides)
- After upgrading dependencies or model versions

Optional hyperparameter tuning and ensembles:

In configs/default.yaml:
- hyperparameter_optimization.enabled: true
- hyperparameter_optimization.objective_metric: spearman # good for ranking
- hyperparameter_optimization.n_trials: 100 (increase if time allows)
- models.ensemble_config.target_metric: spearman (or ndcg)
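
To make the spearman metric concrete: for two tie-free finishing orders it reduces to the classic rank-difference formula. The sketch below is a standalone illustration in plain Python, not the project's evaluation code, and the function name is an assumption.

```python
def spearman(predicted, actual):
    """Spearman rank correlation for two tie-free position lists."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    # Sum of squared differences between predicted and actual positions.
    d2 = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return 1 - 6 * d2 / (n * (n * n - 1))


actual = [1, 2, 3, 4, 5]
print(spearman([1, 2, 3, 4, 5], actual))  # 1.0  (perfect order)
print(spearman([5, 4, 3, 2, 1], actual))  # -1.0 (fully reversed)
```

A value near 1 means the predicted order closely tracks the real one, which is why a rank-based metric suits position prediction better than a plain error on raw values.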

Notes:

HPO runs for non-ranker models by default. To benefit from it for qualifying, consider adding non-ranker base models, or manually override ranker parameters under models.qualifying.overrides.

Prediction scenarios (when to use which)

pre-weekend (race)

No actual qualifying results are available. Use this mode for an early race outlook based on historical and contextual data.

python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode pre-weekend

pre-quali (race)

Right before qualifying, when actual qualifying results are still unavailable. Similar to pre-weekend, but the race prediction uses predicted qualifying.

python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode pre-quali

post-quali (race)

After qualifying; uses actual quali if available to refine race positions.

python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode post-quali

live

Auto-refreshes predictions periodically until actual qualifying is detected, then exits.

python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode live --interval 300
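
Conceptually, live mode is a polling loop: check, wait --interval seconds, and stop once actual qualifying shows up. A minimal sketch of that pattern is below; `poll_until` and its parameters are hypothetical, and the real command presumably also regenerates predictions on each pass.

```python
import time


def poll_until(check, interval, max_attempts=None, sleep=time.sleep):
    """Call check() every `interval` seconds until it returns True.

    Returns the number of attempts made, or None if max_attempts ran out.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        if check():
            return attempts
        sleep(interval)
    return None


# Stand-in for "has actual qualifying been published?": true on the 3rd poll.
answers = iter([False, False, True])
print(poll_until(lambda: next(answers), interval=300, sleep=lambda s: None))  # 3
```

Passing `sleep` as a parameter keeps the loop testable without actually waiting.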

auto (default)

Uses actual qualifying when present; otherwise falls back to predicted quali.

python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode auto
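
The five modes differ mainly in which qualifying input feeds the race model. The decision table below is a sketch of that logic as described in this README; `resolve_quali_source` is a hypothetical name, post-quali is assumed to fail when no actual results exist, and live is assumed to behave like auto on each refresh.

```python
def resolve_quali_source(mode, actual_available):
    """Pick the qualifying input for a race prediction under a given --mode."""
    if mode == "pre-weekend":
        return "none"          # no qualifying used at all
    if mode == "pre-quali":
        return "predicted"     # predicted qualifying only
    if mode == "post-quali":
        if not actual_available:
            raise ValueError("post-quali requires actual qualifying results")
        return "actual"
    if mode in ("auto", "live"):  # assumption: live resolves like auto
        return "actual" if actual_available else "predicted"
    raise ValueError(f"unknown mode: {mode!r}")


print(resolve_quali_source("auto", actual_available=False))  # predicted
print(resolve_quali_source("auto", actual_available=True))   # actual
```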

Recommended command sequences (chronological)

Fresh setup (or start of a season):

```
python scripts/predict.py fetch-data --force
python scripts/predict.py train               # trains qualifying + race
python scripts/predict.py predict list-schedule --year 2025  # copy exact EventName
python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode auto
```

Race week – pre-quali planning:

```
python scripts/predict.py fetch-data
# retrain only if you changed features/config or fetched a lot of new historical data
# python scripts/predict.py train --model-type qualifying
python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session qualifying
python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode pre-quali
```

After qualifying – finalize race outlook:

python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode post-quali

Model tuning loop (faster iterations):

```
# Edit configs/default.yaml
#  - Set hyperparameter_optimization.enabled: true (optional)
#  - Set objective_metric / ensemble target_metric to spearman or ndcg
#  - Override ranker params under models.<target>.overrides as needed
python scripts/predict.py train --model-type qualifying
python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session qualifying
# Repeat for race when satisfied
python scripts/predict.py train --model-type race
python scripts/predict.py predict --year 2025 --race "Bahrain Grand Prix" --session race --mode auto
```

Utilities:

List official event names for a season:

python scripts/predict.py predict list-schedule --year 2025

Simulate future seasons or custom lineups:

python scripts/predict.py predict simulate --year 2026 --race "Italian Grand Prix" --session race --lineup path/to/lineup.csv
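
Before running a simulation, it can help to confirm that the lineup file has the required Driver and Team columns. The check below is a standalone sketch using only the standard library; `read_lineup` is a hypothetical helper, and the driver and team values are placeholders rather than a real lineup.

```python
import csv
import io

REQUIRED_COLUMNS = {"Driver", "Team"}


def read_lineup(handle):
    """Read a lineup CSV and verify it has at least Driver and Team columns."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"lineup CSV is missing columns: {sorted(missing)}")
    return [{"Driver": row["Driver"], "Team": row["Team"]} for row in reader]


# In practice, pass open("path/to/lineup.csv", newline="") instead.
sample = io.StringIO("Driver,Team\nDriver A,Team X\nDriver B,Team Y\n")
lineup = read_lineup(sample)
print(len(lineup))  # 2
```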

Key Features

- Modular Workflow: Separate, independent commands for fetching data, training, and predicting.
- Prediction Scenarios: --mode supports pre-weekend, pre-quali, post-quali, live, and auto.
- Next Event Shortcuts: Use --race next and --season-current to target the upcoming race this year.
- Batch Runs: Run entire seasons or ranges of rounds in a single command.
- Watch Mode: predict watch auto-refreshes until actual qualifying appears.
- Simulations: Future-season what-ifs with predict simulate and custom lineups.
- Intelligent Predictions: Automatically uses post-qualifying data (auto mode) when available.
- Advanced Feature Engineering: Creates over 100 F1-specific features, including weather and strategy factors.
- Ensemble Models: Combines LightGBM and XGBoost (including ranking variants) for robust predictions.
- Configuration-Driven: All settings are managed in a central configs/default.yaml file.

Project Structure

```
f1_prediction_project/
├── src/
│   └── f1_predictor/             # Core prediction package
│       ├── data_loader.py
│       ├── feature_engineering_pipeline.py
│       ├── model_training.py
│       ├── prediction.py
│       └── ...
├── scripts/
│   └── predict.py                # Main entry point for all commands
├── configs/
│   ├── default.yaml              # Central configuration file
│   └── local.yaml.example
├── artifacts/
│   ├── models/                   # Trained model artifacts (.pkl)
│   ├── predictions/              # Prediction outputs (.csv)
│   ├── evaluation/
│   ├── reports/
│   └── visualizations/
├── data/
│   ├── raw/                      # Raw F1 data
│   └── cache/                    # FastF1 and features caches
├── logs/                         # System logs
```

Installation

```
# 1. Clone the repository
git clone https://github.com/yourname/f1_prediction_project.git
cd f1_prediction_project

# 2. Create a virtual environment and install dependencies
python -m venv .venv
# On Windows: .\.venv\Scripts\Activate.ps1
# On macOS/Linux: source .venv/bin/activate
pip install -r requirements.txt
```

Python 3.10+ is recommended.

Configuration

The entire system is controlled via configs/default.yaml (single source of truth). You can provide a gitignored configs/local.yaml for environment-specific overrides and secrets. Values support environment variable interpolation using ${VAR} or ${VAR:-default}.

Examples:

```
data_collection:
  external_apis:
    openweathermap_api_key: "${OPENWEATHER_API_KEY:-}"
```
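
The ${VAR} and ${VAR:-default} forms follow shell conventions. One way such interpolation could work is sketched below; `interpolate` is a hypothetical function, and two behaviors are assumptions: an unset ${VAR} without a default raises an error, and :- falls back when the variable is unset or empty.

```python
import os
import re

# Matches ${VAR} and ${VAR:-default}; group 2 is None when no default is given.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def interpolate(text, env=None):
    """Expand ${VAR} and ${VAR:-default} placeholders in a string."""
    env = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if default is None:
            if current is None:
                raise KeyError(f"environment variable {name} is not set")
            return current
        # Shell-style ":-": fall back when the variable is unset or empty.
        return current if current else default

    return _PATTERN.sub(replace, text)


print(repr(interpolate("${OPENWEATHER_API_KEY:-}", env={})))  # ''
print(interpolate("${OPENWEATHER_API_KEY}", env={"OPENWEATHER_API_KEY": "abc123"}))  # abc123
```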

Create configs/local.yaml to override any setting locally (do not commit):

```
data_collection:
  external_apis:
    openweathermap_api_key: "${OPENWEATHER_API_KEY}"
```
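
How local.yaml is layered over default.yaml is not spelled out here. A common approach is a recursive merge in which local values win key by key, so the override file only needs the keys that change. The sketch below assumes that behavior; `deep_merge` is a hypothetical helper and the key value is a placeholder.

```python
def deep_merge(base, override):
    """Return a new dict where `override` wins, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


default_cfg = {
    "data_collection": {"external_apis": {"openweathermap_api_key": ""}},
    "general": {"strict_feature_enforcement": True},
}
local_cfg = {"data_collection": {"external_apis": {"openweathermap_api_key": "abc123"}}}

merged = deep_merge(default_cfg, local_cfg)
print(merged["data_collection"]["external_apis"]["openweathermap_api_key"])  # abc123
print(merged["general"]["strict_feature_enforcement"])                       # True
```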

Strict feature enforcement at inference is enabled by default via general.strict_feature_enforcement: true. Models persist their training feature list, imputation values, config hash, model_version, and git commit in artifacts/models/*_model.metadata.json.
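
As an illustration of what strict enforcement could mean in practice, the sketch below compares inference-time features against the persisted training feature list. It is an assumption about the behavior, not the project's implementation: `align_features` and the feature names are hypothetical, and the non-strict branch (impute missing, drop extra) is one plausible fallback.

```python
def align_features(row, training_features, imputation_values, strict=True):
    """Order a feature dict to match training and handle missing/extra columns."""
    missing = [f for f in training_features if f not in row]
    extra = [f for f in row if f not in training_features]
    if strict and (missing or extra):
        raise ValueError(f"feature mismatch: missing={missing}, extra={extra}")
    # Non-strict: fill gaps from stored imputation values and drop extras.
    return [row[f] if f in row else imputation_values[f] for f in training_features]


training_features = ["grid_position", "driver_form"]       # hypothetical names
imputation_values = {"grid_position": 10.0, "driver_form": 0.5}

print(align_features({"driver_form": 0.8, "grid_position": 3.0},
                     training_features, imputation_values))  # [3.0, 0.8]
print(align_features({"driver_form": 0.8},
                     training_features, imputation_values, strict=False))  # [10.0, 0.8]
```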

Prediction Artifacts & Metadata

Every prediction writes two files into artifacts/predictions/:

<type>_predictions_<year>_<race>_<timestamp>.csv
<type>_predictions_<year>_<race>_<timestamp>.meta.json

The .meta.json includes:

- model_type, year, race_name, generated_at
- model_version (from configs/default.yaml)
- scenario (e.g., pre-weekend, pre-quali, post-quali, live, auto, or simulate)
- features_used (training-time feature list, when known)
- config_hash (stable hash of the merged configuration for full traceability)
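
To show how the file naming and config_hash fit together, here is a minimal sketch. Everything beyond the documented field names is an assumption: the race-name slug, the timestamp format, SHA-256 over sorted-key JSON as the "stable hash", and the placeholder config are illustrative choices, not the project's confirmed behavior.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def config_hash(config):
    """Hash a config dict so that key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def artifact_paths(model_type, year, race, when, root="artifacts/predictions"):
    """Build the .csv and .meta.json paths for one prediction run."""
    slug = race.replace(" ", "_")            # assumed race-name formatting
    stamp = when.strftime("%Y%m%d_%H%M%S")   # assumed timestamp format
    stem = f"{model_type}_predictions_{year}_{slug}_{stamp}"
    return Path(root) / f"{stem}.csv", Path(root) / f"{stem}.meta.json"


config = {"general": {"strict_feature_enforcement": True}}  # placeholder config
when = datetime(2025, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
csv_path, meta_path = artifact_paths("race", 2025, "Bahrain Grand Prix", when)

meta = {
    "model_type": "race",
    "year": 2025,
    "race_name": "Bahrain Grand Prix",
    "generated_at": when.isoformat(),
    "scenario": "auto",
    "config_hash": config_hash(config),
}
print(csv_path.name)  # race_predictions_2025_Bahrain_Grand_Prix_20250301_120000.csv
print(json.dumps(meta, indent=2))
```

Serializing with sorted keys means two configs with the same content hash identically regardless of key order, which is what makes the hash usable for comparing runs.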
