This project provides an end-to-end pipeline for forecasting freight prices from multiple datasets, combining feature engineering with machine learning and time series models. It covers the full workflow from raw data ingestion through preprocessing, feature extraction, model training, and results reporting.
- /data/raw/ # Raw data files (Excel, CSV)
- /data/processed/ # Cleaned and feature-engineered datasets
- /models/ # Model scripts (ARIMA, XGBoost, Lasso, Prophet, etc.)
- /reports/models/ # Model evaluation metrics and plots
- /utils/ # Utility functions (preprocessing, diagnostics)
- /data_pipeline/ # Scripts to fetch and load raw data
- /feature_engineering/ # Feature engineering pipeline scripts
- /notebooks/ # Jupyter notebooks and demo
- Python 3.10 or newer
- Recommended: Virtual environment (venv or conda)
Clone the repository:
cd freight-forecasting
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Run any part or all of the pipeline through the command-line interface in main_cli.py, using the following flags:
- --fetch: Fetch all raw data from source files
- --prepare: Merge and align raw data into a single weekly dataset
- --features: Perform feature engineering (interpolation, volatility, seasonality)
- --train: Train and benchmark all predictive models with hyperparameter tuning
- --report: Generate model comparison dashboards and summary tables
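The internals of main_cli.py are not shown here, so the following is only a minimal sketch of how stage flags like these could be parsed with `argparse`. The names `STAGES`, `build_parser`, and `selected_stages` are hypothetical, and running the selected stages in pipeline order regardless of flag order is an assumption, not documented behavior.

```python
import argparse

# Stage flags from the list above, in pipeline order.
# The help strings are copied from the flag descriptions.
STAGES = {
    "fetch": "Fetch all raw data from source files",
    "prepare": "Merge and align raw data into a single weekly dataset",
    "features": "Perform feature engineering",
    "train": "Train and benchmark all predictive models",
    "report": "Generate model comparison dashboards and summary tables",
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one boolean switch per pipeline stage."""
    parser = argparse.ArgumentParser(prog="main_cli.py")
    for name, help_text in STAGES.items():
        parser.add_argument(f"--{name}", action="store_true", help=help_text)
    return parser


def selected_stages(argv: list[str]) -> list[str]:
    """Return the requested stages in pipeline order (an assumed convention)."""
    args = build_parser().parse_args(argv)
    return [name for name in STAGES if getattr(args, name)]


# Flags may be combined; the sketch reorders them to follow the pipeline.
print(selected_stages(["--train", "--fetch"]))  # ['fetch', 'train']
```

With a parser of this shape, a call such as `python main_cli.py --fetch --prepare` would select only the first two stages.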
- Data Fetching: Scripts in data_pipeline/ load raw data from Excel/CSV files, perform initial cleaning, and save intermediate processed CSVs.
- Data Preparation: Loading and merging datasets, resampling to a consistent weekly Monday frequency, and aligning time series.
- Feature Engineering: Interpolation of missing values, computation of volatility indicators, and extraction of seasonal/trend components for key variables.
- Model Training: Multiple models trained and benchmarked, including:
  - Auto ARIMA
  - SARIMAX with exogenous variables
  - Lasso (with and without lags)
  - Ridge Regression
  - Support Vector Regression (with hyperparameter tuning)
  - XGBoost Regression (with hyperparameter tuning)
  - Prophet (uni- and multivariate, tuned)
- Results Reporting: Generation of performance summaries, metrics logs, and comparison plots saved in reports/models/.
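The data preparation and feature engineering steps above can be illustrated with a short pandas sketch. It is not the project's actual code: the column name `freight_price`, the helpers `to_weekly` and `add_features`, and the 4-week window are all hypothetical. A centered rolling mean and a week-of-year column stand in for whatever trend and seasonality extraction the pipeline really uses.

```python
import numpy as np
import pandas as pd


def to_weekly(df: pd.DataFrame) -> pd.DataFrame:
    """Resample a datetime-indexed frame to weekly bins labelled on Mondays."""
    return df.resample("W-MON").mean()


def add_features(weekly: pd.DataFrame, column: str, window: int = 4) -> pd.DataFrame:
    """Interpolate gaps, then add simple volatility, trend and seasonality columns."""
    out = weekly.copy()
    # Fill missing weeks by linear interpolation between neighbouring values.
    out[column] = out[column].interpolate(method="linear", limit_direction="both")
    # Volatility indicator: rolling standard deviation of week-over-week returns.
    out[f"{column}_volatility"] = out[column].pct_change().rolling(window).std()
    # Trend stand-in: centered rolling mean (an assumption, not the project's method).
    out[f"{column}_trend"] = out[column].rolling(window, center=True, min_periods=1).mean()
    # Seasonality stand-in: ISO week number as a calendar feature.
    out["week_of_year"] = out.index.isocalendar().week.astype(int)
    return out


# Synthetic daily data with a ten-day gap, purely for illustration.
idx = pd.date_range("2024-01-01", periods=70, freq="D")
raw = pd.DataFrame({"freight_price": np.linspace(100.0, 169.0, 70)}, index=idx)
raw.iloc[20:30] = np.nan

weekly = to_weekly(raw)
features = add_features(weekly, "freight_price")
print(features.head())
```

In this synthetic example the gap leaves one weekly bin empty after resampling, and the interpolation step fills it before the volatility and trend columns are computed.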
- Processed Data: data/processed/processed.csv (final merged and feature-engineered dataset)
- Model Performance Plots: Saved in reports/models/ as PNG files
- Model Comparison Dashboard: Summary plots comparing MAE and R² scores across models
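For readers unfamiliar with the two metrics used in the comparison dashboard, here is a small dependency-free sketch of how MAE and R² are defined and how models could be ranked by them. The function `rank_models`, the model names, and the numbers are invented for illustration and are not results from this project. In practice, scikit-learn's `sklearn.metrics.mean_absolute_error` and `sklearn.metrics.r2_score` compute the same quantities.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rank_models(y_true, predictions):
    """Return (name, MAE, R²) tuples sorted by MAE, best model first."""
    rows = [
        (name, mean_absolute_error(y_true, y_pred), r2_score(y_true, y_pred))
        for name, y_pred in predictions.items()
    ]
    return sorted(rows, key=lambda row: row[1])


# Made-up values: one reasonable model and a baseline that predicts the mean.
y_true = [100.0, 110.0, 120.0, 130.0]
predictions = {
    "naive_mean": [115.0, 115.0, 115.0, 115.0],
    "model_a": [102.0, 108.0, 123.0, 129.0],
}

ranking = rank_models(y_true, predictions)
for name, mae, r2 in ranking:
    print(f"{name:<12} MAE={mae:.2f}  R2={r2:.3f}")
```

A baseline that always predicts the mean of the actual values scores an R² of exactly 0, which is why R² is a useful complement to MAE when comparing models.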
- Ensure all dependencies in requirements.txt are installed.
- Confirm that raw data files exist in /data/raw/ before fetching or preprocessing.
- Check that data/processed/ contains necessary intermediate files before training models.
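The file checks above could be automated with a small preflight script. This is a hedged sketch rather than part of the project: the function `preflight` is hypothetical, the accepted raw file extensions are an assumption based on the "Excel, CSV" note, and it checks only for processed.csv because the exact intermediate files needed for training are not listed here.

```python
import tempfile
from pathlib import Path

# Assumed extensions for "Excel, CSV" raw files.
RAW_SUFFIXES = {".csv", ".xlsx", ".xls"}


def preflight(project_root) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    root = Path(project_root)
    problems = []

    if not (root / "requirements.txt").is_file():
        problems.append("requirements.txt not found")

    raw_dir = root / "data" / "raw"
    if not raw_dir.is_dir():
        problems.append(f"missing directory: {raw_dir}")
    elif not any(p.suffix.lower() in RAW_SUFFIXES for p in raw_dir.iterdir()):
        problems.append(f"no Excel/CSV files in {raw_dir}")

    processed = root / "data" / "processed" / "processed.csv"
    if not processed.is_file():
        problems.append(f"missing processed dataset: {processed}")

    return problems


# Demonstration in a throwaway directory: first empty, then with the expected files.
with tempfile.TemporaryDirectory() as tmp:
    empty_report = preflight(tmp)

    root = Path(tmp)
    (root / "requirements.txt").write_text("pandas\n")
    (root / "data" / "raw").mkdir(parents=True)
    (root / "data" / "raw" / "example.csv").write_text("date,value\n")
    (root / "data" / "processed").mkdir(parents=True)
    (root / "data" / "processed" / "processed.csv").write_text("date,value\n")
    ok_report = preflight(tmp)

print(empty_report)
print(ok_report)
```

Running `preflight(".")` from the repository root before the training stage would list anything that is missing.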
Contributions are welcome! Feel free to open an issue or write to me to discuss your ideas.
Stefan Pilegaard Pedersen, May 2025
This project is licensed under the GNU General Public License v3.0.