The goal of this project is to build, train and tune a prediction model for time series, and then use it to forecast data. A demo is available on GitHub Pages.
The data is sourced using the Entsoe Python Client.
- Python (pandas, scikit-learn, NumPy)
- Skforecast (LGBM and XGBoost models)
- Plotly for visualizations
In this project, I have built a pipeline to download and process regularly updated data from Entsoe, a European platform promoting transparency of data regarding energy production. The package in this project can download data on the French energy grid (both the actual system load and the day-ahead forecast provided by Entsoe) and then train various models (mostly LGBM; XGBoost is also supported) using the Skforecast library. The predictions of these models can be compared, visually and quantitatively, with the provided day-ahead forecast on the generated figure.
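As a rough illustration of the quantitative comparison, the sketch below scores a model forecast and a day-ahead forecast against the actual load. The choice of metrics (MAE and MAPE), the function names and the load values are all assumptions made for this example; they are not taken from the package.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute gap, expressed as a percentage of the observed value."""
    return 100 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Made-up load values in MW, purely illustrative.
actual_load = [50000.0, 52000.0, 48000.0, 51000.0]
model_forecast = [49500.0, 52500.0, 48500.0, 50000.0]
day_ahead_forecast = [51000.0, 51000.0, 47000.0, 52000.0]

model_mae = mean_absolute_error(actual_load, model_forecast)
day_ahead_mae = mean_absolute_error(actual_load, day_ahead_forecast)
model_mape = mean_absolute_percentage_error(actual_load, model_forecast)
day_ahead_mape = mean_absolute_percentage_error(actual_load, day_ahead_forecast)

print(f"model:     MAE={model_mae:.0f} MW, MAPE={model_mape:.2f}%")
print(f"day-ahead: MAE={day_ahead_mae:.0f} MW, MAPE={day_ahead_mape:.2f}%")
```

A lower value on either metric means the forecast tracked the actual load more closely over the compared window.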
The model is initially trained on historical data from 2020 until the end of 2024, and is then re-trained every week on a new batch of data. Training also includes hyperparameter tuning using Bayesian search with a backtesting cross-validation setup.
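To make the backtesting idea concrete, here is a minimal sketch in plain Python of expanding-window folds: the model is fitted on everything up to a cut-off, evaluated on the next block, and the cut-off then moves forward. It does not use Skforecast; the function name, the hourly resolution and the one-week fold length are assumptions for illustration only.

```python
def expanding_window_folds(n_samples, initial_train_size, steps):
    """Return (train, test) index ranges for an expanding-window backtest.

    Each fold trains on all samples before the cut-off and tests on the
    next `steps` samples; the cut-off then advances by `steps`.
    """
    folds = []
    train_end = initial_train_size
    while train_end < n_samples:
        test_end = min(train_end + steps, n_samples)
        folds.append((range(0, train_end), range(train_end, test_end)))
        train_end = test_end
    return folds


# Assumption: hourly data, so one week is 168 samples.
HOURS_PER_WEEK = 7 * 24

folds = expanding_window_folds(
    n_samples=10 * HOURS_PER_WEEK,          # ten weeks of data in total
    initial_train_size=8 * HOURS_PER_WEEK,  # first eight weeks for the initial fit
    steps=HOURS_PER_WEEK,                   # evaluate one week at a time
)

for train, test in folds:
    print(f"train on [0, {train.stop}), test on [{test.start}, {test.stop})")
```

Because the test block always lies after the training block, no fold ever evaluates the model on data older than what it was trained on, which is the property that matters for time series.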
The structure of the code is summarized by the following class diagram of its classes and interfaces:
classDiagram
class PredictionFigure {
- __init__(self, output_prediction) None
+ make_plot(self)
+ write_to_file(self) None
}
namespace preprocessing {
class ExogBuilder {
- __init__(self, periods, country_code) None
- _get_time_columns(self, X) pd.DataFrame
+ build(self, start_date, end_date) pd.DataFrame
}
class LinearlyInterpolateTS {
+ apply(self, y) pd.Series
}
}
namespace models {
class ForecasterRecursiveModel {
+ ForecasterRecursive forecaster
+ str name
- __init__(self, iteration, end_dev) None
+ save_to_file(self) None
- _build_cv_dev(self, train_size) TimeSeriesFold
- _build_cv_test(self, train_size) TimeSeriesFold
+ fit_with_best(self) None
+ tune(self) None
+ backtest(self) None
+ predict(self, delta_predict) tuple[dict, tuple[pd.Series, pd.Series]]
+ get_training(self) tuple[dict, tuple[pd.Series, pd.Series]]
+ get_error_forecast(self, delta_predict) tuple[dict, tuple[pd.Series, pd.Series]]
+ package_prediction(self)
+ get_feature_importance(self) pd.DataFrame | None
}
class ForecasterRecursiveLGBM {
- __init__(self, iteration, end_dev) None
}
class ForecasterRecursiveXGB {
- __init__(self, iteration, end_dev) None
}
}
ForecasterRecursiveLGBM --|> ForecasterRecursiveModel
ForecasterRecursiveXGB --|> ForecasterRecursiveModel
namespace main {
class download
class train
class predict { +plot }
}
class index["index.html"]
class EntsoePandasClient
%% note for EntsoePandasClient "External client connecting to Entsoe API"
predict --> PredictionFigure
PredictionFigure --> index
download --> EntsoePandasClient
train --> ForecasterRecursiveLGBM
ForecasterRecursiveModel -- ExogBuilder
ForecasterRecursiveModel -- LinearlyInterpolateTS
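The two preprocessing classes in the diagram suggest a gap-filling step and a calendar-feature step. The sketch below shows what such steps might look like with pandas. It is not the package's code: the function names are hypothetical, and the choice of features (hour, day of week, month) is an assumption.

```python
import pandas as pd


def linearly_interpolate(y: pd.Series) -> pd.Series:
    """Fill missing values by drawing a straight line between neighbours."""
    return y.interpolate(method="linear")


def time_columns(index: pd.DatetimeIndex) -> pd.DataFrame:
    """Derive simple calendar features from a datetime index."""
    return pd.DataFrame(
        {
            "hour": index.hour.to_numpy(),
            "day_of_week": index.dayofweek.to_numpy(),  # Monday is 0
            "month": index.month.to_numpy(),
        },
        index=index,
    )


# Made-up hourly load with two missing observations.
index = pd.date_range("2024-01-01 00:00", periods=5, freq=pd.Timedelta(hours=1))
load = pd.Series([100.0, None, 120.0, None, 140.0], index=index)

filled = linearly_interpolate(load)
features = time_columns(index)

print(filled)
print(features)
```

Calendar features like these can be computed for any future date, which makes them usable as exogenous inputs at prediction time.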
- Clone this repository.
- The raw data is kept here within this repo.
- Data processing/transformation scripts are kept here.
- Exploratory data analysis can be found in a Jupyter notebook here (deprecated).
Here is how to run the command-line interface:
usage: main.py [-h] {download,predict,train,merge} ...
Prediction of energy demand in France
positional arguments:
{download,predict,train,merge}
options:
-h, --help show this help message and exit
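For readers unfamiliar with subcommand-style interfaces, the sketch below shows one way a parser with this usage line could be wired with `argparse`. It is an assumption about the general shape, not the actual `main.py`: the `build_parser` function and the `command` attribute are hypothetical names, and no subcommand options are shown because the help text above does not list any.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the four subcommands from the help text."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Prediction of energy demand in France",
    )
    # `dest` stores the chosen subcommand name on the parsed namespace.
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in ("download", "predict", "train", "merge"):
        subparsers.add_parser(name)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["train"])
print(args.command)
```

In a real entry point, the value of `args.command` would typically be used to dispatch to the matching function.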