The goal of this project is to predict electric power consumption for various countries using a regression-based approach. The dataset consists of columns representing country names and years from 1981 onwards, with each cell containing the energy consumption value for a specific country and year.
The main objectives of this project are:
- Data Preprocessing:
  - Filter out countries with fewer than 6 rows of data.
  - Ensure the first row for each country is not before 2005.
  - Transform the dataset by converting year columns into rows.
  - Remove data points after 2014, as they are not relevant for the prediction task.
- Feature Engineering:
  - Extract relevant features from the preprocessed dataset, such as year and country-specific features.
  - Create a feature matrix by combining the extracted features.
- Regression Modeling:
  - Train and select the best-performing regression algorithm using the feature matrix and target variable.
  - Evaluate the selected model's performance using appropriate metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared ($R^2$).
  - Fine-tune the selected model's hyperparameters to optimize its performance.
- Prediction:
  - Make predictions on the test set or future years using the trained regression model.
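
The preprocessing steps listed above can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`country`, `year`, `consumption`), the function name `preprocess`, and the toy values are all assumptions. It also assumes that "the first row is not before 2005" means dropping observations earlier than 2005, and that the minimum-row filter is applied after the year window is enforced.

```python
import pandas as pd


def preprocess(wide: pd.DataFrame,
               country_col: str = "country",
               first_year: int = 2005,
               last_year: int = 2014,
               min_rows: int = 6) -> pd.DataFrame:
    """Reshape a wide country-by-year table into long form and filter it."""
    # Year columns become rows: one (country, year, consumption) record each.
    long = wide.melt(id_vars=country_col, var_name="year",
                     value_name="consumption")
    long["year"] = long["year"].astype(int)

    # Missing cells carry no information for the regression.
    long = long.dropna(subset=["consumption"])

    # Keep only the assumed window: nothing before 2005, nothing after 2014.
    long = long[long["year"].between(first_year, last_year)]

    # Drop countries that are left with fewer than `min_rows` observations.
    rows_per_country = long.groupby(country_col)["year"].transform("count")
    long = long[rows_per_country >= min_rows]

    return long.sort_values([country_col, "year"]).reset_index(drop=True)


# Toy input with made-up values: country A is complete, country B only
# has data from 2011 onwards and should therefore be filtered out.
wide = pd.DataFrame({
    "country": ["A", "B"],
    **{str(y): [100.0 + (y - 2000), None if y < 2011 else 50.0]
       for y in range(2003, 2017)},
})

tidy = preprocess(wide)
print(tidy)
```

The thresholds are exposed as parameters so that a different interpretation of the filtering rules can be tried without changing the function body.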
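Let the raw dataset be represented as a matrix with one row per country and one column per year.
After data preprocessing, let the transformed dataset be represented as a long-format table with one row per country-year pair.
The feature matrix stacks the feature vectors $x_{c,t}$, one for each country $c$ and year $t$.
The target variable (energy consumption) is denoted as $y_{c,t}$.
The trained regression model is denoted as $f$.
For a given country $c$ and year $t$, the predicted energy consumption is: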
$\hat{y}_{c,t} = f(x_{c,t})$
where $x_{c,t}$ is the feature vector for country $c$ in year $t$.
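
To make the notation concrete, the sketch below builds a small feature matrix, fits a model $f$, and computes MAE, MSE, and $R^2$ on held-out years. It is only an illustration: ordinary least squares stands in for whichever algorithm the model-selection step ends up choosing, the data are made up, and the feature names (`year_offset`, one-hot country indicators) and the 2012 train/test cutoff are assumptions.

```python
import numpy as np
import pandas as pd

# Toy long-format data with made-up values: two countries, linear trends.
years = np.arange(2005, 2015)
df = pd.DataFrame({
    "country": ["A"] * len(years) + ["B"] * len(years),
    "year": np.tile(years, 2),
})
base = np.where(df["country"] == "A", 100.0, 40.0)
df["consumption"] = base + 2.0 * (df["year"] - 2005)

# Feature matrix: a year offset plus one indicator column per country.
features = pd.get_dummies(df["country"], prefix="country", dtype=float)
features.insert(0, "year_offset", (df["year"] - 2005).astype(float))
X = features.to_numpy()
y = df["consumption"].to_numpy()

# Hold out the last two years of each country for evaluation.
train = (df["year"] <= 2012).to_numpy()
test = ~train

# Fit f by ordinary least squares on the training rows.
coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)


def f(x: np.ndarray) -> np.ndarray:
    """Predicted consumption for one or more feature vectors x_{c,t}."""
    return x @ coef


# Evaluate on the held-out rows.
y_hat = f(X[test])
residual = y[test] - y_hat
mae = np.mean(np.abs(residual))
mse = np.mean(residual ** 2)
r2 = 1.0 - np.sum(residual ** 2) / np.sum((y[test] - y[test].mean()) ** 2)

print(f"MAE={mae:.4f}  MSE={mse:.4f}  R2={r2:.4f}")
```

Splitting by year rather than at random keeps the evaluation close to the intended use, which is predicting later years from earlier ones.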
The following tools and libraries will be used in this project:
- Pandas: A powerful data manipulation library for data preprocessing and feature engineering.
- PyCaret: An open-source, low-code machine learning library in Python that automates machine learning workflows, including model training, selection, and evaluation.
- Docker: A containerization platform that provides a consistent and reproducible environment for running the project.
- Streamlit: An open-source app framework for building interactive web applications, which will be used to create a user-friendly interface for the energy consumption prediction project.
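
As a rough idea of how the PyCaret part of the workflow might look, the sketch below wraps model comparison and tuning in one function. It assumes PyCaret 3.x and a long-format training table with a `consumption` target column; the function name `train_best_regressor`, the choice of MAE as the ranking metric, and the `session_id` value are illustrative assumptions rather than project decisions.

```python
import pandas as pd


def train_best_regressor(train: pd.DataFrame, target: str = "consumption"):
    """Compare PyCaret regressors on `train` and return a tuned best model."""
    # Imported lazily so this function can be defined without PyCaret
    # installed; calling it does require the pycaret package.
    from pycaret.regression import compare_models, setup, tune_model

    # Initialise the experiment; session_id fixes the random seed.
    setup(data=train, target=target, session_id=42, verbose=False)

    # Cross-validate the available regressors and keep the one with lowest MAE.
    best = compare_models(sort="MAE")

    # Search hyperparameters of the winning model, again optimising MAE.
    return tune_model(best, optimize="MAE")
```

Predictions for new rows could then be obtained with `pycaret.regression.predict_model(model, data=new_rows)`, where `new_rows` holds the same feature columns as the training table.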