Problem Definition

The goal of this project is to predict electric power consumption for various countries using a regression-based approach. The dataset consists of columns representing country names and years from 1981 onwards, with each cell containing the energy consumption value for a specific country and year.

The main objectives of this project are:

Data Preprocessing:
- Filter out countries with less than 6 rows of data.
- Ensure the first row for each country is not before 2005.
- Transform the dataset by converting year columns into rows.
- Remove data points after 2014 as they are not relevant for the prediction task.
Feature Engineering:
- Extract relevant features from the preprocessed dataset, such as year and country-specific features.
- Create a feature matrix by combining the extracted features.
Regression Modeling:
- Train and select the best-performing regression algorithm using the feature matrix and target variable.
- Evaluate the selected model's performance using appropriate metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared ($R^2$).
- Fine-tune the selected model's hyperparameters to optimize its performance.
Prediction:
- Make predictions on the test set or future years using the trained regression model.

Mathematical Notation

Let the raw dataset be represented as a matrix $D$, where each row corresponds to a country and each column represents a year. The element $d_{ij}$ denotes the energy consumption for country $i$ in year $j$.

After data preprocessing, let the transformed dataset be represented as $X$, where each row corresponds to a year and each column represents a country.

The feature matrix $F$ is created by combining the extracted features: $F = [x_1, x_2, \ldots, x_n]$

The target variable (energy consumption) is denoted as $y$.

The trained regression model is denoted as $f(x)$, which takes the feature vector $x$ as input and predicts the energy consumption $\hat{y}$.

For a given country $c$ and year $t$, the energy consumption prediction can be obtained as:

$\hat{y}{c,t} = f(x{c,t})$

where $x_{c,t}$ represents the feature vector for country $c$ in year $t$.

Tools

The following tools and libraries will be used in this project:

Pandas: A powerful data manipulation library for data preprocessing and feature engineering.
PyCaret: An open-source, low-code machine learning library in Python that automates machine learning workflows, including model training, selection, and evaluation.
Docker: A containerization platform that provides a consistent and reproducible environment for running the project.
Streamlit: An open-source app framework for building interactive web applications, which will be used to create a user-friendly interface for the energy consumption prediction project.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
src		src
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Problem Definition

Mathematical Notation

Tools

About

Releases

Packages

Languages

tataknu/energy-prx

Folders and files

Latest commit

History

Repository files navigation

Problem Definition

Mathematical Notation

Tools

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages