pansalasamarth/Air_Quality_Prediction_using_ML

Air Quality Prediction Using Machine Learning Models

This repository contains the Python code implementation of the research paper: "Air Quality Prediction by Machine Learning Models: A Predictive Study on the Indian Coastal City of Visakhapatnam" by Gokulan Ravindiran et al. (2023). The study uses advanced machine learning techniques to predict the Air Quality Index (AQI) based on historical data.

Overview

Air pollution is a significant global challenge, impacting both human health and the environment. This project leverages machine learning models to predict AQI levels using air pollutant and meteorological data. The implementation includes:

  • Data preprocessing and transformation
  • Exploratory Data Analysis (EDA)
  • Model training and evaluation using:
    • LightGBM
    • Random Forest
    • CatBoost
    • AdaBoost
    • XGBoost
  • Visualization of model performance and feature importance

Features

  • Handles missing and non-numeric values
  • Transforms features to reduce skewness and kurtosis
  • Predicts AQI with a testing R² of about 0.99 for each of the five models (see Results)
  • Generates feature importance and comparison metrics for the models
  • Visualizes AQI trends and pollutant contributions
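
The first two features can be sketched as below. This is a minimal illustration, not the repository's actual code: the function name `clean_and_transform`, the median imputation, the `log1p` transform, and the skewness threshold of 1.0 are all assumptions, since this README does not name the strategies used.

```python
import numpy as np
import pandas as pd


def clean_and_transform(df: pd.DataFrame, skew_threshold: float = 1.0) -> pd.DataFrame:
    """Coerce columns to numeric, fill gaps, and log-transform skewed columns."""
    out = df.copy()
    for col in out.columns:
        # Non-numeric entries (e.g. "None", "NA") become NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
        # Assumption: median imputation; the README does not name a strategy.
        out[col] = out[col].fillna(out[col].median())
        # Assumption: log1p for strongly right-skewed, non-negative columns.
        if out[col].min() >= 0 and out[col].skew() > skew_threshold:
            out[col] = np.log1p(out[col])
    return out


# Made-up values for illustration only.
raw = pd.DataFrame({
    "PM2.5": ["12", "15", "None", "14", "300"],
    "Temperature": [28.0, 29.5, None, 30.1, 27.8],
})
clean = clean_and_transform(raw)
print(clean)
```

Here the `PM2.5` column is right-skewed by the single large reading and gets log-transformed, while `Temperature` only has its missing value filled.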

Dataset

The dataset used in this implementation is based on the Central Pollution Control Board (CPCB) data from July 2017 to September 2022. It includes:

  • 12 Air Pollutants: PM2.5, PM10, NO, NO2, NOx, NH3, SO2, CO, Ozone, Benzene, Toluene, Xylene
  • 10 Meteorological Factors (nine are named here): Temperature, Relative Humidity, Wind Speed, Wind Direction, Solar Radiation, Air Pressure, Ambient Temperature, Rainfall, Total Rainfall

Requirements

The implementation requires the following Python libraries:

  • numpy
  • pandas
  • matplotlib
  • seaborn
  • scikit-learn
  • lightgbm
  • xgboost
  • catboost

Install all dependencies using:

pip install -r requirements.txt 

Code Structure

  • Data Preprocessing: Handles missing and non-numeric values, normalizes skewed data, and prepares features for modeling.
  • EDA: Analyzes correlations between pollutants and AQI and visualizes monthly and annual pollutant variations.
  • Model Training and Evaluation: Implements and compares the performance of five machine learning models.
  • Prediction: Uses trained models to predict AQI and categorize its health impact.
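
The categorization step in the last item could look like the sketch below. It assumes the six bands of the Indian National AQI scale published by CPCB (Good, Satisfactory, Moderate, Poor, Very Poor, Severe); this README does not state which scale the code uses, and the function name `aqi_category` is hypothetical.

```python
import bisect

# Inclusive upper bounds of the first five National AQI bands (CPCB).
# Assumption: the repository uses this scale; the README does not say so.
_UPPER_BOUNDS = [50, 100, 200, 300, 400]
_LABELS = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]


def aqi_category(aqi: float) -> str:
    """Map a predicted AQI value to its category label."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    # bisect_left returns the index of the first bound >= aqi,
    # so a value equal to a bound stays in the lower band.
    return _LABELS[bisect.bisect_left(_UPPER_BOUNDS, aqi)]


for value in (42, 100, 250.3, 450):
    print(value, aqi_category(value))
```

Because model predictions are floats, values that fall between two integer bounds (for example 50.5) are assigned to the higher band in this sketch.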

Results

Model Performance Comparison

Performance of the machine learning models in predicting AQI (training and testing sets)

Model          MAE (train)  MSE (train)  RMSE (train)  R² (train)  MAE (test)  MSE (test)  RMSE (test)  R² (test)
LightGBM       1.373889     15.846370    3.980750      0.995221    1.811602    19.478235   4.413415     0.992536
RandomForest   0.444116     3.171609     1.780901      0.999043    1.279939    20.688324   4.548442     0.992072
CatBoost       1.373889     15.846370    3.980750      0.995221    1.811602    19.478235   4.413415     0.992536
AdaBoost       1.373889     15.846370    3.980750      0.995221    1.811602    19.478235   4.413415     0.992536
XGBoost        0.439370     0.635832     0.797391      0.999808    1.623464    19.362135   4.400243     0.992580

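In the table above, XGBoost has the highest R² on both the training set (0.9998) and the testing set (0.9926); the testing R² of all five models falls between 0.9921 and 0.9926. The LightGBM, CatBoost, and AdaBoost rows are identical, which suggests the CatBoost and AdaBoost results were not recorded separately.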
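
For reference, the four metrics in the table can be computed as in the sketch below. The helper `regression_metrics` and the sample values are illustrative only; scikit-learn's `mean_absolute_error`, `mean_squared_error`, and `r2_score` give the same quantities.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Return MAE, MSE, RMSE, and R² for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = float(np.mean(errors ** 2))
    ss_res = float(np.sum(errors ** 2))                     # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # total sum of squares
    return {
        "MAE": float(np.mean(np.abs(errors))),
        "MSE": mse,
        "RMSE": mse ** 0.5,  # RMSE is the square root of MSE
        "R2": 1.0 - ss_res / ss_tot,
    }


# Made-up AQI values for illustration only.
scores = regression_metrics([100, 150, 200, 250], [110, 140, 205, 245])
print(scores)
```

Since RMSE is the square root of MSE, the two columns in the table can be checked against each other: for example, the XGBoost testing MSE of 19.362135 corresponds to an RMSE of about 4.4002.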

Feature Importance

Key contributors to AQI prediction:

  • PM2.5
  • PM10
  • NO2
  • CO
  • NOx
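
A ranking like the one above can be read from a fitted tree ensemble. The sketch below uses scikit-learn's `RandomForestRegressor` and its `feature_importances_` attribute on synthetic data; the column names, coefficients, and sample size are made up for illustration and do not reproduce the repository's results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data for illustration only (not the CPCB dataset).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "PM2.5": rng.uniform(0, 200, n),
    "NO2": rng.uniform(0, 80, n),
    "Wind Speed": rng.uniform(0, 10, n),
})
# The synthetic target is dominated by PM2.5 by construction.
y = 2.0 * X["PM2.5"] + 0.3 * X["NO2"] + rng.normal(0, 5, n)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances sum to 1 across features.
importance = pd.Series(model.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
print(importance)
```

LightGBM, XGBoost, and CatBoost regressors expose comparable importance scores, though each library computes them differently, so rankings may not match exactly across models.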

Visualization

The repository includes scripts to visualize:

  • Correlation matrices
  • Feature importance
  • AQI trends (monthly and annual)
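
As a rough sketch of the first item, a correlation matrix can be computed with pandas `DataFrame.corr()` and then ranked against AQI. The data below is synthetic and the column selection is an assumption; the repository's own scripts may differ.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only (not the CPCB dataset).
rng = np.random.default_rng(1)
n = 200
pm25 = rng.uniform(10, 150, n)
df = pd.DataFrame({
    "PM2.5": pm25,
    "Rainfall": rng.uniform(0, 20, n),
    "AQI": 1.8 * pm25 + rng.normal(0, 10, n),
})

corr = df.corr()  # Pearson correlation by default
# Correlation of each feature with AQI, strongest first.
aqi_corr = corr["AQI"].drop("AQI").sort_values(ascending=False)
print(aqi_corr)
```

The matrix `corr` can then be passed to `seaborn.heatmap(corr, annot=True)` to render it as an annotated heatmap.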

References

  1. Gokulan Ravindiran, Gasim Hayder, Karthick Kanagarathinam, Avinash Alagumalai, Christian Sonne. "Air Quality Prediction by Machine Learning Models: A Predictive Study on the Indian Coastal City of Visakhapatnam." Chemosphere, 2023. DOI: 10.1016/j.chemosphere.2023.139518 (https://doi.org/10.1016/j.chemosphere.2023.139518)

  2. Central Pollution Control Board (CPCB), India.

Acknowledgments

Special thanks to the authors of the research paper and the organizations involved for providing the foundational dataset and methodologies.
