Machine Learning Internship at FTS

Air Quality Index Prediction

Air pollution is a complex mixture of toxic components with a considerable impact on human health, so forecasting pollutant concentrations is a priority for improving quality of life. Using Python tools and machine learning algorithms, we try to predict air quality.

Introduction

During the project, we were given two datasets:

1. cities\_by\_day → day-wise information, including the amounts of various chemical substances present in different cities and the AQI.
2. cities\_by\_hours → hour-wise information, including the amounts of various chemical substances present in different cities and the AQI.

Dataset information

The dataset contains 9358 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The device was located in the field in a significantly polluted area, at road level, within an Italian city. Data were recorded from March 2004 to February 2005 (one year), representing the longest freely available recordings of responses from air quality chemical sensor devices deployed in the field. Ground truth hourly averaged concentrations for CO, Non-Methane Hydrocarbons, Benzene, Total Nitrogen Oxides (NOx) and Nitrogen Dioxide (NO2) were provided by a co-located certified reference analyzer. Evidence of cross-sensitivities, as well as both concept and sensor drift, is present as described in De Vito et al., Sens. and Act. B, Vol. 129, 2, 2008 (citation required), and eventually affects the sensors' concentration estimation capabilities. Missing values are tagged with the value -200. This dataset can be used exclusively for research purposes; commercial use is fully excluded.
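Because missing readings are stored as -200 rather than left blank, they would distort any average or model fit if treated as real measurements. The following is a minimal sketch of one way to handle this, using only the standard library. The function names and the sample readings are hypothetical and are not taken from the project notebooks.

```python
import math

MISSING_SENTINEL = -200  # value the dataset description uses to tag missing readings


def sentinel_to_nan(values, sentinel=MISSING_SENTINEL):
    """Return a new list with every sentinel reading replaced by NaN."""
    return [math.nan if v == sentinel else float(v) for v in values]


def mean_ignoring_missing(values):
    """Average the readings that are present; NaN if none are present."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan


# Made-up CO readings (mg/m^3) for illustration only, not taken from the dataset.
co_raw = [2.6, 2.0, -200, 2.2, -200, 1.6]
co_clean = sentinel_to_nan(co_raw)
co_mean = mean_ignoring_missing(co_clean)
```

Averaging `co_raw` directly would give a negative concentration, which is physically meaningless; averaging only the readings that are present avoids that.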

Attribute information

- 0 Date (DD/MM/YYYY)
- 1 Time (HH.MM.SS)
- 2 True hourly averaged concentration CO in mg/m^3 (reference analyzer)
- 3 PT08.S1 (tin oxide) hourly averaged sensor response (nominally CO targeted)
- 4 True hourly averaged overall Non-Methane Hydrocarbons (NMHC) concentration in microg/m^3 (reference analyzer)
- 5 True hourly averaged Benzene concentration in microg/m^3 (reference analyzer)
- 6 PT08.S2 (titania) hourly averaged sensor response (nominally NMHC targeted)
- 7 True hourly averaged NOx concentration in ppb (reference analyzer)
- 8 PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally NOx targeted)
- 9 True hourly averaged NO2 concentration in microg/m^3 (reference analyzer)
- 10 PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally NO2 targeted)
- 11 PT08.S5 (indium oxide) hourly averaged sensor response (nominally O3 targeted)
- 12 Temperature in °C
- 13 Relative Humidity (%)
- 14 Absolute Humidity (AH)
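The Date and Time attributes use day-first dates and dot-separated times, which many parsers will not guess correctly. Here is a small sketch of combining them into a single timestamp with an explicit format string. The helper name and the sample values are assumptions for illustration.

```python
from datetime import datetime

# Explicit format matching the attribute list: DD/MM/YYYY and HH.MM.SS
TIMESTAMP_FORMAT = "%d/%m/%Y %H.%M.%S"


def parse_timestamp(date_str, time_str):
    """Combine the Date and Time fields of one row into a datetime."""
    return datetime.strptime(f"{date_str} {time_str}", TIMESTAMP_FORMAT)


# Sample values chosen for illustration; 10/03/2004 is 10 March, not 3 October.
example = parse_timestamp("10/03/2004", "18.00.00")
```

Stating the format explicitly avoids the silent day/month swap that can happen when a parser assumes month-first dates.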

Install

This project requires Python 3.6 and the following libraries installed:

Approach For Analysing Data

We first performed exploratory data analysis, including data preprocessing, outlier treatment and data visualization, to study the datasets.
We then used algorithms such as XGBoost and Stacked LSTM to build models that predict the AQI from the inputs provided.
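The README does not say which outlier treatment the notebooks use. As one common possibility, the sketch below clips values that fall outside 1.5 times the interquartile range (IQR). The function names, the factor `k` and the sample readings are all assumptions made for illustration.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 1]) of a sorted, non-empty list."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def clip_outliers_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    if not values:
        return []
    ordered = sorted(values)
    q1 = percentile(ordered, 0.25)
    q3 = percentile(ordered, 0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Made-up pollutant readings with one obvious spike.
readings = [10, 12, 11, 13, 12, 11, 95]
clipped = clip_outliers_iqr(readings)
```

Clipping keeps the row in the dataset, which matters for time series where dropping rows would leave gaps. Removing the rows instead is an equally valid choice when gaps are acceptable.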

Models trained on

- Linear Regression
- Linear Regression with L1, L2 regularization
- Boosted models
- Stacked LSTM
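Sequence models such as the Stacked LSTM are usually trained on fixed-length windows of past values paired with the next value to predict. The README does not describe how the notebooks frame the data, so the sketch below is only an assumed one-step-ahead setup. The function name, window size and AQI values are hypothetical.

```python
def make_windows(series, window_size):
    """Split a series into (inputs, targets) pairs for one-step-ahead forecasting."""
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])  # past window_size values
        targets.append(series[start + window_size])       # the value that follows
    return inputs, targets


# Made-up daily AQI values for illustration.
aqi = [110, 118, 125, 121, 130, 142]
X, y = make_windows(aqi, window_size=3)
```

Each row of `X` holds three consecutive AQI values and the matching entry of `y` is the value on the following day. The same pairs can also feed the regression and boosted models by treating each window as a feature vector.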
