Nitrogen dioxide is one of the most hazardous pollutants identified by the World Health Organisation. Predicting and reducing pollutant concentrations is an increasingly urgent task, and many approaches, including physical and machine learning models, have been used to predict them. Beyond choosing the right model, choosing appropriate features is also critical. This work focuses on the spatiotemporal prediction of nitrogen dioxide concentration using a Bidirectional Convolutional LSTM (BiConvLSTM), combined with an exploration of nitrogen dioxide and its associated features and the application of feature selection methods. The Root Mean Square Error and the Mean Absolute Error were used to evaluate the proposed approach.
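For readers unfamiliar with the two evaluation metrics, the following is a minimal sketch of how RMSE and MAE can be computed. It is not taken from the notebooks in this repository; the function names and the sample concentration values are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed and predicted NO2 concentrations (illustrative only).
observed = [40.0, 55.0, 30.0, 62.0]
predicted = [42.0, 50.0, 33.0, 60.0]

print("RMSE:", rmse(observed, predicted))
print("MAE:", mae(observed, predicted))
```

RMSE penalises large individual errors more strongly than MAE, so reporting both gives a fuller picture of prediction quality.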
The purpose of each file included in this repository is briefly described below:
- MadridExploration.zip contains the result of an exploratory analysis that identifies the relationship between nitrogen dioxide and additional features (meteorological and traffic data).
- Trained_Files.zip contains the trained model files (.json and .h5) for each subset (the features extracted for each scenario after applying mutual information and mRMR).
- Traffic_Average_Speed_Calculation.ipynb calculates the average traffic speed for the period 1-7 January 2019 in the city of Madrid.
- mRMR.ipynb performs Maximum Relevance - Minimum Redundancy (mRMR) feature selection.
- Mutual_Information.ipynb calculates mutual information.
- BiConvLSTM.ipynb contains the steps for predicting nitrogen dioxide with BiConvLSTM on the selected feature subsets.
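
The two feature selection methods listed above can be illustrated with a small, self-contained sketch. It is not the code from Mutual_Information.ipynb or mRMR.ipynb: it assumes the variables have already been discretised into bins, it uses the difference form of mRMR (relevance minus mean redundancy), and the feature names and values are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum of p(a, b) * log(p(a, b) / (p(a) * p(b))) over observed pairs.
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR: select k feature names by relevance minus mean redundancy."""
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected = []
    remaining = list(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical binned data (0 = low, 1 = high); names are illustrative only.
target = [0, 0, 0, 0, 1, 1, 1, 1]  # binned NO2 level
features = {
    "wind_speed": [0, 0, 0, 1, 1, 1, 1, 1],
    "wind_speed_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # fully redundant duplicate
    "traffic_intensity": [0, 1, 0, 0, 1, 1, 0, 1],
}

# Ranking by mutual information alone keeps both copies of the same feature.
by_mi = sorted(
    features, key=lambda f: mutual_information(features[f], target), reverse=True
)
print("MI ranking:", by_mi)
print("mRMR top 2:", mrmr(features, target, 2))
```

In this toy example, ranking by mutual information alone places the duplicated feature in the top two, whereas mRMR skips the duplicate in favour of a less relevant but non-redundant feature.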
The final version of the preprocessed dataset is available at https://doi.org/10.5281/zenodo.6543073. The code for grid cell generation, data preprocessing and model construction is available at https://bit.ly/3vfwrjJ