6105 Final Project
Using air-quality data from Beijing, we aim to build a model that takes several factors into account and predicts future PM2.5 concentrations.
The project first tried a hidden Markov model (HMM) and linear regression, then moved to random forest and artificial neural network (ANN) models for more accurate predictions.
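As a rough picture of the linear-regression starting point, the sketch below fits a one-step-ahead forecast y[t+1] ≈ a + b·y[t] by ordinary least squares and scores it with RMSE. It is a minimal sketch, not the model in HMM-and-Linear-Model.ipynb: the single lag-1 PM2.5 feature and the helper names `fit_lag1`, `predict_next` and `rmse` are hypothetical.

```python
# Hypothetical lag-1 linear baseline: predict next-hour PM2.5 from the current hour.
# Assumption: the only feature is the previous PM2.5 value; the notebooks may use more.

def fit_lag1(series):
    """Fit y[t+1] = a + b * y[t] by ordinary least squares; returns (a, b)."""
    x = series[:-1]  # inputs: every value except the last
    y = series[1:]   # targets: the value one step later
    n = len(x)
    if n < 2:
        raise ValueError("need at least 3 observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("series is constant; slope is undefined")
    b = sxy / sxx             # closed-form OLS slope
    a = mean_y - b * mean_x   # intercept
    return a, b


def predict_next(a, b, last_value):
    """One-step-ahead forecast from the most recent observation."""
    return a + b * last_value


def rmse(actual, predicted):
    """Root-mean-square error, a common yardstick for comparing the models."""
    return (sum((p - q) ** 2 for p, q in zip(actual, predicted)) / len(actual)) ** 0.5
```

For example, `fit_lag1([10, 20, 30, 40, 50])` recovers a = 10 and b = 1. When comparing against random forest or ANN models, the split between training and test hours should be chronological so the model never sees the future.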
Data source:
https://www.kaggle.com/sid321axn/beijing-multisite-airquality-data-set/data
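The Kaggle data set ships one CSV per monitoring station with `year`, `month`, `day`, `hour`, `PM2.5` and `station` columns, and missing readings written as `NA`. A minimal stdlib sketch for reading one station file into a time-ordered PM2.5 series might look like this; `load_pm25` is a hypothetical helper, not a function from the project's notebooks, which more likely use pandas.

```python
import csv
import io
from datetime import datetime


def load_pm25(fileobj):
    """Read one station CSV; return time-sorted (timestamp, pm25) pairs.

    pm25 is None where the file marks the reading as NA or leaves it empty.
    """
    rows = []
    for row in csv.DictReader(fileobj):
        ts = datetime(int(row["year"]), int(row["month"]),
                      int(row["day"]), int(row["hour"]))
        raw = row["PM2.5"].strip()
        rows.append((ts, None if raw in ("", "NA") else float(raw)))
    rows.sort(key=lambda r: r[0])  # guarantee chronological order
    return rows


# Usage with a real file (the path is a placeholder):
#   with open("station.csv", newline="") as f:
#       series = load_pm25(f)
# Usage with an in-memory sample:
sample = io.StringIO(
    "No,year,month,day,hour,PM2.5,station\n"
    "1,2013,3,1,0,4.0,Shunyi\n"
    "2,2013,3,1,1,NA,Shunyi\n"
)
series = load_pm25(sample)
```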
The final versions of our work are listed below (please read them in order):
- Intro_and_EDA_start.ipynb: Introduction and EDA
- HMM-and-Linear-Model.ipynb: HMM and Linear Model
- RandomForest_final.ipynb: random forest model
- Tensorflow.ipynb: ANN model
Folders
- data: raw data; data/cleanup holds the cleaned-up data
- images: images used in the notebooks
- model: saved trained models
- paper: research references
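The cleaned-up data has to deal with gaps in the hourly readings. This README does not say how the clean-up step fills them, so the sketch below assumes one common choice: linear interpolation between the nearest observed hours, with leading and trailing gaps copied from the nearest observed value. `interpolate_gaps` is a hypothetical helper for illustration only.

```python
def interpolate_gaps(values):
    """Fill None entries in an evenly spaced series by linear interpolation.

    Assumption: gaps at the start or end are filled with the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    out = list(values)
    first, last = known[0], known[-1]
    for i in range(first):                     # leading gap
        out[i] = values[first]
    for i in range(last + 1, len(values)):     # trailing gap
        out[i] = values[last]
    for left, right in zip(known, known[1:]):  # interior gaps
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            out[i] = values[left] + frac * (values[right] - values[left])
    return out
```

Interpolation keeps the series continuous for time-lagged features; long gaps (for example, a station offline for days) may be better dropped than filled.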
Other files:
- simple_clean_up.ipynb: a simple function for cleaning up the data
- Visualization.ipynb: visualization tests
Progress log:
- April 7: data set added
- April 8: EDA for the Shunyi data
- April 9: EDA for Shunyi (cont'd) and clean-up of the other data sets
- April 14: theoretical support (papers and images)
- April 14: Bayesian inference update
- April 17: more linear models
- April 20: added HMM; final version of the HMM and linear model notebook