🔬 Analyze, visualize, and forecast air pollution (PM2.5) trends using real-world data.
- Overview
- Features
- Project Structure
- Prerequisites
- Installation
- Usage
- Visualizations
- Technologies
- Contributing
Air pollution affects millions of people worldwide. This project pulls real-time data from the World Air Quality Index (WAQI) API to analyze, visualize, and predict PM2.5 air quality readings.
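As a rough illustration of the data collection step, the sketch below builds a city-feed URL and pulls PM2.5, temperature, and humidity out of a response. It is not the code in `data_collection.py`: the helper names (`build_feed_url`, `parse_feed`, `fetch_city`) are hypothetical, it uses the standard library's `urllib` rather than Requests to stay dependency-free, and the endpoint and response layout (`data.iaqi.<reading>.v`) are assumptions based on WAQI's public city feed that should be checked against the current API documentation.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed endpoint shape for the WAQI city feed; verify against the API docs.
WAQI_FEED_URL = "https://api.waqi.info/feed/{city}/?token={token}"


def build_feed_url(city: str, token: str) -> str:
    """Return the feed URL for a city, percent-encoding both parts."""
    return WAQI_FEED_URL.format(city=quote(city), token=quote(token))


def parse_feed(payload: dict) -> dict:
    """Extract the readings this project uses from a decoded feed response.

    Assumes each reading sits under data.iaqi.<name>.v. Readings the
    station does not report come back as None.
    """
    if payload.get("status") != "ok":
        raise ValueError(f"WAQI request failed: {payload.get('data')}")
    data = payload["data"]
    iaqi = data.get("iaqi", {})

    def reading(name):
        return iaqi.get(name, {}).get("v")

    return {
        "city": data.get("city", {}).get("name"),
        "time": data.get("time", {}).get("s"),
        "pm25": reading("pm25"),
        "temperature": reading("t"),
        "humidity": reading("h"),
    }


def fetch_city(city: str, token: str, timeout: float = 10.0) -> dict:
    """Download and parse the current feed for one city (needs network access)."""
    with urlopen(build_feed_url(city, token), timeout=timeout) as response:
        return parse_feed(json.load(response))
```

Keeping `parse_feed` separate from the network call means the parsing logic can be checked against a saved response without an API key.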
- 📲 Live Data Collection: Fetches air quality data (PM2.5, temperature, humidity) for any city
- 🧹 Data Cleaning: Handles missing values & normalizes features for robust analysis
- 📊 Exploratory Data Analysis: Intuitive plots to identify trends, outliers & distributions
- 🤖 Machine Learning Model: Linear regression-based PM2.5 prediction
- 📝 Model Evaluation: Key performance metrics and visual inspection
- 💾 Model Saving: Export your trained model for reuse
- 📒 Easy-to-Read Notebooks: All analyses clearly documented in Jupyter Notebooks
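To make the data cleaning feature above more concrete, here is a minimal, dependency-free sketch of one common approach: fill missing readings with the column mean, then min-max scale the features to the range 0 to 1. The project's own `data_cleaning.py` uses pandas and may handle both steps differently; the function names, the column names, and the choice of mean imputation are assumptions made for illustration.

```python
from statistics import mean

# Assumed column names; adjust to match the collected CSV.
FEATURES = ("pm25", "temperature", "humidity")


def fill_missing(rows, columns=FEATURES):
    """Return copies of the rows with None replaced by each column's mean."""
    cleaned = [dict(row) for row in rows]
    for col in columns:
        observed = [r[col] for r in cleaned if r.get(col) is not None]
        if not observed:
            continue  # no observed values to impute from; leave the column as-is
        fill = mean(observed)
        for r in cleaned:
            if r.get(col) is None:
                r[col] = fill
    return cleaned


def min_max_normalize(rows, columns):
    """Return copies of the rows with each column scaled to [0, 1].

    A constant column has no spread, so every value in it maps to 0.0.
    """
    scaled = [dict(row) for row in rows]
    for col in columns:
        values = [r[col] for r in scaled]
        low, high = min(values), max(values)
        span = high - low
        for r in scaled:
            r[col] = (r[col] - low) / span if span else 0.0
    return scaled
```

A typical call would be `min_max_normalize(fill_missing(rows), ("temperature", "humidity"))`, which leaves the PM2.5 target in its original units while scaling the input features.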
📁 air-quality-analysis/
├── data_collection.py # Fetches and stores live air quality data (WAQI API)
├── data_cleaning.py # Cleans and preprocesses collected data
├── train_model.py # Trains the machine learning model
├── eda.ipynb # Jupyter Notebook for Exploratory Data Analysis
├── outputFiles/ # Stores intermediate data (CSV), models (pkl)
└── README.md # Project documentation (this file!)

- Python 3.12+
- A WAQI API key (required by `data_collection.py`)
- Recommended: `pip` package manager
- Clone the repository:

      git clone https://github.com/yourusername/air-quality-analysis.git
      cd air-quality-analysis

- Install dependencies:

      pip install -r requirements.txt

- Configure your WAQI API key in `data_collection.py`:

      API_KEY = "your_api_key_here"

- Ensure an output directory exists (if not, create one):

      mkdir outputFiles

Step 1: Collect live air quality data

    python data_collection.py

Step 2: Clean and preprocess gathered data

    python data_cleaning.py

Step 3: Explore your dataset visually

    jupyter notebook eda.ipynb

Step 4: Train and evaluate the predictive model

    python train_model.py

- Distributions: Histogram of PM2.5 levels
- Trends: PM2.5 over time
- Correlations: Scatter plots (PM2.5 vs temp & humidity)
- Outlier Detection: Boxplots
- Model Performance: Actual vs Predicted PM2.5 plot
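The actual-vs-predicted comparison rests on a fitted model and a few error metrics. The sketch below shows the idea with a single-feature least-squares line written in plain Python, so it runs without extra packages. It is a stand-in for, not a copy of, the scikit-learn `LinearRegression` model trained by `train_model.py`; the function names, the choice of MAE, RMSE, and R² as metrics, and the model file path are all assumptions.

```python
import math
import pickle
from statistics import mean


def fit_line(x, y):
    """Fit y ≈ slope * x + intercept by ordinary least squares (one feature)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


def predict(model, x):
    """Apply a fitted line to a list of feature values."""
    return [model["slope"] * xi + model["intercept"] for xi in x]


def evaluate(actual, predicted):
    """Return MAE, RMSE, and R² for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    avg = mean(actual)
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return {
        "mae": mean([abs(e) for e in errors]),
        "rmse": math.sqrt(ss_res / len(errors)),
        # R² is undefined when the actual values have no variance.
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
    }


def save_model(model, path):
    """Pickle the fitted model so it can be reloaded later."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load a model written by save_model. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

With real data, `x` might be the cleaned temperature or humidity column and `y` the PM2.5 readings; the fitted model could then be written to a file such as `outputFiles/model.pkl` (a hypothetical name) with `save_model`. Plotting `y` against `predict(model, x)` gives the actual-vs-predicted view listed above.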
- Python: Data analysis & ML scripting
- Pandas, NumPy: Data manipulation
- Matplotlib, Seaborn: Visualization
- scikit-learn: Machine learning
- Requests: API data fetching
- Jupyter Notebook: Exploratory data analysis
We ❤️ contributions!
- Fork the repository
- Create your branch (`git checkout -b feature/something`)
- Commit your changes (`git commit -m 'Add new feature'`)
- Push to the branch (`git push origin feature/something`)
- Open a Pull Request