This project provides a comprehensive analysis of global plastic waste production and management for the year 2023, spanning 165 countries. The dataset presents insights into plastic waste handling, production volumes, recycling efficiency, and environmental risk assessment across different nations.
The dataset offers an extensive look into plastic waste management, featuring:
- Country-wise Plastic Waste Production: Plastic waste production volumes (in million metric tons) for each country.
- Main Sources of Plastic Waste: Identification of primary sources contributing to plastic waste in each country.
- National Recycling Rates: Recycling efficiency rates (%) on a national level.
- Per Capita Waste Production: Waste generation per person (kg/person).
- Coastal Waste Risk Assessment: Evaluation of environmental risk for coastal regions impacted by plastic waste.
Note: The data values are approximations derived from historical trends, large language models (LLMs), economic indicators, and waste management patterns up to 2023. Actual figures may vary.
This project leverages machine learning models and geographic visualizations to provide deeper insights into the data:
- RandomForestClassifier: Used to analyze the dataset's categorical and numerical features.
- CatBoostClassifier: Applied for enhanced performance with categorical data and improved model interpretability.
- World Map Visualizations: Geographic plots visually represent country-wise plastic waste production, recycling rates, and coastal waste risk.
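One practical difference between the two classifiers: scikit-learn's RandomForestClassifier only accepts numeric inputs, so a text column such as the main waste source has to be encoded first, whereas CatBoostClassifier can take such a column as-is. The sketch below shows the random-forest side on a tiny made-up frame. The column names Main_Source and Coastal_Waste_Risk, the category spellings, and every value are hypothetical and chosen only for illustration; this is not the project's own training code.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: column names and values are illustrative only.
toy = pd.DataFrame({
    "Main_Source": ["Packaging", "Textiles", "Packaging", "Consumer Goods",
                    "Textiles", "Packaging", "Consumer Goods", "Textiles"],
    "Recycling_Rate": [12.0, 35.5, 8.2, 50.1, 22.3, 5.0, 41.7, 30.0],
    "Per_Capita_Waste_KG": [80.0, 45.2, 95.1, 30.4, 60.0, 110.3, 38.9, 52.6],
    "Coastal_Waste_Risk": ["High", "Low", "Very High", "Low",
                           "Medium", "Very High", "Low", "Medium"],
})

# One-hot encode the text column: the random forest needs numeric features.
X = pd.get_dummies(
    toy[["Main_Source", "Recycling_Rate", "Per_Capita_Waste_KG"]],
    columns=["Main_Source"],
)
y = toy["Coastal_Waste_Risk"]

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
predictions = model.predict(X)
```

With CatBoostClassifier, the raw Main_Source column could instead be passed unencoded by naming it in the cat_features parameter, which is the main reason to reach for CatBoost on categorical-heavy data.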
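To replicate the analysis, ensure you have the following libraries installed (geopandas is pinned below 1.0 because the map code loads the bundled naturalearth_lowres dataset, which geopandas 1.0 removed):

```bash
pip install pandas numpy scikit-learn catboost matplotlib "geopandas<1.0"
```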
- data/: Contains the dataset files.
- notebooks/: Jupyter notebooks with step-by-step analysis and model implementation.
- src/: Source files for data processing and model training.
- visualizations/: Contains world map visualizations and other graphical outputs.
- Data Preprocessing: Handle missing values, normalize the data, and encode categorical variables.
- Exploratory Data Analysis: Generate statistical summaries and visualize country-wise plastic waste, recycling rates, and per capita waste production.
- Machine Learning Models:
  - Apply RandomForestClassifier to predict recycling efficiency.
  - Use CatBoostClassifier for improved accuracy with categorical features.
- World Map Visualizations: Use geopandas to create visual maps representing plastic waste production and coastal waste risk globally.
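The preprocessing step (handle missing values, normalize, encode categoricals) maps naturally onto a scikit-learn ColumnTransformer. The sketch below is one possible setup, not the repository's actual code: the column lists are hypothetical, median and most-frequent imputation are assumed choices, and StandardScaler (zero mean, unit variance) is used as one common reading of "normalize".

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column lists; adapt them to the real dataset schema.
numeric_cols = ["Recycling_Rate", "Per_Capita_Waste_KG"]
categorical_cols = ["Main_Source"]

# Tiny illustrative frame with a gap in every column.
raw = pd.DataFrame({
    "Recycling_Rate": [12.0, np.nan, 8.2, 50.1],
    "Per_Capita_Waste_KG": [80.0, 45.2, np.nan, 30.4],
    "Main_Source": ["Packaging", "Textiles", np.nan, "Packaging"],
})

preprocess = ColumnTransformer(
    [
        # Numeric: fill gaps with the column median, then standardise.
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric_cols),
        # Categorical: fill gaps with the most frequent value, then one-hot encode.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

features = preprocess.fit_transform(raw)
```

Keeping imputation and scaling inside the transformer means the statistics learned on training data are reused unchanged on test data. For CatBoostClassifier the one-hot step can be dropped, since it handles categorical columns itself.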
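```python
import matplotlib.pyplot as plt

# high_risk, low_risk, medium_risk and very_high_risk hold the number of
# countries in each coastal-waste-risk category and are computed beforehand.
plt.figure(figsize=(10, 6))
index_values = [high_risk, low_risk, medium_risk, very_high_risk]
index_labels = ['High Risk', 'Low Risk', 'Medium Risk', 'Very High Risk']
plt.pie(index_values, labels=index_labels, autopct='%2.2f%%')
plt.title('Overall Coastal Risk Distribution', fontsize=20)
plt.show()
```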
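The pie chart expects four count variables (high_risk, low_risk, medium_risk, very_high_risk). One way to derive them is pandas' value_counts. In this sketch the column name Coastal_Waste_Risk, its category spellings, and the sample frame are assumptions; in practice you would replace sample with the loaded dataset.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; column name and labels are assumed.
sample = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F"],
    "Coastal_Waste_Risk": ["High", "Low", "Medium", "High", "Very High", "Low"],
})

risk_counts = sample["Coastal_Waste_Risk"].value_counts()

# .get(label, 0) yields 0 for a category that never occurs instead of raising.
high_risk = int(risk_counts.get("High", 0))
low_risk = int(risk_counts.get("Low", 0))
medium_risk = int(risk_counts.get("Medium", 0))
very_high_risk = int(risk_counts.get("Very High", 0))
```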
```python
import geopandas as gpd
import matplotlib.pyplot as plt

# gpd.datasets was deprecated in geopandas 0.14 and removed in 1.0,
# so this call requires geopandas < 1.0.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# Attach the plastic-waste columns (df is the loaded dataset) to each polygon.
# Countries whose name does not match df['Country'] exactly get NaN and are not coloured.
world = world.merge(df, left_on='name', right_on='Country', how='left')

for column in ['Per_Capita_Waste_KG', 'Recycling_Rate']:
    fig, ax = plt.subplots(1, 1, figsize=(15, 10))
    world.boundary.plot(ax=ax)  # outline every country, matched or not
    world.plot(column=column, ax=ax, legend=True, cmap='viridis',
               legend_kwds={'label': column, 'orientation': 'horizontal'})
    ax.set_title(f'World Distribution of {column}', fontsize=16)
    plt.show()
```
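Because the map merge joins on exact country names, any spelling difference between the map's name column and the dataset's Country column silently leaves a country blank. A quick check, sketched here with pandas' merge(..., indicator=True), lists the names that failed to match on either side. The frames below are hypothetical miniatures (world_names stands in for the map's name column, data for the plastic-waste table), and the "United States" spelling mismatch is only an illustrative example.

```python
import pandas as pd

# Hypothetical miniature inputs; values are placeholders, not real data.
world_names = pd.DataFrame({"name": ["United States of America", "France", "India"]})
data = pd.DataFrame({"Country": ["United States", "France", "India"],
                     "Recycling_Rate": [1.0, 2.0, 3.0]})

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
check = world_names.merge(data, left_on="name", right_on="Country",
                          how="outer", indicator=True)

# In the dataset but not on the map: these rows never get a polygon.
unmatched_data = check.loc[check["_merge"] == "right_only", "Country"].tolist()
# On the map but not in the dataset: these countries render without a colour.
unmatched_map = check.loc[check["_merge"] == "left_only", "name"].tolist()
```

Any names that turn up in unmatched_data can then be renamed in df before running the map code.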
- Clone this repository:

```bash
git clone https://github.com/username/global-plastic-waste-analysis.git
cd global-plastic-waste-analysis
```

- Load and preprocess the dataset: open a Jupyter notebook or Python script and follow the steps in notebooks/data_analysis.ipynb.
- Train and evaluate models: execute the code in notebooks/model_training.ipynb to train the RandomForestClassifier and CatBoostClassifier.
- Country-wise maps and graphs showcasing plastic waste production, recycling rates, and risk assessments.
- Model performance metrics (accuracy, F1-score, etc.) for both classifiers, highlighting feature importance and predictive insights.
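Accuracy, F1-score, and feature importance can all be obtained from scikit-learn. The sketch below uses synthetic data; the features, labels, the 75/25 stratified split, and macro-averaging for F1 are assumptions for illustration rather than the project's evaluation setup.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic, illustrative features and labels (not the project's dataset).
X = pd.DataFrame({
    "Recycling_Rate": [5, 8, 10, 12, 15, 30, 35, 40, 45, 50, 55, 60],
    "Per_Capita_Waste_KG": [110, 100, 95, 90, 85, 50, 45, 40, 38, 35, 30, 28],
})
y = ["High"] * 5 + ["Low"] * 7

# Stratify so both classes appear in the held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# Macro averaging weights every class equally, regardless of its size.
macro_f1 = f1_score(y_test, y_pred, average="macro")

# Impurity-based importances, one per feature, normalised to sum to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

print(f"accuracy={accuracy:.2f}  macro F1={macro_f1:.2f}")
print(importances)
```

Impurity-based importances can favour high-cardinality features; sklearn.inspection.permutation_importance on held-out data is a common cross-check.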
This project is licensed under the MIT License. See the LICENSE file for more details.
Contributions are welcome! Please submit a pull request or open an issue for feedback or suggestions.
For any inquiries, please reach out via LinkedIn.