lllohr/Ocean_Plastic_Pollution

Ocean Plastic Pollution

PET-bottles-marine-pollution

Collaborators:

  • Andrea Dacy
  • Laura Lohr
  • Stephanie Perillo
  • Amy Tisland

Project Overview

Plastic pollution threatens food safety and quality, human health, and coastal tourism, and it contributes to climate change. Plastic pollution in the ocean has a devastating impact on marine life and ecosystems.

The purpose of this project is to analyze data on mismanaged plastic in oceans.

We hope to answer the following questions:

  1. What are the most common types of plastic found in the ocean?
  2. Which countries contribute the most plastic pollution?
  3. Is there a correlation between a country's GDP (Gross Domestic Product) and ocean plastic pollution?

We chose to use PostgreSQL and various machine learning models. We then created a dashboard in Tableau.

Links to Tableau & Google Slides presentation:

Click here for Dashboard

Click here for Presentation

Datasets:

  1. https://www.kaggle.com/code/mihailpavlyuk/world-map-plasticwaste

  2. https://wesr.unep.org/downloader (Plastic on beach tonnes)

  3. https://www.kaggle.com/datasets/maartenvandevelde/marine-litter-watch-19502021

  4. https://ourworldindata.org/grapher/per-capita-plastic-waste-vs-gdp-per-capita


Analysis & Results

Initial Data exploration phase

  1. Dropping columns/excluding data
  2. Eliminating null values
  3. Renaming columns
  4. Assigning new values to country codes and plastic pollution
  5. Creating a diagram to combine tables for PostgreSQL
  • The image below represents the common connection between our datasets, country:

QuickDBD-ocean_plastic_pollution

The Data table actually contains 164 rows, each corresponding to a different type of waste. This ERD shows only a sample of that data.
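The cleaning steps listed above (dropping columns, eliminating null values, renaming columns) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library; the column names, the `RENAME` mapping, the `clean_rows` helper, and the sample rows are all hypothetical and are not taken from the project's notebooks or data sets.

```python
import csv
import io

# Hypothetical raw data; column names and values are illustrative only.
RAW_CSV = """Entity,Code,Year,Mismanaged waste (tonnes),Notes
Atlantis,ATL,2019,1200,n/a
Borduria,BOR,2019,,estimate
Elbonia,ELB,2019,450,n/a
"""

# Map verbose source headers to short names; columns not listed here are dropped.
RENAME = {
    "Entity": "country",
    "Code": "country_code",
    "Mismanaged waste (tonnes)": "waste_tonnes",
}


def clean_rows(csv_text):
    """Drop unused columns, skip rows with missing values, and rename columns."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the mapped columns, under their new names.
        kept = {new: row[old].strip() for old, new in RENAME.items()}
        if any(value == "" for value in kept.values()):
            continue  # eliminate rows with null (empty) values
        kept["waste_tonnes"] = float(kept["waste_tonnes"])
        cleaned.append(kept)
    return cleaned


rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Here the row for the second placeholder country is skipped because its tonnage is empty, and the `Year` and `Notes` columns are dropped because they are not in the mapping.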


Amazon Web Services (AWS) RDS instance & Database

  1. Read in data from S3 buckets for four CSV files
  2. Connected to the AWS RDS instance and wrote the four dataframes into four tables

connect AWS

  1. A PostgreSQL database, "plasticpollutiondb", was created along with ten tables

Tables
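To illustrate how tables that share a country column can be combined once they are in a database, here is a small sketch. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the PostgreSQL instance on AWS RDS, so it runs without credentials. The table names, column names, and rows are hypothetical and do not reflect the actual schema of `plasticpollutiondb`.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database on AWS RDS.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country_gdp (country TEXT PRIMARY KEY, gdp REAL, population INTEGER);
    CREATE TABLE country_waste (country TEXT PRIMARY KEY, waste_tonnes REAL);
    """
)

# Hypothetical rows, for illustration only.
conn.executemany(
    "INSERT INTO country_gdp VALUES (?, ?, ?)",
    [("Atlantis", 5000.0, 1000000), ("Borduria", 12000.0, 2500000), ("Elbonia", 800.0, 400000)],
)
conn.executemany(
    "INSERT INTO country_waste VALUES (?, ?)",
    [("Atlantis", 1200.0), ("Elbonia", 450.0)],
)

# Join the two tables on their shared key, country.
joined = conn.execute(
    """
    SELECT g.country, g.gdp, g.population, w.waste_tonnes
    FROM country_gdp AS g
    JOIN country_waste AS w ON w.country = g.country
    ORDER BY g.country
    """
).fetchall()
conn.close()

for record in joined:
    print(record)
```

An inner join keeps only countries present in both tables, which is one reason mismatched data sets shrink the number of usable rows.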


Machine Learning Model

What machine learning models did we use?

Primarily supervised learning models. We used K Means Clustering (unsupervised learning), linear regression, and logistic regression (both supervised learning). We also used the Balanced Random Forest Classifier and Easy Ensemble Classifier, together with resampling techniques: oversampling, undersampling, SMOTE oversampling, and SMOTEENN.


Cluster Graph

Why did we choose the models we did?

We used Linear Regression because it is one of the simplest and most popular models for examining relationships between variables.

We used Logistic Regression to try to predict whether a country's GDP or population would determine how much plastic waste it had. Alongside logistic regression, we used the Balanced Random Forest Classifier, Easy Ensemble Classifier, oversampling, undersampling, SMOTE oversampling, and SMOTEENN because we wanted to try a variety of methods.

We used various models because we wanted to see which model would give us the best results. We had previously used several supervised and unsupervised models in our class modules. We wanted to find the one that would have the best performance with our particular dataset and questions.
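As a rough illustration of what oversampling does before a classifier is trained, the sketch below duplicates randomly chosen minority-class samples until every class has as many samples as the largest one. It is plain random oversampling written with the standard library only; it is not SMOTE or SMOTEENN, and it is not the imbalanced-learn implementation. The `random_oversample` helper and the sample data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate random minority-class samples until all classes are the same size."""
    rng = random.Random(seed)

    # Group feature rows by their class label.
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(rows) for rows in by_class.values())

    out_x, out_y = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(label)
    return out_x, out_y


# Hypothetical imbalanced data: four "Low" samples and one "High" sample.
features = [[1.0], [2.0], [3.0], [4.0], [10.0]]
labels = ["Low", "Low", "Low", "Low", "High"]

balanced_x, balanced_y = random_oversample(features, labels)
print(Counter(balanced_y))
```

Resampling should be applied to the training split only, so that duplicated samples do not leak into the test set.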

What was our process? How did we do it? What data did we use?

We merged two data sets: one containing population and GDP, and the other containing plastic waste in metric tons. For the logistic regression, we created bins to classify each country's total metric tonnage: Low, Medium, High, and Extreme. Using these categories, we ran the data through the models to try to determine whether there was any correlation.
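The binning step can be sketched as below. The four labels come from the description above, but the bin edges are hypothetical, since the actual cut-offs are not stated here; the `waste_category` helper and the country totals are also made up for illustration.

```python
from bisect import bisect_right

# Hypothetical bin edges in metric tons; the project's actual cut-offs are not stated.
BIN_EDGES = [1_000, 10_000, 100_000]
BIN_LABELS = ["Low", "Medium", "High", "Extreme"]


def waste_category(tonnes):
    """Return the label of the bin that a metric-tonnage total falls into."""
    # bisect_right counts how many edges are <= tonnes, which is the bin index.
    return BIN_LABELS[bisect_right(BIN_EDGES, tonnes)]


# Hypothetical country totals, for illustration only.
totals = {"Atlantis": 500, "Borduria": 50_000, "Elbonia": 250_000}
categories = {country: waste_category(t) for country, t in totals.items()}
print(categories)
```

With these assumed edges, a total exactly on an edge (for example 1,000) falls into the higher bin.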

What did our models find?

Our models were not conclusive.

Although not conclusive, our models did seem to indicate that countries with lower GDP had mismanaged waste that was higher than, or equal to, that of higher-GDP countries.


Waste by GDP


What we found was not what we expected. We expected that a higher GDP, and therefore higher consumption, would mean more plastic waste.


Top Country Contributors

If we had more time, what would we explore next?

If we had more time, we would explore why we did not find what we expected and look into other dynamics that our data did not illuminate. Where is this plastic waste coming from? Is it landfills? Sewers? Which industries produce the most plastic waste? Are countries exporting waste to other countries?

What were the limitations of our data/machine learning models? What challenges did we have with creating/applying machine learning?

Our data had only 492 rows. If we were to dig into this topic more robustly, we would likely want to look at larger data sets. An issue we ran into was that our data sets did not match. For example, for some of the years we had metrics on some variables but the other variables we wanted to explore were for other years. This complicated our process. We had already cleaned our data and prepared it for analysis before we realized that our data set was not as complete as we would have liked.

Conclusions for Machine Learning

China, India, Brazil, Indonesia, Nigeria, Pakistan, Bangladesh, and Egypt were some of the highest contributors of mismanaged waste. We were able to see that through our clustering.


Map by Clusters

Mismanaged waste does not increase proportionally with GDP. There are outliers, but our data did not support a direct correlation.
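One simple way to check a statement like this is to compute the Pearson correlation coefficient between GDP and mismanaged waste: values near 1 or -1 suggest a strong linear relationship, and values near 0 suggest little or none. The sketch below is a standard-library illustration, not the project's code; the `pearson_r` helper and the numbers passed to it are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical values, for illustration only (not project data).
gdp = [1.0, 2.0, 3.0, 4.0]
waste = [1.0, -1.0, -1.0, 1.0]
print(pearson_r(gdp, waste))
```

A coefficient near zero only rules out a linear relationship; a scatter plot such as the one above is still needed to spot outliers or nonlinear patterns.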


Dashboard

An example of some of the features of our dashboard:

Analysis Results

Our analysis revealed many interesting findings. In Europe, cigarette butts and filters were by far the most common type of plastic waste collected on beaches. Plastic and polystyrene pieces of various sizes were also among the most common types, followed by plastic caps/lids, shopping bags, and food packaging. Spain and Romania contributed the most to the number of cigarette butts and filters found on European beaches.

We also found that the countries with the highest amounts of mismanaged plastic waste do not necessarily have the highest GDP. China, which has the highest population and a relatively low GDP, produces the most mismanaged plastic waste. On the other hand, the United States has a high population and GDP, but a disproportionately low amount of mismanaged plastic waste. This may be because the US and some other countries ship their waste to other countries to be processed.


Recommendations and improvements for future analysis:

  • Allowing more time to discover data sets
  • Choosing more robust data sets so that machine learning models are more effective
  • Examining how much waste countries export to other countries
  • Finding data on the types of plastic pollution found in areas outside of Europe
  • Making additional predictions that consider The Ocean Cleanup's efforts to remove ocean garbage and intercept river waste before it enters oceans
