The-Powder-of-Data/detecting_deforestation


Title Graphic

This project explores different methods to identify deforestation through satellite imagery data. The initial groundwork for this repository was done as an individual two-week project whilst studying Data Science at Lighthouse Labs. I am currently continuing to build out this repository.


Content

  1. Overview
  2. Modeling
  3. Cloud
  4. Resources

Overview

Overview Image Welcome to my project on using satellite images to detect deforestation. I have a passion for sustainability and for technologies that enable us to make an impact. Applying a data science lens to earth monitoring is an area I am striving to learn more about, and it is therefore the focus of this project.

Currently we are at an inflection point where we risk having only 10% of the world's rainforests remaining by 2030. There is no Planet B, and collectively we need to take more steps toward making a measurable difference in the declining state of our world's ecological balance.

Through advancements in technology such as satellite imagery, there is more access to open source data and tools than ever before. As an individual you have access to free open source satellite data that can be downloaded and leveraged today. One of my personal aims for this project was to better understand the landscape of modern tools for earth monitoring and to practice creating my own suite of ML solutions.


The Vision

My vision for this project was to combine my Data Science learning with Satellite Imagery to build an Analytics Time Machine. For my initial efforts I focused on the first two components. Project Vision


Planning & Presentation

I am a very visual worker. My life and brain exist on Miro, the best online whiteboard I have found so far. You may be amused to learn that I have successfully workshopped both business strategy and virtual dates (whilst in a long distance relationship) on Miro! I have linked my Miro workspace, final presentation and a few resources below. If you ever want to talk about visual learning and collaboration platforms, please feel free to reach out to me on LinkedIn.

  • Presentation: If you would like to watch my 5 min walkthrough
  • Miro Workspace: This is where I visually capture my brain, excuse the mess
  • Project Gantt: Done in Miro and shared with my cohort to help them manage their delivery timelines

Modeling

Multi Label Classification

I started with the first layer of my solution: learning how to approach a supervised multi-label classification model. These results can then be used as layer 1 in our overall understanding and automation of processing satellite imagery of our earth. MLC Example

Data & Processing

As a starting point I used the Planet dataset. The original 22 GB torrent is no longer being seeded; however, there is an alternative link to a balanced sample that you can download.
For this example dataset you could choose not to do any pre-processing, as the satellite images already come broken into tiles small enough for the model. If ingesting images from a satellite API, you would have to preprocess them into small enough patches/tiles for the model to ingest. To see an example of this, please look at the Semantic Segmentation Model approach.
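As a quick sanity check on the labels, a minimal sketch like the one below can inspect the class balance before training. It assumes the CSV layout of the original Kaggle release (train_v2.csv with an image_name column and a space-delimited tags column); adjust the path to wherever you extracted the sample.

```python
import pandas as pd
from collections import Counter

# Assumed layout of the Kaggle release: an `image_name` column and a
# space-delimited `tags` column (e.g. "clear primary road")
df = pd.read_csv('planet/train_v2.csv')

# Count how often each label appears to get a feel for class balance
label_counts = Counter(tag for tags in df['tags'] for tag in tags.split())
print(label_counts.most_common(10))
```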


Results & Application

FastAI was a great library once I learned the basics. I was able to get strong results using a few techniques, which the code sketch further below ties together:

  • Transfer learning starting with a ResNet50 architecture and ImageNet weights
  • Discriminative learning rates
  • An image-resizing trick to effectively increase the size of our dataset

After a variety of iterations, the best result obtained was an F2 score of 0.93, which places this model in the top 60 of 888 entries in the original competition. MLC Results
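For reference, here is a hedged fastai (v2) sketch of how those techniques fit together. The file layout, batch size, learning rates and prediction threshold are illustrative assumptions, not the exact settings from the notebook.

```python
from fastai.vision.all import *

# Assumed layout: the Planet sample extracted to planet/ with train_v2.csv
# (image_name + space-delimited tags) and JPEG tiles under train-jpg/
path = Path('planet')

def make_dls(size):
    # Multi-label DataLoaders; labels are split on spaces
    return ImageDataLoaders.from_csv(
        path, 'train_v2.csv', folder='train-jpg', suff='.jpg',
        label_delim=' ', item_tfms=Resize(size),
        batch_tfms=aug_transforms(flip_vert=True), bs=64)

# Transfer learning: ResNet50 backbone with ImageNet weights,
# scored with the competition's F2 metric
f2 = FBetaMulti(beta=2, average='samples', thresh=0.2)
learn = vision_learner(make_dls(128), resnet50, metrics=f2)
learn.fine_tune(5, base_lr=1e-2)

# Discriminative learning rates: train the early layers with a much
# smaller learning rate than the newly added head
learn.unfreeze()
learn.fit_one_cycle(5, lr_max=slice(1e-6, 1e-3))

# Image-resizing trick: swap in DataLoaders at a larger resolution and
# keep training the same model, effectively giving it more data to learn from
learn.dls = make_dls(256)
learn.fine_tune(3, base_lr=1e-3)
```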

Semantic Segmentation

The second layer of the envisioned tool was pixel-by-pixel identification on each image. For this, TensorFlow was used as the main library. Using transfer learning and a UNet architecture with additional pooling yielded the best results on the training dataset. I also tried a ResNet architecture with ImageNet weights and found it yielded similar results; however, the exported model size of a ResNet architecture was significantly larger. A large amount of learning around image data processing and how to evaluate segmentation results went into this notebook, and it is where a large portion of my time was spent during this project. Image Processing
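To show the general architecture family (not the exact notebook model, which adds transfer learning and extra pooling), here is a minimal plain U-Net in Keras, assuming 256x256 RGB patches and six output classes:

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return x

def build_unet(input_shape=(256, 256, 3), n_classes=6):
    inputs = layers.Input(input_shape)

    # Encoder: convolve then downsample, keeping skip connections
    c1 = conv_block(inputs, 32); p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 64);     p2 = layers.MaxPooling2D()(c2)
    c3 = conv_block(p2, 128);    p3 = layers.MaxPooling2D()(c3)

    # Bottleneck
    b = conv_block(p3, 256)

    # Decoder: upsample and concatenate the matching encoder features
    u3 = layers.Conv2DTranspose(128, 2, strides=2, padding='same')(b)
    c4 = conv_block(layers.concatenate([u3, c3]), 128)
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding='same')(c4)
    c5 = conv_block(layers.concatenate([u2, c2]), 64)
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding='same')(c5)
    c6 = conv_block(layers.concatenate([u1, c1]), 32)

    # One softmax channel per label class
    outputs = layers.Conv2D(n_classes, 1, activation='softmax')(c6)
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer='adam', loss='categorical_crossentropy')
```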

Data Processing

One of the biggest lessons of this project was image processing: ensuring inputs are the right shape to be ingested by our model. As satellite images are high resolution files, we do not have powerful enough resources to run models over all of their pixels as a single input. To accommodate this, a large image has to be broken down into smaller patches (sometimes referred to as image tiles). Once both images and training masks are divided into patches, the data can then be fed into the model for training. Process Detail

Once the model has been trained and patches predicted, the smaller image tiles need to be stitched back together to recreate the original image. This was done first with manual code and then with the Patchify library to assist in the process. A shoutout to DigitalSreeni for fixing bugs in the library, which has not been updated in the last few years. I also experimented with a smooth blending technique in which tiles overlap, so a Gaussian blend can smooth out edge artifacts. This is the approach I used in the final model, as it picked up more features in images than the alternative. Patched Smooth Blend
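A minimal sketch of the non-overlapping Patchify workflow is below (the smooth-blending variant instead predicts overlapping tiles and blends them). A random single-band array stands in for a real satellite band:

```python
import numpy as np
from patchify import patchify, unpatchify

patch_size = 256

# Stand-in for a real single-band satellite image / mask
image = np.random.randint(0, 255, (1500, 1800), dtype=np.uint8)

# Crop so both dimensions divide evenly by the patch size (step == patch size, no overlap)
h = (image.shape[0] // patch_size) * patch_size
w = (image.shape[1] // patch_size) * patch_size
image = image[:h, :w]

# Break the large image into a grid of 256x256 patches
patches = patchify(image, (patch_size, patch_size), step=patch_size)
print(patches.shape)  # (rows, cols, 256, 256)

# ... run the model over each patch here ...

# Stitch the patches back into a single image of the original (cropped) size
reconstructed = unpatchify(patches, image.shape)
assert (reconstructed == image).all()
```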

Model, Results & Application

The UNet approach used as my final model converged on a reasonable Jaccard coefficient (similar to mean IoU) of 0.73. UNet Results Evan Metric

Originally this model was trained on a dataset of Dubai with 6 labels. After tuning the model to satisfactory levels, I experimented with predicting segmentation masks on BC satellite images. Even though this model was trained in a different context (Dubai) than its intended application (logging in BC), it is a starting step in this continued experiment. To my surprise, the model was able to pick out features such as clear-cut forest quite well. The next steps would be to acquire training mask data for the intended region of use and compare results. As this is a supervised approach, the training segmentation masks may have to be created manually by SMEs. BC Example BC Example
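For clarity, the Jaccard coefficient is essentially intersection-over-union computed on the flattened one-hot masks. A common Keras-style formulation (a sketch, not necessarily the exact function in the notebook) looks like this:

```python
from tensorflow.keras import backend as K

def jaccard_coef(y_true, y_pred, smooth=1.0):
    # |A ∩ B| / |A ∪ B| over the flattened one-hot masks, with a small
    # smoothing term to avoid division by zero on empty masks
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    union = K.sum(y_true_f) + K.sum(y_pred_f) - intersection
    return (intersection + smooth) / (union + smooth)

# Usage: model.compile(optimizer='adam', loss=..., metrics=[jaccard_coef])
```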


Cloud

Google Cloud Platform

In this project I focused on using Google Cloud Platform as much as possible. As I had previous experience with AWS, I wanted to take the opportunity to learn what GCP had to offer. In the spirit of everything Google, I also used Google Colab as my main IDE to make use of GPU access for training.

For files included in this repo I have authenticated with the project's Google Cloud Buckets through my personal email. If you would like to replicate this approach, please set up a GCP account; new accounts receive $300 of free trial credit (valid for 90 days). Alternatively, you could host and reference your files locally.
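If you go the bucket route, pulling project files from Cloud Storage only takes a few lines with the google-cloud-storage client. The bucket and object names below are placeholders for illustration:

```python
from google.cloud import storage

# Placeholder names; substitute your own bucket / object paths
BUCKET_NAME = 'my-deforestation-bucket'
MODEL_BLOB = 'models/unet_segmentation.h5'

# Uses whatever credentials you are authenticated with
# (gcloud auth, a service-account key, or Colab's auth flow)
client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# Download the trained model weights to the local filesystem
bucket.blob(MODEL_BLOB).download_to_filename('unet_segmentation.h5')
```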

Setting this product up in the cloud is still a work in progress, but below are some of the key areas I experimented with and learnt more about.

  • Managing a storage bucket for image and model storage
  • BigQuery tables to pull and push data from
  • A Compute Engine VM instance to test hosting of an API
  • App Engine to host a containerized version of our Streamlit app
  • Vertex AI for model management and maintenance

As I found my data processing slower than ideal when using the base Streamlit cloud service, I decided to set up an API in GCP that I could scale as needed to speed up model predictions. Below is my original concept diagram of how a Streamlit script would flow with a GCP Bucket through an API to deliver quicker results. API Diagram
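In essence, the Streamlit script posts an image tile to a small prediction service and receives the mask back. A hypothetical client-side call is sketched below; the endpoint URL and response format are assumptions, not a deployed service:

```python
import requests

# Hypothetical endpoint for a small prediction service (e.g. Flask/FastAPI
# on Compute Engine or App Engine) that loads the model from the GCS bucket
API_URL = 'https://my-deforestation-api.example.com/predict'

# Post the image tile and save the returned mask (assumed to be PNG bytes)
with open('tile.png', 'rb') as f:
    response = requests.post(API_URL, files={'image': f}, timeout=60)

response.raise_for_status()
with open('predicted_mask.png', 'wb') as out:
    out.write(response.content)
```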

Streamlit Dashboard

As I learned more about Streamlit, I was able to get an MVP ready within the short timeline of this project. I did have some hurdles successfully hosting this solution in the cloud. After my Data Science bootcamp I will be back to revisit this learning architecture. My aim is to have a flywheel set up so the model is hosted, can take in new user data, and learns in cycles.


As a result of my explorations, I created a Streamlit script to host this model as a proof of concept. I used a slider to help users explore the results of their uploaded image. Streamlit Example
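A stripped-down sketch of that interaction is below. The prediction step is stubbed out with a solid mask, since the real script calls the trained segmentation model (or the API above):

```python
import streamlit as st
from PIL import Image

st.title('Deforestation Segmentation Demo')

uploaded = st.file_uploader('Upload a satellite image tile', type=['png', 'jpg', 'jpeg'])
opacity = st.slider('Mask overlay opacity', 0.0, 1.0, 0.5)

if uploaded is not None:
    image = Image.open(uploaded).convert('RGB')

    # Stub: in the real script this is the per-pixel mask predicted by the model
    mask = Image.new('RGB', image.size, (255, 0, 0))

    # The slider lets users fade the predicted mask in and out over their image
    blended = Image.blend(image, mask, opacity)
    st.image(blended, caption='Predicted mask overlaid on the uploaded image')
```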

My next steps are as follows: Next Steps Kanban

Resources

General

Academic Papers / Articles

Learnings

Platforms
