
Classifying marine animals with a CNN

"Because water is denser than air, sound travels very efficiently underwater. Sounds from some species of marine life and human activity can be heard many miles away and, in some cases, across oceans.

Passive acoustic instruments record these sounds in the ocean. There are some hydrophones that generate up to 24 terabytes a year! "e.g. Big Data"

This data provides valuable information that helps government agencies and industries understand and reduce the impacts of noise on ocean life.

By listening to sensitive underwater environments with passive acoustic monitoring tools, we can learn more about migration patterns, animal behavior and communication." quoted from noaa


The goal of this project is to explore marine animal classification. I will implement two machine learning models: a basic neural network and a convolutional neural network (CNN). The marine mammals I'll be classifying are:
  • Killer Whale
  • False Killer Whale
  • Bowhead Whale
  • White-Sided Dolphin
  • Risso's Dolphin
  • Northern Right Whale
  • Humpback Whale
  • Sperm Whale
  • Short-Finned Pilot Whale

About the Data

This project uses labeled raw audio recordings of marine mammal vocalizations.


Audio Libraries used


Preparing data for classification

  • Read and web-scraped the audio files and their labels.

  • All audio files were sliced into 30-second clips; files longer than 30 seconds were split into multiple 30-second clips, which generated more training data.

  • Next, I duplicated all the audio files in each class and augmented the duplicates. Each duplicate was randomly augmented with (see the sketch after this list):

    • a gain shift of ±3 dB,
    • a pitch shift of ±2 semitones,
    • a time stretch,
    • and some added noise.
  • Here, I used Dolby.io for analysis and enhancement across more than 1,000 audio clips.

  • This doubled the amount of data in each class, so that exactly half of the data in each class is an augmented version of an original file.
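Below is a minimal sketch of that augmentation step, assuming librosa and NumPy. The file name is hypothetical, and here each duplicate gets a single randomly chosen augmentation; the original pipeline may have combined several.

```python
import numpy as np
import librosa

def augment_clip(y, sr):
    """Apply one randomly chosen augmentation to a waveform (illustrative sketch)."""
    choice = np.random.choice(["gain", "pitch", "stretch", "noise"])
    if choice == "gain":
        # Random gain shift of +/- 3 dB
        db = np.random.uniform(-3.0, 3.0)
        return y * (10.0 ** (db / 20.0))
    if choice == "pitch":
        # Random pitch shift of +/- 2 semitones
        steps = np.random.uniform(-2.0, 2.0)
        return librosa.effects.pitch_shift(y, sr=sr, n_steps=steps)
    if choice == "stretch":
        # Random time stretch, then pad/trim back to the original length
        rate = np.random.uniform(0.9, 1.1)
        stretched = librosa.effects.time_stretch(y, rate=rate)
        return librosa.util.fix_length(stretched, size=len(y))
    # Low-amplitude additive Gaussian noise
    return y + 0.005 * np.random.randn(len(y))

y, sr = librosa.load("killer_whale_clip_001.wav", sr=None)  # hypothetical file
y_aug = augment_clip(y, sr)
```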

Here are some exploratory visual representations of each class using spectrograms and oscillograms.

  • Compare the waveform and the spectrogram of samples from the dataset, as sketched below.
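A sketch of how such a waveform/spectrogram pair can be plotted with librosa and matplotlib (the file name is hypothetical):

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("humpback_whale_clip.wav", sr=None)  # hypothetical file

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6))

# Oscillogram (waveform: amplitude over time)
librosa.display.waveshow(y, sr=sr, ax=ax1)
ax1.set_title("Waveform")

# Spectrogram (STFT magnitude in dB)
S_db = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
img = librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="hz", ax=ax2)
ax2.set_title("Spectrogram")
fig.colorbar(img, ax=ax2, format="%+2.0f dB")
plt.tight_layout()
plt.show()
```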


Visualize MFCCs and MFCC deltas

MFCCs wiki

  • MFCCs have been a common audio feature choice for speech recognition and speaker identification since the 1970s.

  • Visualize the MFCCs of a Humpback Whale audio sample.


  • Visualize the MFCC deltas of the same Humpback Whale audio sample, as sketched below.
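Both visualizations can be produced with librosa; a sketch, with a hypothetical file name:

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("humpback_whale_sample.wav", sr=None)  # hypothetical file

# 13 MFCCs per frame, plus their first-order (delta) differences
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
mfccs_delta = librosa.feature.delta(mfccs)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6))
librosa.display.specshow(mfccs, sr=sr, x_axis="time", ax=ax1)
ax1.set_title("MFCCs")
librosa.display.specshow(mfccs_delta, sr=sr, x_axis="time", ax=ax2)
ax2.set_title("MFCC deltas")
plt.tight_layout()
plt.show()
```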


Transform the waveform dataset into MFCC "images" with their corresponding labels encoded as integer IDs.

  • Extracted 13 MFCCs from each of 10 segments per 30-second audio file (i.e., one segment every 3 seconds).
  • This segmentation generated more data to train on.
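A sketch of that transform, assuming librosa; the helper name, file name, and label mapping are illustrative, not the project's actual code:

```python
import numpy as np
import librosa

N_MFCC = 13
N_SEGMENTS = 10  # 10 segments per 30-second clip, i.e. one every 3 seconds

def extract_features(path, label_id, clip_seconds=30):
    """Return (mfcc_matrix, label_id) pairs, one per 3-second segment of a clip."""
    y, sr = librosa.load(path, sr=None)
    samples_per_segment = (sr * clip_seconds) // N_SEGMENTS
    examples = []
    for i in range(N_SEGMENTS):
        start = i * samples_per_segment
        segment = y[start:start + samples_per_segment]
        if len(segment) < samples_per_segment:
            break  # skip an incomplete trailing segment
        mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=N_MFCC)
        examples.append((mfcc.T, label_id))  # shape: (frames, 13)
    return examples

# Hypothetical usage with an integer-encoded label
data = extract_features("bowhead_whale_clip_007.wav", label_id=2)
```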


Class Balance:


Built and trained a Convolutional Neural Network:
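The exact architecture isn't spelled out above, so the following Keras sketch is only an assumed layout: a small CNN over 13-MFCC "images" ending in a 9-way softmax. The layer sizes and input shape are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 9
INPUT_SHAPE = (130, 13, 1)  # (frames, MFCCs, channels); assumed, depends on hop length

model = models.Sequential([
    layers.Input(shape=INPUT_SHAPE),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # labels are integer IDs
    metrics=["accuracy"],
)
# history = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=30)
```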




Evaluate test set performance

Run the model on the test set and check its performance.
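With Keras this is a one-liner, assuming the model and a held-out split (X_test, y_test) from the steps above:

```python
# Evaluate the trained CNN on the held-out test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=2)
print(f"Test accuracy: {test_acc:.3f}")
```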


Display a confusion matrix

A confusion matrix is helpful for seeing how well the model did on each of the marine mammal classes in the test set.
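A sketch using scikit-learn and matplotlib, assuming integer test labels and the trained model from above:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

class_names = [
    "Killer Whale", "False Killer Whale", "Bowhead Whale",
    "White-Sided Dolphin", "Risso's Dolphin", "Northern Right Whale",
    "Humpback Whale", "Sperm Whale", "Short-Finned Pilot Whale",
]

# Predicted class = argmax of the softmax output for each test example
y_pred = np.argmax(model.predict(X_test), axis=1)
cm = confusion_matrix(y_test, y_pred)

disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_names)
disp.plot(xticks_rotation=45)
plt.tight_layout()
plt.show()
```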


Run predictions on a new audio source

Finally, verify the model's prediction output using an input audio file from outside the dataset, as sketched below.
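A sketch of that inference step, reusing the hypothetical extract_features helper, class_names list, and trained model from the sketches above (the input file name is hypothetical):

```python
import numpy as np

# Segment the unseen clip and extract MFCCs; the label argument is unused here.
# Note: segment frame counts must match the model's training input shape.
segments = extract_features("unseen_recording.wav", label_id=-1)
X_new = np.array([mfcc for mfcc, _ in segments])[..., np.newaxis]

probs = model.predict(X_new)
for i, p in enumerate(probs):
    print(f"segment {i}: {class_names[int(np.argmax(p))]} ({p.max():.2f})")
```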

CNN

The CNN model clearly recognized two sources in the audio file: a False Killer Whale and a dolphin.


Next steps

  • Gather more data when it becomes available.

  • Train models with (mel) spectrograms and compare the results.

  • Implement a TensorFlow audio data pipeline (tf.data).

  • Add more classes of marine animals to recognize.

  • Introduce human-made sounds into the dataset, e.g., vessel and boat noise.

