As part of the Artificial Intelligence specialization at the ENSC, we participated in a data challenge proposed by the University of Toulon on the ChallengeData website.
This challenge aims to detect the presence of odontocete clicks in underwater audio recordings from the Caribbean Sea. The model is evaluated on the ChallengeData website.
The dataset is composed of 23,168 audio files in WAV format, each 200 ms long. Each file is labeled with a binary variable: 1 if it contains a click, 0 otherwise.
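As an illustration, loading such a dataset in Python could look like the sketch below. The directory layout, file naming, and label CSV columns are assumptions for the example, not the challenge's exact format.

```python
from pathlib import Path

import librosa
import numpy as np
import pandas as pd

# Hypothetical layout: the training WAVs live in .dataset/train/ and the
# labels in a CSV with columns "file" and "label" (1 = click, 0 = no click).
labels = pd.read_csv(".dataset/Y_train.csv")

signals = []
for file_name in labels["file"]:
    # Each recording lasts 200 ms; sr=None keeps the native sampling rate.
    samples, sr = librosa.load(Path(".dataset/train") / file_name, sr=None)
    signals.append(samples)

X = np.array(signals)            # shape: (n_files, n_samples_per_file)
y = labels["label"].to_numpy()   # binary click labels
```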
The objective of the challenge is to build a model that predicts the presence of odontocete clicks in the test set as accurately as possible.
The submissions are evaluated on the ROC AUC (area under the curve) metric.
The results must be submitted as a CSV file with 950 lines, one per file of the test set, each containing the prediction for that file. Predictions must be given as probabilities and must not be rounded to binary labels.
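As a rough illustration, the submission file could be assembled as follows. The column names and file identifiers are placeholders, not necessarily the exact format expected by ChallengeData.

```python
import numpy as np
import pandas as pd

# Hypothetical identifiers for the 950 test files and placeholder scores
# standing in for real model outputs.
test_ids = [f"test_{i}.wav" for i in range(950)]
probas = np.random.rand(950)

# One line per test file; probabilities are written as-is, never rounded
# to binary labels.
submission = pd.DataFrame({"id": test_ids, "prediction": probas})
submission.to_csv("submission.csv", index=False)
```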
We first used classical machine learning models, such as Logistic Regression or Random Forest.
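For reference, a minimal sketch of this kind of baseline with scikit-learn. The feature generation and hyperparameters are placeholders, not our exact pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder data: one feature vector per audio file and a binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 40))      # e.g. aggregated audio features
y = rng.integers(0, 2, size=1000)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=300, random_state=0)):
    model.fit(X_train, y_train)
    # The challenge metric is ROC AUC, computed on predicted probabilities.
    probas = model.predict_proba(X_val)[:, 1]
    print(type(model).__name__, roc_auc_score(y_val, probas))
```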
Then, we used the ReservoirPy library, created and maintained by Inria Bordeaux. It implements reservoir computing, a paradigm belonging to the family of recurrent neural networks (RNNs).
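A minimal sketch of how ReservoirPy can be used for this kind of sequence classification. The hyperparameters, data shapes, and label-repetition trick are illustrative assumptions, not our exact setup.

```python
import numpy as np
from reservoirpy.nodes import Reservoir, Ridge

# Placeholder data: 100 "recordings", each a univariate series of 200 samples.
rng = np.random.default_rng(0)
X = [rng.normal(size=(200, 1)) for _ in range(100)]
labels = rng.integers(0, 2, size=100)
# The readout is trained to repeat the file's label at every timestep.
Y = [np.full((200, 1), float(label)) for label in labels]

reservoir = Reservoir(units=500, lr=0.3, sr=0.9)   # leak rate and spectral radius
readout = Ridge(ridge=1e-6)                        # regularized linear readout
esn = reservoir >> readout

esn.fit(X, Y, warmup=10)

# Score each file by averaging the readout output over its timesteps.
# Note that these scores are not bounded probabilities: they can fall
# outside [0, 1], which is the issue we describe below.
scores = np.array([esn.run(x).mean() for x in X])
```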
We also used a Convolutional Neural Network (CNN) to classify the audio files, using the Librosa library to extract audio features.
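As an illustration, feature extraction with Librosa followed by a small 1D CNN could look like this. The MFCC settings, input shape, and architecture are assumptions for the sketch, not our exact model.

```python
import librosa
import numpy as np
import tensorflow as tf

def extract_mfcc(path, n_mfcc=20):
    """Load one 200 ms recording and return its MFCC matrix (frames x coefficients)."""
    samples, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=samples, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T                           # shape: (n_frames, n_mfcc)

# Placeholder input shape: 18 frames of 20 MFCC coefficients per file.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(18, 20)),
    tf.keras.layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # click probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="roc_auc")])
```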
| Method | ROC AUC |
|---|---|
| Logistic Regression | 0.5981 |
| Decision Tree | 0.6124 |
| Bagged Tree | 0.6351 |
| Random Forest | 0.6460 |
| XGBoost | 0.6301 |
Reservoir computing did not give the results we expected. The issue lay in the format of the outputs: ReservoirPy's scores are not probabilities and could fall above 1 or below 0. We therefore had to clip these extreme values to either 1 or 0, which distorted our results, and we obtained a score of only 0.48. However, this method was implemented very late in the project, so we may have used the library incorrectly.
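Concretely, this post-processing amounted to clipping the raw scores into [0, 1]; a minimal sketch with illustrative values:

```python
import numpy as np

# Raw ReservoirPy readout scores (illustrative values) can fall outside [0, 1].
raw_scores = np.array([-0.3, 0.1, 0.7, 1.4])

# Extreme values are clipped to 0 or 1 before building the submission.
clipped = np.clip(raw_scores, 0.0, 1.0)
print(clipped)   # [0.  0.1 0.7 1. ]
```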
Final Score: 0.9566
We also tried 2D convolutions, since audio files are commonly represented as spectrograms.
However, the results were not convincing given the energy consumption of training such a model. Since our project relates to the study of underwater life, we felt that a heavy, energy-hungry model was inappropriate.
For reference, our best score with this approach was 0.86.
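For illustration, the spectrogram-based approach could look like the sketch below. The mel-spectrogram settings, input shape, and 2D CNN layers are assumptions, not our exact architecture.

```python
import librosa
import numpy as np
import tensorflow as tf

def extract_melspectrogram(path, n_mels=64):
    """Return a log-mel spectrogram (mel bands x frames) for one recording."""
    samples, sr = librosa.load(path, sr=None)
    mel = librosa.feature.melspectrogram(y=samples, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

# Placeholder input: 64 mel bands x 18 frames x 1 channel.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 18, 1)),
    tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="roc_auc")])
```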
First of all, you may clone this repository on your computer:

```bash
git clone https://github.com/tristangclvs/spe_ia_clics_odontocetes.git
```
Then, download the `.dataset` archive here and extract it at the root of the cloned folder.
Creating a virtual environment
You may want to create a virtual environment for Python:

```bash
python -m venv NameOfYourEnv
```

Then activate your environment:

- On Windows:

```bash
NameOfYourEnv/Scripts/activate
```

- On Linux/macOS:

```bash
source NameOfYourEnv/bin/activate
```
To run the code in this repository, you will need to install the necessary dependencies:
```bash
pip install -r requirements.txt
```
The repository is structured as follows: