A comprehensive deep learning library for bioacoustic audio classification
Installation • Quick Start • Models • Documentation • Citation
- Overview
- Key Features
- Implemented Models
- Installation
- Quick Start
- System Requirements
- Dataset
- Model Performance
- API Reference
- Advanced Usage
- Contributing
- Citation
- License
## Overview

This library provides state-of-the-art deep neural network implementations for mosquito audio classification and bioacoustics analysis. Built on TensorFlow/Keras, it offers a collection of modern architectures optimized for acoustic pattern recognition, supporting both research and production environments.
The models leverage advanced signal processing techniques and transformer-based architectures to achieve high accuracy in species classification from audio recordings. All implementations support GPU acceleration and include comprehensive evaluation tools.
Research Publication: View original publication on ScienceDirect
## Key Features

- Multiple Architectures: Nine state-of-the-art models including transformers, CNNs, RNNs, and hybrid approaches
- GPU Acceleration: Full CUDA support for high-performance training and inference
- Audio Processing Pipeline: Built-in spectrogram generation and audio preprocessing (see the sketch after this list)
- Evaluation Framework: Comprehensive metrics, confusion matrices, and visualization tools
- Cross-Validation: Built-in k-fold cross-validation support
- Flexible Configuration: Extensive hyperparameter customization via CLI
- Easy Integration: Simple API for quick prototyping and deployment
- Docker Support: Containerized execution for reproducibility
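The audio processing feature above is easiest to picture with a small example. The sketch below converts a recording to a log-scaled mel spectrogram; it assumes `librosa` is installed, and the function name and signal-processing settings are illustrative rather than the library's actual defaults.

```python
# Minimal preprocessing sketch (assumes librosa; settings are illustrative).
import librosa
import numpy as np

def audio_to_log_mel(path, sample_rate=16000, n_fft=1024, hop_length=256, n_mels=64):
    """Load a recording and convert it to a log-scaled mel spectrogram."""
    signal, sr = librosa.load(path, sr=sample_rate)   # resamples on load
    mel = librosa.feature.melspectrogram(
        y=signal, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    log_mel = librosa.power_to_db(mel, ref=np.max)    # compress dynamic range
    return log_mel[..., np.newaxis]                   # channel axis for CNN inputs
```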
## Implemented Models

The library includes the following architectures, each optimized for acoustic classification tasks:
| Model | Type | Description | Key Features |
|---|---|---|---|
| AST | Transformer | Audio Spectrogram Transformer | Self-attention on spectrogram patches (sketched below) |
| Conformer | Hybrid | Convolution-augmented Transformer | Combines CNN and transformer blocks |
| ConvNetX | CNN | Modern ConvNet architecture | Efficient convolutional design |
| EfficientNet | CNN | Compound scaling CNN | Optimized depth, width, and resolution |
| LSTM | RNN | Long Short-Term Memory | Sequential temporal modeling |
| MLP | Feedforward | Multi-Layer Perceptron | Fully connected baseline |
| MobileNet | CNN | Lightweight mobile architecture | Depthwise separable convolutions |
| ResidualModel | CNN | Residual Neural Network | Skip connections for deep training |
| Wav2Vec2 | Transformer | Self-supervised audio encoder | Contrastive learning on raw waveforms |
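For intuition on the AST row above, the sketch below shows how a spectrogram can be cut into patches and linearly projected into a token sequence for self-attention. It illustrates the mechanism only; the shapes, patch size, and `projection_dim` are arbitrary, and this is not the library's implementation.

```python
# Patch-embedding step behind AST-style models (illustrative only).
import tensorflow as tf

def spectrogram_to_patch_tokens(spectrograms, patch_size=16, projection_dim=256):
    """Split spectrograms into non-overlapping patches and project each one."""
    patches = tf.image.extract_patches(
        images=spectrograms,                    # (batch, freq, time, 1)
        sizes=[1, patch_size, patch_size, 1],
        strides=[1, patch_size, patch_size, 1],
        rates=[1, 1, 1, 1],
        padding="VALID",
    )
    num_patches = patches.shape[1] * patches.shape[2]
    patches = tf.reshape(patches, (-1, num_patches, patch_size * patch_size))
    return tf.keras.layers.Dense(projection_dim)(patches)  # linear projection

tokens = spectrogram_to_patch_tokens(tf.random.normal((2, 64, 128, 1)))
print(tokens.shape)  # (2, 32, 256): 32 patch tokens of dimension 256
```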
## Installation

### Prerequisites

- Python 3.8 or higher
- pip package manager
- (Optional) CUDA 11.0+ for GPU support
### Standard Installation

```bash
# Clone the repository
git clone https://github.com/kayua/Audio-Classification-Library
cd Audio-Classification-Library
# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
# Install the package
pip install .
```

### Virtual Environment

```bash
# Create virtual environment
python3 -m venv ~/Python3venv/mosquito-classification
source ~/Python3venv/mosquito-classification/bin/activate
# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
# Install package
pip install .
```

### Docker

```bash
# Build Docker image
docker build -t mosquito-classification .
# Run container
docker run --gpus all -it mosquito-classification
```
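If you want datasets and results to persist on the host, mounting local directories into the container is a common pattern. The `/app/...` paths below are assumptions about the image layout, so adjust them to match the Dockerfile:

```bash
# Hypothetical run with mounted data and results directories
docker run --gpus all -it \
    -v "$(pwd)/data:/app/data" \
    -v "$(pwd)/Results:/app/Results" \
    mosquito-classification
```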
## Basic Usage

### Quick Start
```python
from Engine.Models.AST import AudioSpectrogramTransformer
from Engine.Models.Conformer import Conformer
from Engine.Models.Wav2Vec2 import AudioWav2Vec2
from Engine.Models.LSTM import AudioLSTM
from Engine.Models.MLP import DenseModel
from Engine.Models.ResidualModel import ResidualModel
# Initialize the main training pipeline (`Main` is the library's pipeline
# entry point; import it from its module in this repository)
main = Main()
main.__start__()
# Define the models you want to train
available_models = [
AudioSpectrogramTransformer,
Conformer,
AudioWav2Vec2,
AudioLSTM,
DenseModel,
ResidualModel
]
# Execute training and evaluation
main.__exec__(available_models, "Results")# Initialize
main = Main()
main.__start__()
# Train only Wav2Vec2
available_models = [AudioWav2Vec2]
# Run training pipeline
main.__exec__(available_models, "Results")After training, the following artifacts will be generated in the Results/ directory:
- Metrics: Comparative performance metrics across models
- Confusion Matrices: Visual representation of classification results
- Loss Curves: Training and validation loss over epochs
- ROC Curves: Receiver Operating Characteristic curves
- Results.pdf: Comprehensive report with all visualizations
### Configuration

The training pipeline is configured through command-line arguments. Customize a run by passing them when launching training:

```bash
# Example with custom arguments
python main.py --epochs 100 --batch_size 32 --learning_rate 0.001
```
### Training from Command Line
```bash
# Train AST model
python train.py \
--model_name ast \
--dataset_directory ./data \
--number_epochs 100 \
--batch_size 32 \
--number_splits 5 \
--output_directory ./results
# Evaluate trained model
python evaluate.py \
--model_path ./results/best_model.h5 \
--test_data ./data/test \
--output_directory ./evaluation
```
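Outside of `evaluate.py`, a saved checkpoint can also be loaded directly with Keras for ad-hoc inference. This is a minimal sketch; models containing custom layers (e.g., transformer blocks) may additionally need a `custom_objects` mapping, and `batch` stands in for preprocessed inputs shaped like the training data.

```python
# Load a trained checkpoint for ad-hoc inference (sketch).
import tensorflow as tf

model = tf.keras.models.load_model("./results/best_model.h5")
model.summary()

# `batch` must be preprocessed audio features with the training input shape:
# predictions = model.predict(batch)
```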
## System Requirements

### Hardware

| Component | Minimum | Recommended |
|---|---|---|
| CPU | Any modern x86_64 | Multi-core (Intel i5/Ryzen 5+) |
| RAM | 4 GB | 8 GB or more |
| Storage | 10 GB free space | SSD with 20 GB free |
| GPU | None (CPU only) | NVIDIA GPU with CUDA 11+ |
### Software

| Component | Version | Notes |
|---|---|---|
| OS | Ubuntu 22.04+ | Linux-based distributions |
| Python | 3.8.10+ | Virtual environment recommended |
| TensorFlow | 2.4.1+ | GPU version for acceleration |
| CUDA | 11.0+ | Optional, for GPU support |
| Docker | 27.2.1+ | Optional, for containerization |
Note: GPU support is optional but significantly improves training speed. CPU-only execution is supported but may be slower for large datasets.
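A quick way to confirm that TensorFlow actually sees a CUDA device:

```python
import tensorflow as tf

# An empty list means TensorFlow will silently fall back to the CPU.
print(tf.config.list_physical_devices('GPU'))
```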
## API Reference

All models share a common set of training parameters:

```bash
# General parameters
--dataset_directory # Path to dataset
--number_epochs # Training epochs (default: 100)
--batch_size # Batch size (default: 32)
--number_splits # K-fold splits (default: 5)
--loss # Loss function (categorical_crossentropy)
--sample_rate # Audio sample rate (Hz)
--number_classes # Number of output classes
--output_directory # Results directory
```

### AST Parameters

```bash
--ast_projection_dimension # Embedding dimension (default: 256)
--ast_number_heads # Attention heads (default: 8)
--ast_number_blocks # Transformer blocks (default: 6)
--ast_patch_size # Spectrogram patch size
--ast_hop_length # STFT hop length
--ast_size_fft # FFT window size
--ast_dropout # Dropout rate
--ast_optimizer_function # Optimizer (adam, sgd, rmsprop)
```

### Conformer Parameters

```bash
--conformer_embedding_dimension # Embedding dimension
--conformer_number_heads # Attention heads
--conformer_number_conformer_blocks # Conformer blocks
--conformer_kernel_size # Convolution kernel size
--conformer_dropout_rate # Dropout rate
--conformer_max_length # Max sequence length
```

### LSTM Parameters

```bash
--lstm_list_lstm_cells # LSTM units per layer [128, 64, 32]
--lstm_dropout_rate # Dropout rate
--lstm_recurrent_activation # Recurrent activation (tanh, sigmoid)
--lstm_optimizer_function # Optimizer
```

### ResidualModel Parameters

```bash
--residual_number_layers # Number of residual blocks
--residual_filters_per_block # Filters per block
--residual_dropout_rate # Dropout rate
--residual_size_pooling # Pooling layer size
--residual_size_convolutional_filters # Conv filter size
```

### Wav2Vec2 Parameters

```bash
--wav_to_vec_number_heads # Attention heads
--wav_to_vec_context_dimension # Context dimension
--wav_to_vec_projection_mlp_dimension # MLP projection dimension
--wav_to_vec_quantization_bits # Quantization bits
--wav_to_vec_list_filters_encoder # Encoder filter sizes
```

For complete API documentation, see the full parameter reference.
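Putting the flags together, a hypothetical invocation that combines the general parameters with Conformer-specific ones might look like this (the values are illustrative, not tuned defaults):

```bash
python train.py \
    --model_name conformer \
    --dataset_directory ./data \
    --number_epochs 100 \
    --batch_size 32 \
    --number_splits 5 \
    --conformer_embedding_dimension 128 \
    --conformer_number_heads 4 \
    --conformer_dropout_rate 0.1 \
    --output_directory ./results
```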
## License

This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 Mosquito Classification Models
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.