CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning

Paper · License: MIT

This repository contains the official PyTorch implementation and pre-trained models for the paper "CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning", presented at the 2025 International Symposium on Circuits and Systems (ISCAS).

Paper Abstract:

This paper presents CleanUMamba, a time-domain neural network architecture designed for real-time causal audio denoising directly applied to raw waveforms. CleanUMamba leverages a U-Net encoder-decoder structure, incorporating the Mamba state-space model in the bottleneck layer. By replacing conventional self-attention and LSTM mechanisms with Mamba, our architecture offers superior denoising performance while maintaining a constant memory footprint, enabling streaming operation. To enhance efficiency, we applied structured channel pruning, achieving an 8X reduction in model size without compromising audio quality. Our model demonstrates strong results in the Interspeech 2020 Deep Noise Suppression challenge. Specifically, CleanUMamba achieves a PESQ score of 2.42 and STOI of 95.1% with only 442K parameters and 468M MACs, matching or outperforming larger models in real-time performance.

This codebase is built upon the CleanUNet repository and was developed as part of a master's thesis project from 2023 to 2024.

Architecture

CleanUMamba uses a U-Net-like encoder-decoder architecture operating on raw waveforms. The bottleneck replaces conventional self-attention or LSTM layers with Mamba blocks for efficient sequence modeling.

CleanUMamba Architecture
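
For intuition, the sketch below shows the overall topology: strided Conv1d encoder stages, a Mamba block on the downsampled sequence, and ConvTranspose1d decoder stages with skip connections. The channel counts, kernel sizes, and additive skip scheme are illustrative assumptions, not the repository's exact configuration, and the causal padding the real model uses for streaming is omitted.

    # A minimal sketch of the CleanUMamba topology, NOT the repo's exact model:
    # layer sizes, activations, and the skip scheme are illustrative assumptions.
    import torch
    import torch.nn as nn
    from mamba_ssm import Mamba  # pip install mamba-ssm (needs CUDA)

    class UNetMambaSketch(nn.Module):
        def __init__(self, channels=(32, 64, 128), d_state=16):
            super().__init__()
            self.down, self.up = nn.ModuleList(), nn.ModuleList()
            c_prev = 1
            for c in channels:  # encoder: halve time, grow channels
                self.down.append(nn.Sequential(
                    nn.Conv1d(c_prev, c, kernel_size=4, stride=2, padding=1),
                    nn.GELU()))
                c_prev = c
            # Bottleneck: Mamba models the downsampled sequence with constant state.
            self.bottleneck = Mamba(d_model=channels[-1], d_state=d_state)
            outs = list(channels[:-1])[::-1] + [1]  # e.g. (64, 32, 1)
            for c_in, c_out in zip(channels[::-1], outs):  # decoder mirrors encoder
                self.up.append(
                    nn.ConvTranspose1d(c_in, c_out, kernel_size=4, stride=2, padding=1))

        def forward(self, x):  # x: (batch, 1, time), time divisible by 2**len(down)
            skips = []
            for enc in self.down:
                x = enc(x)
                skips.append(x)
            # Mamba expects (batch, length, features), so swap channel/time axes.
            x = self.bottleneck(x.transpose(1, 2)).transpose(1, 2)
            for dec in self.up:
                x = dec(x + skips.pop())  # additive U-Net skip connection
            return x

    model = UNetMambaSketch().cuda()
    print(model(torch.randn(1, 1, 16000, device="cuda")).shape)  # (1, 1, 16000)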

Results

Multi-Head Attention (MHA), LSTM, Mamba, and Mamba S4 Comparison

Comparison of models with the same model size and training conditions on the DNS no-reverb test set.

| Model    | Params | PESQ (WB) | PESQ (NB) | STOI (%) |
|----------|--------|-----------|-----------|----------|
| Mamba    | 442K   | 2.42      | 2.95      | 95.1     |
| Mamba S4 | 451K   | 2.36      | 2.90      | 94.9     |
| MHA      | 443K   | 2.37      | 2.92      | 94.9     |
| LSTM     | 443K   | 2.32      | 2.88      | 94.7     |
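
For reference, PESQ and STOI scores like these are commonly computed with the public pesq and pystoi packages. The sketch below shows the call pattern; the file paths are placeholders, and the repository's own evaluation utilities in src/util may differ in detail.

    # Sketch of computing the table's metrics with the public `pesq` and
    # `pystoi` packages (pip install pesq pystoi soundfile).
    import soundfile as sf
    from pesq import pesq
    from pystoi import stoi

    clean, sr = sf.read("clean.wav")       # placeholder: 16 kHz mono reference
    denoised, _ = sf.read("denoised.wav")  # placeholder: model output, same length

    print("PESQ (WB):", pesq(sr, clean, denoised, "wb"))
    print("PESQ (NB):", pesq(sr, clean, denoised, "nb"))
    print("STOI (%): ", 100 * stoi(clean, denoised, sr, extended=False))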

Full training runs are available at Mamba vs MHA vs LSTM vs MambaS4 For Audio Denoising.

Pruning experiments

Selected pruning runs are available at CleanUMamba pruning.

Pruning Performance vs. Model Size

Full-size model and finetuned pruned models

| Model                 | Params | Look-ahead | PESQ (WB) | PESQ (NB) | STOI (%) |
|-----------------------|--------|------------|-----------|-----------|----------|
| CleanUMamba E8 full   | 41.37M | 48 ms      | 3.067     | 3.507     | 97.4     |
| CleanUMamba E8 high   | 41.37M | 48 ms      | 3.017     | 3.471     | 97.2     |
| Pruned CleanUMamba E8 | 14.90M | 48 ms      | 2.910     | 3.397     | 97.0     |
|                       | 6.00M  |            | 2.888     | 3.359     | 96.9     |
|                       | 3.22M  |            | 2.746     | 3.253     | 96.4     |
|                       | 1.94M  |            | 2.707     | 3.222     | 96.3     |
|                       | 0.99M  |            | 2.558     | 3.102     | 95.8     |
|                       | 492K   |            | 2.426     | 2.980     | 95.3     |
|                       | 201K   |            | 2.189     | 2.745     | 94.2     |
| CleanUMamba E6 high   | 27.21M | 12 ms      | 2.935     | 3.400     | 97.1     |
| Pruned CleanUMamba E6 | 13.50M | 12 ms      | 2.855     | 3.346     | 96.9     |
|                       | 7.31M  |            | 2.799     | 3.291     | 96.8     |
|                       | 1.95M  |            | 2.602     | 3.128     | 96.1     |
|                       | 1.00M  |            | 2.431     | 2.967     | 95.5     |
|                       | 457K   |            | 2.237     | 2.796     | 94.8     |
|                       | 207K   |            | 2.096     | 2.660     | 94.0     |

Full runs are available at CleanUMamba training.

File Structure

.
├── configs # Model, training, pruning, and finetuning configurations
│   ├── config.json # General config (paths, batch size, etc.)
│   └── exp
│       ├── models      # Model architecture JSON configs (e.g., CleanUMamba E6/E8)
│       ├── pruning     # Pruning schedule JSON configs
│       └── finetune    # Finetuning schedule JSON configs
├── checkpoints # Pre-trained model checkpoints
│   ├── models      # Checkpoints for fully trained models (before pruning)
│   ├── pruned      # Checkpoints for pruned and finetuned models
│   └── experiments # Checkpoints for smaller scale experiments with different bottleneck layers
├── src # Source code
│   ├── network     # PyTorch network (CleanUMamba)
│   ├── pruning     # Pruning logic (pruning framework, importance calculation)
│   ├── training    # Training, pruning, and finetuning scripts
│   ├── util        # Utilities (datasets, losses, evaluation metrics)
│   └── examples    # Example scripts (checkpoint loading, denoising audio, streaming)
├── LICENSE # Project License
└── README.md # This file

Setup

The following instructions are for Ubuntu 22 and assume that (mini)conda(3) is installed. However, apart from the streaming demo, most requirements should be platform agnostic.

  1. Clone the repository:

    git clone https://github.com/lab-emi/CleanUMamba.git
    cd CleanUMamba
  2. Create the conda environment with Python, PyTorch, CUDA, Mamba, and all the other required packages:

    conda env create -f environment.yml
  3. Activate the environment:

    conda activate CleanUMambaEnv

The environment is now fully set up and ready for training and inference.
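
As an optional sanity check (an illustration, not a script from this repository), you can verify inside the activated environment that PyTorch sees the GPU and that a Mamba block runs:

    # Optional environment sanity check; sizes below are arbitrary toy values.
    import torch
    from mamba_ssm import Mamba

    print("torch", torch.__version__, "| CUDA:", torch.cuda.is_available())
    block = Mamba(d_model=64, d_state=16).cuda()
    x = torch.randn(1, 128, 64, device="cuda")    # (batch, length, d_model)
    print("Mamba output:", tuple(block(x).shape)) # expect (1, 128, 64)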

Dataset Preparation (DNS 2020)

This project uses the Interspeech 2020 Deep Noise Suppression (DNS) challenge dataset.

Follow the instructions derived from the CleanUNet repository to generate the training data:

  1. Download the dataset and pre-processing code from the MS-SNSD GitHub repository.
  2. Assume the dataset is stored under ./dns relative to the MS-SNSD repository root.
  3. Before generating clean-noisy data pairs, modify the following parameters in their noisyspeech_synthesizer.cfg file:
    total_hours: 500
    snr_lower: -5
    snr_upper: 25
    total_snrlevels: 31
  4. Also update the paths in noisyspeech_synthesizer.cfg to use forward slashes (even on Windows, for compatibility with the Python scripts):
    noise_dir: ./datasets/noise
    speech_dir: ./datasets/clean
    noisy_destination: ./training_set/noisy
    clean_destination: ./training_set/clean
    noise_destination: ./training_set/noise
    log_dir: ./logs
    unit_tests_log_dir: ./unittests_logs
  5. For conciseness and to comply with our data loading code, modify the file names (lines 198-201) in their noisyspeech_synthesizer_singleprocess.py to:
    noisyfilename = 'fileid_' + str(file_num) + '.wav'
    cleanfilename = 'fileid_' + str(file_num) + '.wav'
    noisefilename = 'fileid_' + str(file_num) + '.wav'
  6. Generate the training data by running (from within the MS-SNSD directory):
    python noisyspeech_synthesizer_singleprocess.py
  7. It is also recommended to rename files in the test set for conciseness (run from within ./dns/datasets/test_set/synthetic/no_reverb/noisy/):
    cd ./dns/datasets/test_set/synthetic/no_reverb/noisy/
    for NAME in $(ls ./); do arr=(${NAME//fileid_/ }); mv ${NAME} noisy_fileid_${arr[1]}; done
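
After step 6, a quick way to confirm that matching clean-noisy pairs were produced is to compare file names. This is a hypothetical check, not part of the repository; adjust root to wherever MS-SNSD wrote your training_set:

    # Hypothetical pairing check after dataset generation.
    from pathlib import Path

    root = Path("./training_set")  # relative to the MS-SNSD repository root
    clean = {p.name for p in (root / "clean").glob("fileid_*.wav")}
    noisy = {p.name for p in (root / "noisy").glob("fileid_*.wav")}
    print(len(clean), "clean /", len(noisy), "noisy;",
          len(clean & noisy), "matched pairs")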

Training

To train a new model or continue training an existing one, first prepare the dataset you want to train on as in the example above. Then update the trainset_config root directory in configs/config.json, which contains the training configuration. The JSON files nested in configs/exp/ contain the model-specific and pruning configurations.
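
For example, the dataset root can also be set programmatically. The key names below follow the description above; verify them against your configs/config.json:

    # Illustrative: point configs/config.json at your dataset root before
    # training. Key names are taken from the text above; check the actual file.
    import json

    with open("configs/config.json") as f:
        cfg = json.load(f)
    cfg["trainset_config"]["root"] = "/path/to/dns"  # your DNS 2020 data root
    with open("configs/config.json", "w") as f:
        json.dump(cfg, f, indent=2)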

Finally, you can start a training run with train.py providing a model configuration with -e, for instance:

PYTHONPATH=./ python src/training/train.py -e configs/exp/models/DNS-CleanUMamba-3N-E8.json

This will initialize a model, load an existing checkpoint from a previous run if one is available, and start or continue training. Training progress is logged to wandb, and checkpoints are saved periodically as configured in config.json.

Multi-gpu training

For multi-GPU training, update "batch_size_total", "batch_size_per_gpu", and "n_gpus" under "optimization" in config.json. Then optionally set the GPUs you want to use and call src/training/train_distributed.py just like train.py:

export CUDA_VISIBLE_DEVICES=1,2
PYTHONPATH=./ python src/training/train_distributed.py -e configs/exp/models/DNS-CleanUMamba-3N-E8.json
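
These keys presumably satisfy batch_size_total = batch_size_per_gpu × n_gpus. Shown below as a hypothetical excerpt with placeholder values:

    # Hypothetical "optimization" settings from configs/config.json, written as
    # a Python dict for illustration; the values are placeholders.
    optimization = {
        "batch_size_total": 16,   # global batch size across all GPUs
        "batch_size_per_gpu": 8,  # batch size on each device
        "n_gpus": 2,              # GPUs used by train_distributed.py
    }
    # Presumed consistency constraint:
    assert optimization["batch_size_total"] == \
        optimization["batch_size_per_gpu"] * optimization["n_gpus"]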

Pruning

To prune a CleanUMamba model, use pruning.py. The -t flag points to the config of the model to be pruned, and -e specifies the config describing how to prune:

PYTHONPATH=./ python src/training/pruning.py -t configs/exp/models/DNS-CleanUMamba-3N-E8.json -e configs/exp/pruning/DNS-CleanUMamba-Pruning12.json

Note that if you want to prune one of the checkpoints in ./checkpoints/, you have to place the checkpoint in ./exp/[exp_path name in model config]/checkpoint/ and give the checkpoint a number as its name.
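
For instance, a released checkpoint could be staged as follows. The exp directory name and the numeric file name here are assumptions; match them to the exp_path in your model config and your intended training step:

    # Illustrative staging of a released checkpoint for pruning; the folder
    # name and the numeric checkpoint name are assumptions, adapt as needed.
    import shutil
    from pathlib import Path

    dst = Path("exp/DNS-CleanUMamba-3N-E8/checkpoint")  # [exp_path from config]
    dst.mkdir(parents=True, exist_ok=True)
    shutil.copy("checkpoints/models/CleanUMamba-3N-E8.pkl", dst / "500000.pkl")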

Examples

Denoising audio bulk

To load a pretrained model and denoise the audio files in a folder, run:

PYTHONPATH=./ python src/examples/denoise.py -c checkpoints/pruned/CleanUMamba-3N-E8_pruned-5M.pkl -i folder/with/files/to/denoise/ -o output/folder/

Running CleanUMamba in streaming mode

To run real-time denoising:

PYTHONPATH=./ python src/examples/streaming_demo.py

This will open a matplotlib plot showing the spectrum of the raw input audio and of the denoised output in real time. The current plotting is not very efficient, so it can be a bit laggy. The spectrum is for visualization only; the model operates on the raw waveform.
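
Streaming works because the model is causal: a causal system produces the same output whether it sees the waveform all at once or hop by hop with carried-over context. A toy demonstration of this, with a plain causal convolution standing in for the denoiser:

    # Why causal models can stream: chunked processing with carried context
    # matches full-utterance processing exactly. Toy stand-in, not the repo model.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    causal = nn.Conv1d(1, 1, kernel_size=5)   # stand-in for the denoiser
    x = torch.randn(1, 1, 1600)               # 100 ms of audio at 16 kHz

    full = causal(F.pad(x, (4, 0)))           # left-pad only => causal output

    state = torch.zeros(1, 1, 4)              # carried left context
    chunks = []
    for hop in x.split(400, dim=-1):          # process 25 ms hops
        buf = torch.cat([state, hop], dim=-1)
        chunks.append(causal(buf))
        state = buf[:, :, -4:]
    print(torch.allclose(full, torch.cat(chunks, dim=-1), atol=1e-6))  # True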

References

The code structure and training code are adapted from CleanUNet (MIT License) and the DEMUCS (MIT License) denoiser. The MambaS4 implementation reproduces the Mamba S4 block described in the Mamba paper, combining the Mamba (Apache-2.0) block with the relevant functions from S4 (Apache-2.0). The evaluation metrics for CSIG, CBAK, and COVL use the Python implementation from CMGAN (MIT License).

The pruning code took some inspiration from Torch-Pruning (MIT License).

Citation

@inproceedings{groot2025cleanumamba,
  title={CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning},
  author={Sjoerd Groot and Qinyu Chen and Jan C. van Gemert and Chang Gao},
  booktitle={ISCAS 2025},
  year={2025},
  organization={IEEE}
}
