CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning

Paper · License: MIT

This repository contains the official PyTorch implementation and pre-trained models for the paper "CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning", presented at the 2025 International Symposium on Circuits and Systems (ISCAS).

Paper Abstract:

This paper presents CleanUMamba, a time-domain neural network architecture designed for real-time causal audio denoising directly applied to raw waveforms. CleanUMamba leverages a U-Net encoder-decoder structure, incorporating the Mamba state-space model in the bottleneck layer. By replacing conventional self-attention and LSTM mechanisms with Mamba, our architecture offers superior denoising performance while maintaining a constant memory footprint, enabling streaming operation. To enhance efficiency, we applied structured channel pruning, achieving an 8X reduction in model size without compromising audio quality. Our model demonstrates strong results in the Interspeech 2020 Deep Noise Suppression challenge. Specifically, CleanUMamba achieves a PESQ score of 2.42 and STOI of 95.1% with only 442K parameters and 468M MACs, matching or outperforming larger models in real-time performance.

This codebase is built upon the CleanUNet repository and was developed as part of a master's thesis project from 2023 to 2024.

Architecture

CleanUMamba uses a U-Net-like encoder-decoder architecture operating on raw waveforms. The bottleneck replaces conventional self-attention or LSTM layers with Mamba blocks for efficient sequence modeling.

CleanUMamba Architecture
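
For intuition, the sketch below shows the overall topology: strided Conv1d encoder stages, a Mamba block on the downsampled sequence, and ConvTranspose1d decoder stages with skip connections. The channel counts, kernel sizes, and additive skip scheme are illustrative assumptions, not the repository's exact configuration, and the causal padding the real model uses for streaming is omitted.

    # A minimal sketch of the CleanUMamba topology, NOT the repo's exact model:
    # layer sizes, activations, and the skip scheme are illustrative assumptions.
    import torch
    import torch.nn as nn
    from mamba_ssm import Mamba  # pip install mamba-ssm (needs CUDA)

    class UNetMambaSketch(nn.Module):
        def __init__(self, channels=(32, 64, 128), d_state=16):
            super().__init__()
            self.down, self.up = nn.ModuleList(), nn.ModuleList()
            c_prev = 1
            for c in channels:  # encoder: halve time, grow channels
                self.down.append(nn.Sequential(
                    nn.Conv1d(c_prev, c, kernel_size=4, stride=2, padding=1),
                    nn.GELU()))
                c_prev = c
            # Bottleneck: Mamba models the downsampled sequence with constant state.
            self.bottleneck = Mamba(d_model=channels[-1], d_state=d_state)
            outs = list(channels[:-1])[::-1] + [1]  # e.g. (64, 32, 1)
            for c_in, c_out in zip(channels[::-1], outs):  # decoder mirrors encoder
                self.up.append(
                    nn.ConvTranspose1d(c_in, c_out, kernel_size=4, stride=2, padding=1))

        def forward(self, x):  # x: (batch, 1, time), time divisible by 2**len(down)
            skips = []
            for enc in self.down:
                x = enc(x)
                skips.append(x)
            # Mamba expects (batch, length, features), so swap channel/time axes.
            x = self.bottleneck(x.transpose(1, 2)).transpose(1, 2)
            for dec in self.up:
                x = dec(x + skips.pop())  # additive U-Net skip connection
            return x

    model = UNetMambaSketch().cuda()
    print(model(torch.randn(1, 1, 16000, device="cuda")).shape)  # (1, 1, 16000)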

Results

Multi-Head Attention (MHA), LSTM, Mamba, and Mamba S4 Comparison

Comparison of models with the same model size and training conditions on the DNS no-reverb test set.

| Model    | Params | PESQ (WB) | PESQ (NB) | STOI (%) |
|----------|--------|-----------|-----------|----------|
| Mamba    | 442K   | 2.42      | 2.95      | 95.1     |
| Mamba S4 | 451K   | 2.36      | 2.90      | 94.9     |
| MHA      | 443K   | 2.37      | 2.92      | 94.9     |
| LSTM     | 443K   | 2.32      | 2.88      | 94.7     |
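
For reference, PESQ and STOI scores like these are commonly computed with the public pesq and pystoi packages. The sketch below shows the call pattern; the file paths are placeholders, and the repository's own evaluation utilities in src/util may differ in detail.

    # Sketch of computing the table's metrics with the public `pesq` and
    # `pystoi` packages (pip install pesq pystoi soundfile).
    import soundfile as sf
    from pesq import pesq
    from pystoi import stoi

    clean, sr = sf.read("clean.wav")       # placeholder: 16 kHz mono reference
    denoised, _ = sf.read("denoised.wav")  # placeholder: model output, same length

    print("PESQ (WB):", pesq(sr, clean, denoised, "wb"))
    print("PESQ (NB):", pesq(sr, clean, denoised, "nb"))
    print("STOI (%): ", 100 * stoi(clean, denoised, sr, extended=False))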

Full training runs are available at Mamba vs MHA vs LSTM vs MambaS4 For Audio Denoising.

Pruning experiments

Selected pruning runs are available at CleanUMamba pruning.

Pruning Performance vs. Model Size

Full-size model and finetuned pruned models

| Model                 | Params | Look-ahead | PESQ (WB) | PESQ (NB) | STOI (%) |
|-----------------------|--------|------------|-----------|-----------|----------|
| CleanUMamba E8 full   | 41.37M | 48 ms      | 3.067     | 3.507     | 97.4     |
| CleanUMamba E8 high   | 41.37M | 48 ms      | 3.017     | 3.471     | 97.2     |
| Pruned CleanUMamba E8 | 14.90M | 48 ms      | 2.910     | 3.397     | 97.0     |
|                       | 6.00M  |            | 2.888     | 3.359     | 96.9     |
|                       | 3.22M  |            | 2.746     | 3.253     | 96.4     |
|                       | 1.94M  |            | 2.707     | 3.222     | 96.3     |
|                       | 0.99M  |            | 2.558     | 3.102     | 95.8     |
|                       | 492K   |            | 2.426     | 2.980     | 95.3     |
|                       | 201K   |            | 2.189     | 2.745     | 94.2     |
| CleanUMamba E6 high   | 27.21M | 12 ms      | 2.935     | 3.400     | 97.1     |
| Pruned CleanUMamba E6 | 13.50M | 12 ms      | 2.855     | 3.346     | 96.9     |
|                       | 7.31M  |            | 2.799     | 3.291     | 96.8     |
|                       | 1.95M  |            | 2.602     | 3.128     | 96.1     |
|                       | 1.00M  |            | 2.431     | 2.967     | 95.5     |
|                       | 457K   |            | 2.237     | 2.796     | 94.8     |
|                       | 207K   |            | 2.096     | 2.660     | 94.0     |

Full runs are available at CleanUMamba training.

File Structure

.
├── configs # Model, training, pruning, and finetuning configurations
│   ├── config.json # General config (paths, batch size, etc.)
│   └── exp
│       ├── models      # Model architecture JSON configs (e.g., CleanUMamba E6/E8)
│       ├── pruning     # Pruning schedule JSON configs
│       └── finetune    # Finetuning schedule JSON configs
├── checkpoints # Pre-trained model checkpoints
│   ├── models      # Checkpoints for fully trained models (before pruning)
│   ├── pruned      # Checkpoints for pruned and finetuned models
│   └── experiments # Checkpoints for smaller scale experiments with different bottleneck layers
├── src # Source code
│   ├── network     # PyTorch network (CleanUMamba)
│   ├── pruning     # Pruning logic (pruning framework, importance calculation)
│   ├── training    # Training, pruning, and finetuning scripts
│   ├── util        # Utilities (datasets, losses, evaluation metrics)
│   └── examples    # Example scripts (checkpoint loading, denoising audio, streaming)
├── LICENSE # Project License
└── README.md # This file

Setup

The following instructions are for Ubuntu 22 and assume that (mini)conda(3) is installed. However, apart from the streaming demo, most requirements should be platform agnostic.

  1. Clone the repository:

    git clone https://github.com/lab-emi/CleanUMamba.git
    cd CleanUMamba
  2. Create the conda environment with Python, PyTorch, CUDA, Mamba, and all the other required packages:

    conda env create -f environment.yml
  3. Activate the environment:

    conda activate CleanUMambaEnv

The environment is now fully set up and ready for training and inference.
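
As an optional sanity check (an illustration, not a script from this repository), you can verify inside the activated environment that PyTorch sees the GPU and that a Mamba block runs:

    # Optional environment sanity check; sizes below are arbitrary toy values.
    import torch
    from mamba_ssm import Mamba

    print("torch", torch.__version__, "| CUDA:", torch.cuda.is_available())
    block = Mamba(d_model=64, d_state=16).cuda()
    x = torch.randn(1, 128, 64, device="cuda")    # (batch, length, d_model)
    print("Mamba output:", tuple(block(x).shape)) # expect (1, 128, 64)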

Dataset Preparation (DNS 2020)

This project uses the Interspeech 2020 Deep Noise Suppression (DNS) challenge dataset.

Follow the instructions derived from the CleanUNet repository to generate the training data:

  1. Download the dataset and pre-processing code from the MS-SNSD GitHub repository.
  2. Assume the dataset is stored under ./dns relative to the MS-SNSD repository root.
  3. Before generating clean-noisy data pairs, modify the following parameters in their noisyspeech_synthesizer.cfg file:
    total_hours: 500
    snr_lower: -5
    snr_upper: 25
    total_snrlevels: 31
  4. Also update the paths in noisyspeech_synthesizer.cfg to use forward slashes (even on Windows, for compatibility with the Python scripts):
    noise_dir: ./datasets/noise
    speech_dir: ./datasets/clean
    noisy_destination: ./training_set/noisy
    clean_destination: ./training_set/clean
    noise_destination: ./training_set/noise
    log_dir: ./logs
    unit_tests_log_dir: ./unittests_logs
  5. For conciseness and to comply with our data loading code, modify the file names (lines 198-201) in their noisyspeech_synthesizer_singleprocess.py to:
    noisyfilename = 'fileid_' + str(file_num) + '.wav'
    cleanfilename = 'fileid_' + str(file_num) + '.wav'
    noisefilename = 'fileid_' + str(file_num) + '.wav'
  6. Generate the training data by running (from within the MS-SNSD directory):
    python noisyspeech_synthesizer_singleprocess.py
  7. It is also recommended to rename files in the test set for conciseness (run from within ./dns/datasets/test_set/synthetic/no_reverb/noisy/):
    cd ./dns/datasets/test_set/synthetic/no_reverb/noisy/
    for NAME in $(ls ./); do arr=(${NAME//fileid_/ }); mv ${NAME} noisy_fileid_${arr[1]}; done
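
After step 6, a quick way to confirm that matching clean-noisy pairs were produced is to compare file names. This is a hypothetical check, not part of the repository; adjust root to wherever MS-SNSD wrote your training_set:

    # Hypothetical pairing check after dataset generation.
    from pathlib import Path

    root = Path("./training_set")  # relative to the MS-SNSD repository root
    clean = {p.name for p in (root / "clean").glob("fileid_*.wav")}
    noisy = {p.name for p in (root / "noisy").glob("fileid_*.wav")}
    print(len(clean), "clean /", len(noisy), "noisy;",
          len(clean & noisy), "matched pairs")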

Training

To train a new model or continue training an existing one, first prepare the dataset you want to train on as in the example above. Then update the trainset_config root directory in configs/config.json, which contains the training configuration. The JSON files nested in configs/exp/ contain the model-specific and pruning configurations.
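
For example, the dataset root can also be set programmatically. The key names below follow the description above; verify them against your configs/config.json:

    # Illustrative: point configs/config.json at your dataset root before
    # training. Key names are taken from the text above; check the actual file.
    import json

    with open("configs/config.json") as f:
        cfg = json.load(f)
    cfg["trainset_config"]["root"] = "/path/to/dns"  # your DNS 2020 data root
    with open("configs/config.json", "w") as f:
        json.dump(cfg, f, indent=2)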

Finally, you can start a training run with train.py providing a model configuration with -e, for instance:

PYTHONPATH=./ python src/training/train.py -e configs/exp/models/DNS-CleanUMamba-3N-E8.json

This will initialize a model, load an existing checkpoint from a previous run if one is available, and start or continue training. Training progress is logged to wandb, and checkpoints are saved periodically as configured in config.json.

Multi-gpu training

For multi-GPU training, update "batch_size_total", "batch_size_per_gpu", and "n_gpus" under "optimization" in config.json. Then optionally set the GPUs you want to use and call src/training/train_distributed.py just like train.py:

export CUDA_VISIBLE_DEVICES=1,2
PYTHONPATH=./ python src/training/train_distributed.py -e configs/exp/models/DNS-CleanUMamba-3N-E8.json
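
These keys presumably satisfy batch_size_total = batch_size_per_gpu × n_gpus. Shown below as a hypothetical excerpt with placeholder values:

    # Hypothetical "optimization" settings from configs/config.json, written as
    # a Python dict for illustration; the values are placeholders.
    optimization = {
        "batch_size_total": 16,   # global batch size across all GPUs
        "batch_size_per_gpu": 8,  # batch size on each device
        "n_gpus": 2,              # GPUs used by train_distributed.py
    }
    # Presumed consistency constraint:
    assert optimization["batch_size_total"] == \
        optimization["batch_size_per_gpu"] * optimization["n_gpus"]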

Pruning

To prune a CleanUMamba model, use pruning.py. The -t flag points to the config of the model to be pruned, and -e specifies the config describing how to prune:

PYTHONPATH=./ python src/training/pruning.py -t configs/exp/models/DNS-CleanUMamba-3N-E8.json -e configs/exp/pruning/DNS-CleanUMamba-Pruning12.json

Note that if you want to prune one of the checkpoints in ./checkpoints/, you have to place the checkpoint in ./exp/[exp_path name in model config]/checkpoint/ and give the checkpoint a number as its name.
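
For instance, a released checkpoint could be staged as follows. The exp directory name and the numeric file name here are assumptions; match them to the exp_path in your model config and your intended training step:

    # Illustrative staging of a released checkpoint for pruning; the folder
    # name and the numeric checkpoint name are assumptions, adapt as needed.
    import shutil
    from pathlib import Path

    dst = Path("exp/DNS-CleanUMamba-3N-E8/checkpoint")  # [exp_path from config]
    dst.mkdir(parents=True, exist_ok=True)
    shutil.copy("checkpoints/models/CleanUMamba-3N-E8.pkl", dst / "500000.pkl")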

Examples

Denoising audio bulk

To load a pretrained model and denoise the audio files in a folder, run:

PYTHONPATH=./ python src/examples/denoise.py -c checkpoints/pruned/CleanUMamba-3N-E8_pruned-5M.pkl -i folder/with/files/to/denoise/ -o output/folder/

Running CleanUMamba in streaming mode

To run real-time denoising:

PYTHONPATH=./ python src/examples/streaming_demo.py

This will open a matplotlib plot showing the spectrum of the raw input audio and of the denoised output in real time. The current plotting is not very efficient, so it can be a bit laggy. The spectrum is for visualization only; the model operates on the raw waveform.
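
Streaming works because the model is causal: a causal system produces the same output whether it sees the waveform all at once or hop by hop with carried-over context. A toy demonstration of this, with a plain causal convolution standing in for the denoiser:

    # Why causal models can stream: chunked processing with carried context
    # matches full-utterance processing exactly. Toy stand-in, not the repo model.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    causal = nn.Conv1d(1, 1, kernel_size=5)   # stand-in for the denoiser
    x = torch.randn(1, 1, 1600)               # 100 ms of audio at 16 kHz

    full = causal(F.pad(x, (4, 0)))           # left-pad only => causal output

    state = torch.zeros(1, 1, 4)              # carried left context
    chunks = []
    for hop in x.split(400, dim=-1):          # process 25 ms hops
        buf = torch.cat([state, hop], dim=-1)
        chunks.append(causal(buf))
        state = buf[:, :, -4:]
    print(torch.allclose(full, torch.cat(chunks, dim=-1), atol=1e-6))  # True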

References

The code structure and training code are adapted from CleanUNet (MIT License) and the DEMUCS (MIT License) denoiser. The MambaS4 implementation reproduces the Mamba S4 block described in the Mamba paper, combining the Mamba (Apache-2.0) block with the relevant functions from S4 (Apache-2.0). The evaluation metrics for CSIG, CBAK, and COVL use the Python implementation from CMGAN (MIT License).

The pruning code took some inspiration from Torch-Pruning (MIT License).

Citation

@inproceedings{groot2025cleanumamba,
  title={CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning},
  author={Sjoerd Groot and Qinyu Chen and Jan C. van Gemert and Chang Gao},
  booktitle={ISCAS 2025},
  year={2025},
  organization={IEEE}
}
