This repository contains the official PyTorch implementation and pre-trained models for the paper "CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning", presented at the 2025 International Symposium on Circuits and Systems (ISCAS).
Paper Abstract:
This paper presents CleanUMamba, a time-domain neural network architecture designed for real-time causal audio denoising directly applied to raw waveforms. CleanUMamba leverages a U-Net encoder-decoder structure, incorporating the Mamba state-space model in the bottleneck layer. By replacing conventional self-attention and LSTM mechanisms with Mamba, our architecture offers superior denoising performance while maintaining a constant memory footprint, enabling streaming operation. To enhance efficiency, we applied structured channel pruning, achieving an 8X reduction in model size without compromising audio quality. Our model demonstrates strong results in the Interspeech 2020 Deep Noise Suppression challenge. Specifically, CleanUMamba achieves a PESQ score of 2.42 and STOI of 95.1% with only 442K parameters and 468M MACs, matching or outperforming larger models in real-time performance.
This codebase builds upon the CleanUNet repository and was developed as part of a master's thesis project from 2023 to 2024.
CleanUMamba uses a U-Net-like encoder-decoder architecture operating on raw waveforms. The bottleneck replaces conventional layers (self-attention, LSTM) with Mamba blocks for efficient sequence modeling.
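As a rough illustration of this topology, here is a minimal sketch (not the implementation in `src/network`; the layer counts, channel widths, and skip-connection details are placeholder assumptions):

```python
# Illustrative sketch of a U-Net encoder/decoder with a Mamba bottleneck.
# NOT the model in src/network; depths and widths are placeholders.
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # the CUDA build of mamba_ssm is required to run it


class TinyUMamba(nn.Module):
    def __init__(self, channels=(32, 64), d_state=16):
        super().__init__()
        enc, c_in = [], 1
        for c in channels:  # encoder: strided convs downsample the time axis
            enc.append(nn.Sequential(nn.Conv1d(c_in, c, 4, stride=2, padding=1), nn.ReLU()))
            c_in = c
        self.encoder = nn.ModuleList(enc)
        self.bottleneck = Mamba(d_model=c_in, d_state=d_state)  # causal state-space block
        dec, c_prev = [], c_in
        for c in list(channels[::-1][1:]) + [1]:  # decoder mirrors the encoder
            dec.append(nn.ConvTranspose1d(c_prev, c, 4, stride=2, padding=1))
            c_prev = c
        self.decoder = nn.ModuleList(dec)

    def forward(self, x):  # x: (batch, 1, time), time divisible by 2**len(channels)
        skips = []
        for layer in self.encoder:
            x = layer(x)
            skips.append(x)
        x = self.bottleneck(x.transpose(1, 2)).transpose(1, 2)  # Mamba expects (B, L, D)
        for layer in self.decoder:
            x = layer(x + skips.pop())  # U-Net skip connections
        return x


if torch.cuda.is_available():  # mamba_ssm kernels need a GPU
    model = TinyUMamba().cuda()
    print(model(torch.randn(1, 1, 1024, device="cuda")).shape)  # -> (1, 1, 1024)
```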
Comparison of bottleneck variants of equal model size, trained under identical conditions, on the DNS no-reverb test set.
| Model | Params | PESQ (WB) | PESQ (NB) | STOI (%) |
|---|---|---|---|---|
| Mamba | 442K | 2.42 | 2.95 | 95.1 |
| Mamba S4 | 451K | 2.36 | 2.90 | 94.9 |
| MHA | 443K | 2.37 | 2.92 | 94.9 |
| LSTM | 443K | 2.32 | 2.88 | 94.7 |
Full training runs are available at Mamba vs MHA vs LSTM vs MambaS4 For Audio Denoising.
Selected pruning runs are available at CleanUMamba pruning.
| Model | Params | Look ahead | PESQ (WB) | PESQ (NB) | STOI (%) |
|---|---|---|---|---|---|
| CleanUMamba E8 full | 41.37M | 48ms | 3.067 | 3.507 | 97.4 |
| CleanUMamba E8 high | 41.37M | 48ms | 3.017 | 3.471 | 97.2 |
| Pruned CleanUMamba E8 | 14.90M | 48ms | 2.910 | 3.397 | 97.0 |
| | 6.00M | 48ms | 2.888 | 3.359 | 96.9 |
| | 3.22M | 48ms | 2.746 | 3.253 | 96.4 |
| | 1.94M | 48ms | 2.707 | 3.222 | 96.3 |
| | 0.99M | 48ms | 2.558 | 3.102 | 95.8 |
| | 492K | 48ms | 2.426 | 2.980 | 95.3 |
| | 201K | 48ms | 2.189 | 2.745 | 94.2 |
| CleanUMamba E6 high | 27.21M | 12ms | 2.935 | 3.400 | 97.1 |
| Pruned CleanUMamba E6 | 13.50M | 12ms | 2.855 | 3.346 | 96.9 |
| | 7.31M | 12ms | 2.799 | 3.291 | 96.8 |
| | 1.95M | 12ms | 2.602 | 3.128 | 96.1 |
| | 1.00M | 12ms | 2.431 | 2.967 | 95.5 |
| | 457K | 12ms | 2.237 | 2.796 | 94.8 |
| | 207K | 12ms | 2.096 | 2.660 | 94.0 |
Full runs are available at CleanUMamba training.
```
.
├── configs                  # Model, training, pruning, and finetuning configurations
│   ├── config.json          # General config (paths, batch size, etc.)
│   └── exp
│       ├── models           # Model architecture JSON configs (e.g., CleanUMamba E6/E8)
│       ├── pruning          # Pruning schedule JSON configs
│       └── finetune         # Finetuning schedule JSON configs
├── checkpoints              # Pre-trained model checkpoints
│   ├── models               # Checkpoints for fully trained models (before pruning)
│   ├── pruned               # Checkpoints for pruned and finetuned models
│   └── experiments          # Checkpoints for smaller-scale experiments with different bottleneck layers
├── src                      # Source code
│   ├── network              # PyTorch network (CleanUMamba)
│   ├── pruning              # Pruning logic (pruning framework, importance calculation)
│   ├── training             # Training, pruning, and finetuning scripts
│   ├── util                 # Utilities (datasets, losses, evaluation metrics)
│   └── examples             # Example scripts (checkpoint loading, denoising audio, streaming)
├── LICENSE                  # Project license
└── README.md                # This file
```
The following instructions are for Ubuntu 22 and assume that (mini)conda(3) is installed. However, apart from the streaming demo, most requirements should be platform agnostic.
- Clone the repository:

  ```bash
  git clone https://github.com/lab-emi/CleanUMamba.git
  cd CleanUMamba
  ```

- Create the conda environment with Python, PyTorch, CUDA, Mamba, and all the other required packages:

  ```bash
  conda env create -f environment.yml
  ```

- Activate the environment:

  ```bash
  conda activate CleanUMambaEnv
  ```
The environment is now fully set up and ready for training and inference.
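As a quick sanity check that the environment resolved correctly, a minimal snippet like the following only verifies that the key packages import and that CUDA is visible:

```python
# Quick sanity check that the key packages from environment.yml import correctly.
import torch
import mamba_ssm

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mamba_ssm loaded from:", mamba_ssm.__file__)
```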
This project uses the Interspeech 2020 Deep Noise Suppression (DNS) challenge dataset.
Follow the instructions derived from the CleanUNet repository to generate the training data:
- Download the dataset and pre-processing codes from the MS-SNSD GitHub repository.
- Assume the dataset is stored under `./dns` relative to the MS-SNSD repository root.
- Before generating clean-noisy data pairs, modify the following parameters in their `noisyspeech_synthesizer.cfg` file:

  ```
  total_hours: 500
  snr_lower: -5
  snr_upper: 25
  total_snrlevels: 31
  ```

- Also update the paths in `noisyspeech_synthesizer.cfg` to use Linux-style paths (even on Windows, for compatibility with typical Python scripts):

  ```
  noise_dir: ./datasets/noise
  speech_dir: ./datasets/clean
  noisy_destination: ./training_set/noisy
  clean_destination: ./training_set/clean
  noise_destination: ./training_set/noise
  log_dir: ./logs
  unit_tests_log_dir: ./unittests_logs
  ```

- For conciseness and to comply with our data-loading code, modify the file names (lines 198-201) in their `noisyspeech_synthesizer_singleprocess.py` to:

  ```python
  noisyfilename = 'fileid_' + str(file_num) + '.wav'
  cleanfilename = 'fileid_' + str(file_num) + '.wav'
  noisefilename = 'fileid_' + str(file_num) + '.wav'
  ```

- Generate the training data by running (from within the MS-SNSD directory; a small sanity check for the generated pairs follows this list):

  ```bash
  python noisyspeech_synthesizer_singleprocess.py
  ```

- It is also recommended to rename the files in the test set for conciseness:

  ```bash
  cd ./dns/datasets/test_set/synthetic/no_reverb/noisy/
  for NAME in $(ls ./); do arr=(${NAME//fileid_/ }); mv ${NAME} noisy_fileid_${arr[1]}; done
  ```
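Since the data loaders expect matching `fileid_<n>.wav` names in the clean and noisy folders, a quick check along these lines can catch naming mistakes early (the paths assume the destinations configured above, relative to the MS-SNSD repository root):

```python
# Verify that every generated noisy file has a clean counterpart with the same name.
from pathlib import Path

noisy = {p.name for p in Path("./training_set/noisy").glob("fileid_*.wav")}
clean = {p.name for p in Path("./training_set/clean").glob("fileid_*.wav")}
print(f"{len(noisy)} noisy files, {len(noisy - clean)} without a clean counterpart")
```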
To train a new model or continue training an existing one, first prepare the dataset as in the example above.
Then update the `trainset_config` root directory in the `configs/config.json` file. This file contains the general training configuration.
Furthermore, the JSON files nested in `configs/exp/` contain model-specific and pruning configurations.
Finally, you can start a training run with `train.py`, providing a model configuration via `-e`, for instance:
```bash
PYTHONPATH=./ python src/training/train.py -e configs/exp/models/DNS-CleanUMamba-3N-E8.json
```

This will initialize a model, load an existing checkpoint from a previous run if available, and start or continue training. Training progress is logged to wandb, and checkpoints are saved periodically as configured in `config.json`.
For multi-GPU training, update `batch_size_total`, `batch_size_per_gpu`, and `n_gpus` under `optimization` in `configs/config.json`.
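For example, the relevant block might look like this (the key names are as described above; the values are illustrative, not recommendations):

```json
"optimization": {
    "batch_size_total": 16,
    "batch_size_per_gpu": 8,
    "n_gpus": 2
}
```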
Then optionally set the GPUs you want to use and call `src/training/train_distributed.py` just like `train.py`:

```bash
export CUDA_VISIBLE_DEVICES=1,2
PYTHONPATH=./ python src/training/train_distributed.py -e configs/exp/models/DNS-CleanUMamba-3N-E8.json
```

To prune a CleanUMamba model, use `pruning.py`. With `-t` you point to the config of the model to be pruned, and with `-e` you specify the config describing how to prune:
```bash
PYTHONPATH=./ python src/training/pruning.py -t configs/exp/models/DNS-CleanUMamba-3N-E8.json -e configs/exp/pruning/DNS-CleanUMamba-Pruning12.json
```

Note that if you want to prune one of the checkpoints in `./checkpoints/`, you first have to place the checkpoint in `./exp/[exp_path name in model config]/checkpoint/` and give it a number as its file name.
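As background on what the pruning framework in `src/pruning` automates, structured channel pruning of a single `Conv1d` can be sketched generically like this (an illustration of the technique using L1-norm channel importance; not necessarily the importance criterion used in the paper):

```python
# Generic structured channel pruning sketch (illustrative; not the repo's criterion).
# Ranks output channels of a Conv1d by L1 norm and keeps the strongest ones.
import torch
import torch.nn as nn


def prune_conv1d_channels(conv: nn.Conv1d, keep: int) -> nn.Conv1d:
    importance = conv.weight.detach().abs().sum(dim=(1, 2))  # L1 norm per output channel
    keep_idx = importance.topk(keep).indices.sort().values   # preserve channel order
    pruned = nn.Conv1d(conv.in_channels, keep, conv.kernel_size[0],
                       stride=conv.stride[0], padding=conv.padding[0],
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep_idx].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep_idx].clone()
    return pruned


layer = nn.Conv1d(32, 64, kernel_size=4)
print(prune_conv1d_channels(layer, keep=32))  # -> Conv1d(32, 32, ...)
```

In a real pipeline, every layer that consumes the pruned output (the next convolution, normalizations, skip connections) must have its input channels reduced consistently; that dependency bookkeeping is the kind of work the pruning framework in `src/pruning` is responsible for.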
To load a pretrained model and denoise audio samples from a folder, call:

```bash
PYTHONPATH=./ python src/examples/denoise.py -c checkpoints/pruned/CleanUMamba-3N-E8_pruned-5M.pkl -i folder/with/files/to/denoise/ -o output/folder/
```

For real-time denoising, run:

```bash
PYTHONPATH=./ python src/examples/streaming_demo.py
```

This opens a matplotlib plot showing the spectrum of the raw input audio and the denoised output in real time. The current plotting is not very efficient, so the display can lag. The spectrum is for visualization only; the model operates on the raw waveform.
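If you want to script offline denoising yourself, the core reduces to a forward pass over the raw waveform; a minimal sketch (the checkpoint loading is repo-specific, so the model is taken as a parameter here; see `src/examples/denoise.py` for the actual entry point):

```python
# Minimal offline denoising helper (illustrative only; see src/examples/denoise.py
# for the repo's actual checkpoint loading and command-line entry point).
import torch
import torchaudio


def denoise_file(model: torch.nn.Module, in_path: str, out_path: str) -> None:
    """Run a causal waveform denoiser over one file; model maps (B, 1, T) -> (B, 1, T)."""
    waveform, sample_rate = torchaudio.load(in_path)  # DNS audio is 16 kHz mono
    model.eval()
    with torch.no_grad():
        denoised = model(waveform.unsqueeze(0))  # add a batch dimension
    torchaudio.save(out_path, denoised.squeeze(0), sample_rate)
```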
The code structure and training setup are adapted from CleanUNet (MIT License) and the DEMUCS denoiser (MIT License). The MambaS4 implementation reproduces the Mamba-S4 block described in the Mamba paper, combining the Mamba block (Apache-2.0) with the relevant functions from S4 (Apache-2.0). The CSIG, CBAK, and COVL evaluation metrics use the Python implementation from CMGAN (MIT License).
The pruning code draws inspiration from Torch-Pruning (MIT License).
If you use this work, please cite:

```bibtex
@inproceedings{groot2025cleanumamba,
  title={CleanUMamba: A Compact Mamba Network for Speech Denoising using Channel Pruning},
  author={Sjoerd Groot and Qinyu Chen and Jan C. van Gemert and Chang Gao},
  booktitle={ISCAS 2025},
  year={2025},
  organization={IEEE}
}
```