DDTSE: DISCRIMINATIVE DIFFUSION MODEL FOR TARGET SPEECH EXTRACTION

🏠 Introduction

We introduce DDTSE, a discriminative diffusion model for target speech extraction and speech enhancement. DDTSE applies the same forward process as diffusion models but is trained with a reconstruction loss, as discriminative methods are. We also devise a two-stage training strategy that emulates the inference process during model training. DDTSE works as a standalone system and can also further improve the performance of discriminative models without additional retraining. Experimental results show that DDTSE achieves higher perceptual quality and accelerates inference by 3 times compared to the conventional diffusion model.
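The idea of pairing a diffusion-style forward process with a reconstruction loss can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the Ornstein-Uhlenbeck style process, the constants `THETA`, `SIGMA_MIN`, `SIGMA_MAX` and `T_MIN`, and the `model` callable are all hypothetical choices for the example, and signals are plain Python lists rather than spectrograms.

```python
import math
import random

# Assumed constants for illustration only; the README does not state them.
THETA, SIGMA_MIN, SIGMA_MAX, T_MIN = 1.5, 0.05, 0.5, 0.03


def perturb(clean, mixture, t, rng):
    """Sample x_t: the mean drifts from the clean signal toward the mixture,
    and Gaussian noise with a time-dependent standard deviation is added."""
    weight = math.exp(-THETA * t)
    logsig = math.log(SIGMA_MAX / SIGMA_MIN)
    variance = (
        SIGMA_MIN ** 2
        * (math.exp(2 * logsig * t) - math.exp(-2 * THETA * t))
        * logsig
        / (THETA + logsig)
    )
    std = math.sqrt(variance)
    return [
        weight * c + (1 - weight) * m + std * rng.gauss(0.0, 1.0)
        for c, m in zip(clean, mixture)
    ]


def reconstruction_loss(estimate, clean):
    """Mean squared error between the predicted and the true clean signal."""
    return sum((e - c) ** 2 for e, c in zip(estimate, clean)) / len(clean)


def training_loss(model, clean, mixture, rng):
    """One training step: perturb as a diffusion model would, then score the
    model's direct estimate of the clean signal as a discriminative model would."""
    t = rng.uniform(T_MIN, 1.0)
    x_t = perturb(clean, mixture, t, rng)
    estimate = model(x_t, mixture, t)  # hypothetical model signature
    return reconstruction_loss(estimate, clean)
```

Here the model predicts the clean signal directly instead of a score or noise term, which is what lets a plain reconstruction loss be used. How the second training stage feeds the model's own estimates back in is not shown.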

Please do not hesitate to tell us if you have any feedback!

📋 Contents

💬 Environment Setup

Create a new virtual environment with Python 3.8.

Install the package dependencies by running

pip install -r requirements.txt

🔍 Data preparation

Please make sure that you have downloaded Libri2Mix. If not, please refer to https://github.com/JorisCos/LibriMix and create your own Libri2Mix dataset.
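Before training, it can help to confirm that a generated split contains the folders you expect. The sketch below assumes the usual LibriMix output layout, in which each split directory holds subfolders such as `mix_clean`, `mix_both`, `s1` and `s2`. Which of these DDTSE actually reads is an assumption, and `missing_subdirs` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Subfolders that LibriMix's generation scripts normally write for each split.
# Which of them DDTSE needs is an assumption; adjust the tuple as required.
EXPECTED_SUBDIRS = ("mix_clean", "mix_both", "s1", "s2")


def missing_subdirs(split_dir, expected=EXPECTED_SUBDIRS):
    """Return the expected subfolder names that are absent from split_dir."""
    split_dir = Path(split_dir)
    return [name for name in expected if not (split_dir / name).is_dir()]
```

For example, calling `missing_subdirs` on a split directory of your own Libri2Mix copy returns an empty list when all four folders are present.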

📦 Training

Training is done by executing train.py:

python train.py --base_dir <your_base_dir>

To run DDTSE for the first stage, please run

bash training_command/stage1.sh

To run DDTSE for the second stage, please run

bash training_command/stage2.sh
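If you prefer to launch both stages from one place, the two scripts can be chained so that the second stage starts only after the first has finished successfully. This is a small sketch: `run_stages` is a hypothetical helper, and it assumes the scripts are run from the repository root and need no extra arguments.

```python
import subprocess

STAGE_SCRIPTS = ("training_command/stage1.sh", "training_command/stage2.sh")


def run_stages(scripts=STAGE_SCRIPTS, runner=subprocess.run):
    """Run each stage script in order, stopping at the first failure."""
    for script in scripts:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit code, so stage 2 never starts after a failed stage 1.
        runner(["bash", script], check=True)
    return list(scripts)
```

The `runner` parameter exists only so the order of the commands can be checked without starting a real training job.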

🤖 Inference

To run first-stage DDTSE inference for the multi-speaker noisy scenario, please run

bash inference_command/stage1.sh

To run second-stage DDTSE inference for the multi-speaker noisy scenario, please run

bash inference_command/stage2.sh

⛺ Scoring

To evaluate the model performance, please run

python calc_metrics.py --gt_dir /directory_of_original_samples --enhanced_dir /directory_of_generated_samples
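To illustrate what a comparison between the two directories involves, the sketch below pairs reference and generated files and computes SI-SDR, a metric commonly reported for speech extraction. Both parts are assumptions: the README does not say which metrics calc_metrics.py reports or how it matches files, and `paired_files` and `si_sdr` are hypothetical helpers that operate on plain lists of samples.

```python
import math
from pathlib import Path


def paired_files(gt_dir, enhanced_dir, pattern="*.wav"):
    """Pair reference and enhanced files that share a file name (an assumption)."""
    references = {p.name: p for p in Path(gt_dir).glob(pattern)}
    return [
        (references[p.name], p)
        for p in sorted(Path(enhanced_dir).glob(pattern))
        if p.name in references
    ]


def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    # Project the estimate onto the reference to get the target component.
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Because the estimate is projected onto the reference first, rescaling the estimate leaves the score essentially unchanged, and higher values indicate a closer match to the reference.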

🔗 Citation

To cite this repository:

@article{zhang2024ddtse,
  title={DDTSE: Discriminative Diffusion Model for Target Speech Extraction},
  author={Leying Zhang and Yao Qian and Linfeng Yu and Heming Wang and Hemin Yang and Shujie Liu and Long Zhou and Yanmin Qian},
  journal={IEEE Spoken Language Technology Workshop 2024},
  year={2024}
}

