TonSpeech

Audio processing project

Installation

Prerequisites

Python 3.7 or higher
Ubuntu 18 or higher

(Optional) Create a virtual environment

python3 -m venv [your_venv_name]

Then activate it

source [your_venv_name]/bin/activate

Install requirements
In your terminal, run

git clone https://github.com/tungedng2710/TonSpeech.git
cd TonSpeech
pip install -r requirements.txt

Evaluation on Noise suppression task

PESQ and STOI are the currently supported metrics for evaluating the quality of a denoising model. To evaluate a test sample, run

bash run_eval.sh

Options:

--trimmed_duration: (optional, for long audio) length of each sample batch in seconds. The default value is -1 (no trimming).
--down_sample: whether to downsample the audio: 1 (True) or 0 (False).
--metric: pesq or stoi. For pesq, make sure the sample rate of the given audio file is 8 kHz (narrow band) or 16 kHz (wide band, the default).
--clean: path to the clean (target) audio file. To evaluate on the Voicebank-DEMAND dataset, pass the path to the folder of clean audio instead.
--denoised: path to the denoised audio (the model output).
--eval_on_dataset: 0 compares a single denoised file with its clean file; 1 compares one clean file with a folder of different denoised files; 2 evaluates on Voicebank-DEMAND.
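
To make the --metric and --trimmed_duration options more concrete, here is a minimal sketch that checks a sample rate against the PESQ band and computes segment boundaries for a given trimming length. It uses only the Python standard library. The function names (wav_sample_rate, check_pesq_rate, split_into_segments) are hypothetical and are not taken from eval.py, and treating --trimmed_duration as a fixed segment length is an assumption about how the script behaves.

```python
import wave

# Sample rates listed above for each PESQ band: narrow band (nb), wide band (wb).
PESQ_RATES = {"nb": 8000, "wb": 16000}


def wav_sample_rate(path):
    """Read the sample rate (Hz) from a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_pesq_rate(sample_rate, band="wb"):
    """Return sample_rate if it matches the PESQ band, else raise ValueError."""
    expected = PESQ_RATES[band]
    if sample_rate != expected:
        raise ValueError(
            f"PESQ ({band}) expects {expected} Hz audio, got {sample_rate} Hz"
        )
    return sample_rate


def split_into_segments(num_samples, sample_rate, trimmed_duration=-1):
    """Return (start, end) sample indices for each segment.

    A non-positive trimmed_duration means no trimming: one segment
    covering the whole signal (assumed meaning of the default -1).
    """
    if trimmed_duration <= 0:
        return [(0, num_samples)]
    step = max(1, int(trimmed_duration * sample_rate))
    return [
        (start, min(start + step, num_samples))
        for start in range(0, num_samples, step)
    ]
```

With these definitions, a 2.5-second clip at 16 kHz (40000 samples) and a trimmed duration of 1 second yields three segments, the last one shorter than the others.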

Speech Enhancement

Conformer-based MetricGAN (CMGAN)

Official implementation of CMGAN at this GitHub link

In the terminal, run the script below

python se_cmgan.py --noisy [path/to/noisy/audio/file]

Currently, GPU inference is highly recommended for CMGAN.

MetricGAN+

Official implementation of MetricGAN+ at this GitHub link

In the terminal, run the script below

python se_metricganplus.py --noisy [path/to/noisy/audio/or/folder]

The script automatically checks whether the given path is a file or a folder.
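
A file-or-folder check of this kind can be sketched with pathlib as shown below. This is only an illustration, not the code in se_metricganplus.py: the helper name collect_audio_paths and the restriction to .wav files are assumptions.

```python
from pathlib import Path

# Assumption: only WAV files are collected; the real script may accept more formats.
AUDIO_SUFFIXES = {".wav"}


def collect_audio_paths(path):
    """Return a sorted list of audio files for a file path or a folder path."""
    target = Path(path)
    if target.is_file():
        # A single file is returned as a one-element list.
        return [target]
    if target.is_dir():
        # A folder is scanned (non-recursively) for audio files.
        return sorted(
            p for p in target.iterdir()
            if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
        )
    raise FileNotFoundError(f"No such file or folder: {path}")
```

Returning a list in both cases lets the rest of a script loop over the inputs without caring which kind of path was given.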

Post-processing PCS

If you need to post-process the output with Perceptual Contrast Stretching (PCS), run

python se_pcs.py --noisy [folder/of/metricganplus_results]

The outputs of all the methods above are saved to a default path. To change it, add the argument below to your command:

--saved_folder [path/to/saved/folder]
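
For readers who want to see how such a flag is typically wired up, here is a small argparse sketch. It is not taken from the TonSpeech scripts: the default folder name "outputs" and the helper output_path are hypothetical placeholders, since the actual default path is not stated here.

```python
import argparse
from pathlib import Path

# Hypothetical default; the actual default path used by TonSpeech is not stated here.
DEFAULT_SAVED_FOLDER = "outputs"


def build_parser():
    """Build a parser with the two flags discussed in this section."""
    parser = argparse.ArgumentParser(description="Speech enhancement (sketch)")
    parser.add_argument("--noisy", required=True,
                        help="noisy audio file or folder")
    parser.add_argument("--saved_folder", default=DEFAULT_SAVED_FOLDER,
                        help="folder where enhanced audio is written")
    return parser


def output_path(noisy_path, saved_folder):
    """Create saved_folder if needed and return the output path for one input."""
    folder = Path(saved_folder)
    folder.mkdir(parents=True, exist_ok=True)
    # Keep the input file name so results are easy to match with inputs.
    return folder / Path(noisy_path).name
```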

ONNX model

TonSpeech supports exporting MetricGAN+ to an ONNX model by modifying the short-time Fourier transform operator (unfortunately, torch.stft and torch.istft are not supported in the current opset version). To export the ONNX model, run

python onnx.py

The dummy input is located in the data folder. For now, do not use a different dummy input, because the export is tied to the fixed signal length of the Fourier transform operator. This will be fixed soon.
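
To illustrate why a signal length can end up baked into such an operator, the sketch below writes an STFT as framing, windowing, and multiplication with a fixed DFT basis, that is, only plain multiply-add operations. It is a pure-Python illustration under stated assumptions (no padding, periodic Hann window, one-sided spectrum) and is not the operator TonSpeech exports; all function names are hypothetical. Note that the number of frames depends directly on the input length.

```python
import cmath
import math


def frame_count(signal_length, n_fft, hop):
    """Number of full frames when no padding is applied."""
    if signal_length < n_fft:
        return 0
    return 1 + (signal_length - n_fft) // hop


def dft_basis(n_fft):
    """Rows k = 0..n_fft//2 of the DFT matrix (one-sided spectrum)."""
    return [
        [cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft)]
        for k in range(n_fft // 2 + 1)
    ]


def hann_window(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def stft(signal, n_fft, hop):
    """Return a list of frames, each a list of n_fft//2 + 1 complex bins."""
    basis = dft_basis(n_fft)      # fixed weights, computed once
    window = hann_window(n_fft)
    frames = []
    for i in range(frame_count(len(signal), n_fft, hop)):
        # Slice one frame and apply the window.
        chunk = [signal[i * hop + n] * window[n] for n in range(n_fft)]
        # Each bin is a dot product of the frame with one basis row.
        frames.append([sum(b * x for b, x in zip(row, chunk)) for row in basis])
    return frames
```

For instance, with n_fft = 8 and hop = 4, a 16-sample input gives 3 frames while a 20-sample input gives 4, so any graph traced with one input length has that frame count fixed.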

Alternative ways to test audio quality

Currently, TonSpeech only supports evaluation with the PESQ and STOI metrics; other methods are under development. Some alternatives you can try:

ViSQOL (Virtual Speech Quality Objective Listener): developed by Google. Similar to PESQ, ViSQOL evaluates audio quality by comparing the reference (clean) audio with the degraded (denoised) audio, then maps the result to a mean opinion score (MOS).
CSIG, CBAK, COVL: popular metrics for the speech enhancement task on paperswithcode.
