Emotional Voice Conversion (EVC) aims to transform the emotional state of speech from one emotion to another while preserving linguistic information and speaker identity. However, many studies are limited by the requirement for parallel speech data between different emotional patterns, which is not widely available in real-life applications. Moreover, annotating emotional data is highly time-consuming and labor-intensive. This paper proposes SGEVC, a novel semi-supervised generative model for emotional voice conversion. Our method uses a variational autoencoder (VAE) framework to disentangle the linguistic, speaker identity, and emotion spaces. Additionally, we integrate Text-to-Speech (TTS) into the EVC framework to guide the linguistic content and design SGEVC in an end-to-end manner. We demonstrate that as little as 1% supervised data is sufficient to achieve emotional voice conversion, and our experimental results show that the proposed model achieves state-of-the-art (SOTA) performance.
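To make the semi-supervised idea concrete, here is a minimal NumPy sketch of the kind of objective such a model can optimize: standard VAE terms (reconstruction plus KL) on every utterance, and an emotion cross-entropy counted only on the small labeled subset. This is an illustrative toy, not the repository's actual loss; the `gamma` weight and all function names are assumptions.

```python
import numpy as np

def kl_divergence(mu, logvar):
    # KL(q(z|x) || N(0, I)) for a diagonal-Gaussian posterior, per utterance
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)

def semi_supervised_loss(x, x_hat, mu, logvar, emo_logits, emo_labels,
                         labeled_mask, gamma=1.0):
    """Toy semi-supervised VAE objective.

    ELBO terms apply to every utterance; the emotion classification term
    is masked so it only contributes where a label exists.
    """
    recon = np.sum((x - x_hat) ** 2, axis=-1)          # reconstruction term
    kl = kl_divergence(mu, logvar)                     # prior-matching term
    # Softmax cross-entropy of the emotion classifier head
    log_probs = emo_logits - np.log(
        np.sum(np.exp(emo_logits), axis=-1, keepdims=True))
    ce = -log_probs[np.arange(len(emo_labels)), emo_labels]
    supervised = gamma * ce * labeled_mask             # zero for unlabeled data
    return float(np.mean(recon + kl + supervised))
```

With, say, a 1% labeled fraction, `labeled_mask` is zero almost everywhere, so most utterances train the VAE terms alone while the few labeled ones also shape the emotion space.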
Visit our demo for audio samples.
We also provide the SGEVC-1 model.
Diagram of the proposed approach, showing the training procedure (left) and inference procedure (right).
- Python >= 3.6
- Clone this repository
- Install the Python requirements. Please refer to requirements.txt
- You need to install sox first: `apt-get install sox`
- Download the pretrained SGEVC-1 model
- Download the ESD dataset
Inference:

```sh
python inference_VC.py
```

Training:

```sh
python train.py -c configs/ESD_base.json -m ESD_chinese_semi_3_gamma_1.0_alpha_0.1
```
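For reference, a minimal sketch of the command-line interface implied by the training invocation above. The flag meanings (`-c` for a JSON config path, `-m` for a run name used to tag checkpoints and logs) are inferred from the command, not taken from the repository's train.py.

```python
import argparse
import json

def parse_args(argv):
    # -c: path to a JSON config such as configs/ESD_base.json
    # -m: run/model name, e.g. ESD_chinese_semi_3_gamma_1.0_alpha_0.1
    parser = argparse.ArgumentParser(description="SGEVC training (sketch)")
    parser.add_argument("-c", "--config", required=True,
                        help="path to a JSON training config")
    parser.add_argument("-m", "--model", required=True,
                        help="run name for checkpoints and logs")
    return parser.parse_args(argv)

def load_config(path):
    # Read the JSON config into a plain dict
    with open(path) as f:
        return json.load(f)
```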