This repository is dedicated to customizing and training VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) for Vietnamese text-to-speech (TTS), built on the Coqui TTS framework. It contains the code and resources needed to train VITS to generate high-quality speech from Vietnamese text.
- I highly recommend using a conda virtual environment with Python 3.10 (the version used in the command below):

```bash
conda create -n vits python=3.10
```
- In this repo, I use TTS framework version 0.17.5 for stability:

```bash
pip install TTS==0.17.5
```
- Infore: a single-speaker Vietnamese dataset with 14,935 short audio clips from a female speaker.
- After downloading and extracting the dataset zip file, the directory tree should look like the image below: the infore_16k_denoised folder contains all the .wav files, and the metadata.tsv file lists each wav filename together with its transcript.
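In case the image does not render, the expected layout is roughly as follows (the top-level folder name and the wav file names are illustrative placeholders):

```
infore_dataset/
├── infore_16k_denoised/
│   ├── 00001.wav
│   ├── 00002.wav
│   └── ...
└── metadata.tsv
```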
- To load data samples, you have to define your own formatter function. I have defined one for this dataset in `formater/customformater.py`; you can customize your own for other datasets. A minimal sketch is shown below.
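For reference, here is a minimal sketch of what such a formatter can look like, assuming metadata.tsv holds tab-separated `<wav filename>\t<transcript>` lines and using an arbitrary fixed speaker name (see `formater/customformater.py` for the exact parsing used in this repo):

```python
import os

def customformater(root_path, meta_file, **kwargs):
    """Return samples as a list of dicts in the format Coqui TTS expects."""
    items = []
    with open(os.path.join(root_path, meta_file), "r", encoding="utf-8") as f:
        for line in f:
            # Assumed column order: <wav filename><TAB><transcript>
            wav_name, text = line.strip().split("\t", 1)
            items.append({
                "text": text,
                "audio_file": os.path.join(root_path, "infore_16k_denoised", wav_name),
                "speaker_name": "infore",  # single-speaker dataset, any fixed name works
                "root_path": root_path,
            })
    return items
```

The function can then be passed to `load_tts_samples` from `TTS.tts.datasets` via its `formatter` argument.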
Run the following command:

```bash
python train_vits.py \
    --output_path [output path for the training process] \
    --data_path [path to the dataset directory] \
    --restore_path [path to a pretrained model checkpoint] \
    --epoch [number of epochs] \
    --batch_size [batch size] \
    --eval_batch_size [eval batch size] \
    --continue_path [path to a training folder to continue training] \
    --sample_rate [sample rate of the audio data] \
    --meta_filename [name of the metadata file]
```
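For example, a fresh run on the Infore dataset might look like this (the paths and hyperparameter values below are illustrative placeholders, not tuned settings):

```bash
python train_vits.py \
    --output_path ./output \
    --data_path ./infore_dataset \
    --epoch 1000 \
    --batch_size 32 \
    --eval_batch_size 16 \
    --sample_rate 16000 \
    --meta_filename metadata.tsv
```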
Please check the `inference.py` file for synthesizing speech from a trained checkpoint.
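For a quick synthesis check, here is a minimal sketch using Coqui TTS's `Synthesizer` class (the checkpoint and config paths are placeholders pointing at your own training output):

```python
from TTS.utils.synthesizer import Synthesizer

# Placeholder paths: point these at the checkpoint and config produced by training.
synthesizer = Synthesizer(
    tts_checkpoint="./output/<run_folder>/best_model.pth",
    tts_config_path="./output/<run_folder>/config.json",
    use_cuda=False,
)

# "Hello, this is a synthesized Vietnamese voice."
wav = synthesizer.tts("Xin chào, đây là giọng nói tổng hợp tiếng Việt.")
synthesizer.save_wav(wav, "sample.wav")
```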
My trained model is published on this Hugging Face Space. Because of hardware constraints, the model's voice is not very natural yet; I will try to improve the voice quality in the future :))).