- DCTTS with F0
- python 3.7
- pytorch 1.3
- pysptk
- librosa, scipy, tqdm, tensorboardX
- LJ Speech 1.1
- KSS, a Korean female single-speaker speech dataset
-
Download a dataset from the list above and set its path in config.py. Then run the command below.
python prepro.py
-
Train the baseline DCTTS first. It needs more than 100k training steps.
python train.py <gpu_id>
-
After training the baseline, you can train F0-DCTTS. Set "f0_mode=True" and "pretrained_path=..." in config.py, then run the command below once more.
python train.py <gpu_id>
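The config.py edit for this step might look like the fragment below. Only the names f0_mode and pretrained_path come from this README. The path value is left elided on purpose, and the assumption that it points to the trained baseline checkpoint is mine.

```python
# Fragment of config.py for the second (F0-DCTTS) training stage.
# Only these two settings are named in the README; leave everything else as it is.
f0_mode = True           # train F0-DCTTS instead of the baseline
pretrained_path = "..."  # assumption: replace "..." with the path to your trained baseline
```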
-
You can now synthesize speech conditioned on F0. Control the F0 with "f0_factor=..." in config.py.
python synthesize.py <gpu_id>
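To give an idea of what a factor like f0_factor can do, here is a minimal sketch that scales an F0 contour. It is an assumption, not the code in synthesize.py: the function name scale_f0 is hypothetical, the contour is assumed to be in Hz with 0 marking unvoiced frames, and the real script may apply the factor differently (for example on normalized or log-scale values).

```python
def scale_f0(f0, f0_factor):
    """Multiply voiced F0 values (Hz) by f0_factor; keep unvoiced frames (0) at 0."""
    return [hz * f0_factor if hz > 0 else 0.0 for hz in f0]


# Toy contour: 0.0 marks unvoiced frames, the other values are voiced frames in Hz.
contour = [0.0, 200.0, 210.0, 0.0, 190.0]

# A factor above 1 raises the pitch, a factor below 1 lowers it.
print(scale_f0(contour, 1.5))  # [0.0, 300.0, 315.0, 0.0, 285.0]
```

Under this assumption, f0_factor=1.0 would leave the contour unchanged.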
- This method is simple and easy, but it is a very naive approach.
- SSRN is removed in this code, so you need a separate mel-to-waveform (mel2wav) vocoder. I recommend WaveGlow or Parallel WaveGAN.