Training HiFiGan -- avg loss not decreasing #975

skol101 · 2021-11-29T08:19:15Z

Describe the bug
After running for 240k steps, there is no improvement in avg loss when training HiFiGan.

To Reproduce
Steps to reproduce the behavior:

Run the following command: CUDA_VISIBLE_DEVICES=0,1 python ../../TTS/TTS/bin/distribute.py --script train_hifigan.py
Training:

TRAINING (2021-11-29 10:06:07)

Expected behavior
Improvement in loss during training.

Environment (please complete the following information):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 20.04
PyTorch or TensorFlow version (use command below): pytorch 1.10.0
Python version: 3.8.11
CUDA/cuDNN version: py3.8_cuda11.3_cudnn8.2.0_0
GPU model and memory: 2xRTX 3090

Additional context

import os

from TTS.trainer import Trainer, TrainingArgs
from TTS.utils.audio import AudioProcessor
from TTS.vocoder.configs import HifiganConfig
from TTS.vocoder.datasets.preprocess import load_wav_data
from TTS.vocoder.models.gan import GAN

output_path = os.path.dirname(os.path.abspath(__file__))

config = HifiganConfig(
    batch_size=64,
    eval_batch_size=16,
    num_loader_workers=4,
    num_eval_loader_workers=4,
    run_eval=True,
    test_delay_epochs=5,
    epochs=1000,
    seq_len=8192,
    pad_short=2000,
    use_noise_augment=True,
    eval_split_size=10,
    print_step=25,
    print_eval=False,
    mixed_precision=False,
    lr_gen=1e-4,
    lr_disc=1e-4,
    data_path=os.path.join(output_path, "../datasets/vctk_all_22"),
    output_path=output_path,
)

# init audio processor
ap = AudioProcessor(**config.audio.to_dict())

# load training samples
eval_samples, train_samples = load_wav_data(config.data_path, config.eval_split_size)


# init model
model = GAN(config)

# init the trainer and 🚀
trainer = Trainer(
    TrainingArgs(),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
    training_assets={"audio_processor": ap},
)
trainer.fit()
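
Since the script reads raw wavs from `data_path` and the generated config uses a 22050 Hz sample rate with `resample` set to false, one thing worth ruling out is a sample-rate mismatch in the dataset. The following is a minimal sketch using only the standard library; `count_sample_rates` and `mismatched_rates` are hypothetical helper names and are not part of the TTS package. It assumes the files are plain PCM WAVs, which is what the `wave` module can read.

```python
import wave
from collections import Counter
from pathlib import Path


def count_sample_rates(data_path):
    """Count WAV files per sample rate under data_path (searched recursively).

    Assumption: files are PCM WAVs; the stdlib `wave` module raises
    wave.Error for other encodings (e.g. 32-bit float).
    """
    rates = Counter()
    for wav_path in Path(data_path).rglob("*.wav"):
        with wave.open(str(wav_path), "rb") as wav_file:
            rates[wav_file.getframerate()] += 1
    return rates


def mismatched_rates(rates, expected_rate=22050):
    """Return only the entries whose sample rate differs from expected_rate."""
    return {rate: count for rate, count in rates.items() if rate != expected_rate}
```

Calling `mismatched_rates(count_sample_rates(config.data_path))` before training would show whether any file deviates from the rate the config expects; an empty dict means every readable file matches.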

The script also generates config.json in the directory where train_hifigan.py resides, as well as in the generated run directory.

{
    "model": "hifigan",
    "run_name": "coqui_tts",
    "run_description": "",
    "epochs": 1000,
    "batch_size": 64,
    "eval_batch_size": 16,
    "mixed_precision": false,
    "scheduler_after_epoch": false,
    "run_eval": true,
    "test_delay_epochs": 5,
    "print_eval": false,
    "dashboard_logger": "tensorboard",
    "print_step": 25,
    "plot_step": 100,
    "model_param_stats": false,
    "project_name": null,
    "log_model_step": null,
    "wandb_entity": null,
    "save_step": 10000,
    "checkpoint": true,
    "keep_all_best": false,
    "keep_after": 10000,
    "num_loader_workers": 4,
    "num_eval_loader_workers": 4,
    "use_noise_augment": true,
    "output_path": "/home/sk/work/hifigan",
    "distributed_backend": "nccl",
    "distributed_url": "tcp://localhost:54321",
    "audio": {
        "fft_size": 1024,
        "win_length": 1024,
        "hop_length": 256,
        "frame_shift_ms": null,
        "frame_length_ms": null,
        "stft_pad_mode": "reflect",
        "sample_rate": 22050,
        "resample": false,
        "preemphasis": 0.0,
        "ref_level_db": 20,
        "do_sound_norm": false,
        "log_func": "np.log10",
        "do_trim_silence": true,
        "trim_db": 45,
        "power": 1.5,
        "griffin_lim_iters": 60,
        "num_mels": 80,
        "mel_fmin": 0.0,
        "mel_fmax": null,
        "spec_gain": 20,
        "do_amp_to_db_linear": true,
        "do_amp_to_db_mel": true,
        "signal_norm": true,
        "min_level_db": -100,
        "symmetric_norm": true,
        "max_norm": 4.0,
        "clip_norm": true,
        "stats_path": null
    },
    "eval_split_size": 10,
    "data_path": "/home/sk/work/hifigan/../datasets/vctk_all_wavs",
    "feature_path": null,
    "seq_len": 8192,
    "pad_short": 2000,
    "conv_pad": 0,
    "use_cache": false,
    "wd": 1e-06,
    "optimizer": "AdamW",
    "optimizer_params": {
        "betas": [
            0.8,
            0.99
        ],
        "weight_decay": 0.0
    },
    "use_stft_loss": false,
    "use_subband_stft_loss": false,
    "use_mse_gan_loss": true,
    "use_hinge_gan_loss": false,
    "use_feat_match_loss": true,
    "use_l1_spec_loss": true,
    "stft_loss_weight": 0,
    "subband_stft_loss_weight": 0,
    "mse_G_loss_weight": 1,
    "hinge_G_loss_weight": 0,
    "feat_match_loss_weight": 108,
    "l1_spec_loss_weight": 45,
    "stft_loss_params": {
        "n_ffts": [
            1024,
            2048,
            512
        ],
        "hop_lengths": [
            120,
            240,
            50
        ],
        "win_lengths": [
            600,
            1200,
            240
        ]
    },
    "l1_spec_loss_params": {
        "use_mel": true,
        "sample_rate": 22050,
        "n_fft": 1024,
        "hop_length": 256,
        "win_length": 1024,
        "n_mels": 80,
        "mel_fmin": 0.0,
        "mel_fmax": null
    },
    "target_loss": "loss_0",
    "grad_clip": [
        5,
        5
    ],
    "lr_gen": 0.0001,
    "lr_disc": 0.0001,
    "lr_scheduler_gen": "ExponentialLR",
    "lr_scheduler_gen_params": {
        "gamma": 0.999,
        "last_epoch": -1
    },
    "lr_scheduler_disc": "ExponentialLR",
    "lr_scheduler_disc_params": {
        "gamma": 0.999,
        "last_epoch": -1
    },
    "use_pqmf": false,
    "diff_samples_for_G_and_D": false,
    "discriminator_model": "hifigan_discriminator",
    "generator_model": "hifigan_generator",
    "generator_model_params": {
        "upsample_factors": [
            8,
            8,
            2,
            2
        ],
        "upsample_kernel_sizes": [
            16,
            16,
            4,
            4
        ],
        "upsample_initial_channel": 512,
        "resblock_kernel_sizes": [
            3,
            7,
            11
        ],
        "resblock_dilation_sizes": [
            [
                1,
                3,
                5
            ],
            [
                1,
                3,
                5
            ],
            [
                1,
                3,
                5
            ]
        ],
        "resblock_type": "1"
    },
    "lr": 0.0001
}
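
A few values in this config have to agree with each other for a HiFi-GAN style vocoder to train at all: the product of `upsample_factors` must equal `hop_length`, `seq_len` should be a multiple of `hop_length`, and the mel settings used by the L1 spectrogram loss should match the audio settings. The sketch below checks these three relations on a config dict; `check_vocoder_config` is a hypothetical helper written for illustration, not a TTS API.

```python
import json
from math import prod


def check_vocoder_config(config):
    """Return a list of human-readable problems found in the config dict."""
    problems = []
    audio = config["audio"]
    hop_length = audio["hop_length"]
    factors = config["generator_model_params"]["upsample_factors"]

    # The generator expands each mel frame into prod(factors) audio samples,
    # so the product has to match the hop length used to compute the mels.
    if prod(factors) != hop_length:
        problems.append(
            f"prod(upsample_factors)={prod(factors)} != hop_length={hop_length}"
        )

    # Training segments are cut in whole mel frames.
    if config["seq_len"] % hop_length != 0:
        problems.append(
            f"seq_len={config['seq_len']} is not a multiple of hop_length={hop_length}"
        )

    # The spectrogram loss should see the same sample rate as the input audio.
    loss_rate = config["l1_spec_loss_params"]["sample_rate"]
    if loss_rate != audio["sample_rate"]:
        problems.append(
            f"l1_spec_loss sample_rate={loss_rate} != audio sample_rate={audio['sample_rate']}"
        )
    return problems


def check_config_file(path):
    """Load a config.json from disk and run the checks above."""
    with open(path, "r", encoding="utf-8") as f:
        return check_vocoder_config(json.load(f))
```

For the values posted above (factors 8, 8, 2, 2; hop length 256; `seq_len` 8192; both sample rates 22050) this returns an empty list, so these particular relations do not explain the plateau.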

skol101 · 2021-11-29T08:26:51Z

Author

Also, when synthesizing speech all I can hear is noise (similar to pretrained hifigan and fastpitch).

`CUDA_VISIBLE_DEVICES=0 tts --use_cuda true --text "This was the time when I was young and happy and what not used to go here and there, then every where I see my eyes never despise the horizon." --config_path fast_pitch_vctk-November-28-2021_02+38PM-0000000/config.json --model_path fast_pitch_vctk-November-28-2021_02+38PM-0000000/best_model.pth.tar --vocoder_path ../hifigan/coqui_tts-November-28-2021_01+58PM-0000000/checkpoint_240000.pth.tar --vocoder_config ../hifigan/config.json --out_path vctk_fp_hfgan.wav --speaker_idx VCTK_p226

Using model: fast_pitch
Init speaker_embedding layer.
Vocoder Model: hifigan
Generator Model: hifigan_generator
Discriminator Model: hifigan_discriminator
`

0 replies

skol101 · 2021-11-29T09:53:35Z

Author

I guess my problem was that I didn't put all wavs from the VCTK dataset into one folder (as in the LJSpeech dataset). We'll see how it proceeds. So far, avg loss is going down, but it seems to be converging to 16.0.
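
For anyone wanting to try the same flat layout, here is a minimal sketch that copies every wav from a nested tree (such as per-speaker subfolders) into a single directory. `flatten_wavs` is a hypothetical helper, and the naming scheme (joining the relative path parts with underscores) is an assumption chosen only to avoid file name collisions. Whether the flat layout is actually required by the loader is not confirmed in this thread.

```python
import shutil
from pathlib import Path


def flatten_wavs(src_root, dst_dir):
    """Copy every .wav under src_root into dst_dir and return the new paths.

    File names are built from the path relative to src_root, joined with
    underscores (p226/001.wav becomes p226_001.wav), so files with the same
    name in different subfolders do not overwrite each other.
    dst_dir should live outside src_root.
    """
    src_root = Path(src_root)
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    # sorted() materializes the file list before any copying starts.
    for wav_path in sorted(src_root.rglob("*.wav")):
        flat_name = "_".join(wav_path.relative_to(src_root).parts)
        target = dst_dir / flat_name
        shutil.copy2(wav_path, target)
        copied.append(target)
    return copied
```

The originals are left untouched, so the nested copy can still be used by recipes that expect speaker folders.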

0 replies

skol101 · 2021-11-29T12:52:13Z

Author

Apparently, I should have launched vocoder training through train_vocoder.py --config_path config.json instead of distributed training of train_hifigan.py.

@erogol the difference between the two training options is really not that apparent, considering the latter leads nowhere.
Also, is there a way to run train_vocoder.py through distributed training?

5 replies

erogol Nov 29, 2021
Maintainer

They both should work equally. If they don't, there is an issue. Please file an issue. I cannot test myself since I don't have multiple GPUs currently.

skol101 Nov 29, 2021
Author

Done #976
I think it's possible to test distribute.py on a single GPU.

erogol Nov 30, 2021
Maintainer

If you know the way, go for it ;)

skol101 Dec 2, 2021
Author

Is it OK that loss_0 stays around 16 and doesn't go any lower?
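
One way to make "stays around 16" concrete is to compare the mean of the most recent logged losses with the mean of the window before it. The sketch below is a generic check on a list of logged values; `has_plateaued`, the window size, and the tolerance are illustrative assumptions and are not taken from the TTS trainer. Keep in mind that in GAN training a flat loss value alone is generally not a reliable indicator of audio quality, so listening to the eval samples is still worthwhile.

```python
def has_plateaued(losses, window=100, rel_tol=0.01):
    """Return True if the last two windows of losses have nearly equal means.

    losses  -- sequence of logged loss values, oldest first
    window  -- number of values per window (assumed value, tune to taste)
    rel_tol -- allowed relative change between the two window means
    """
    if len(losses) < 2 * window:
        # Not enough history to compare two full windows.
        return False
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return abs(previous - recent) <= rel_tol * abs(previous)
```

Feeding it the `loss_0` values exported from the dashboard logger would give a yes/no answer that is less sensitive to step-to-step noise than eyeballing the console output.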

skol101 Dec 3, 2021
Author

HiFiGan is untrainable on the VCTK dataset.
