The result of analysis-synthesis is a longer speech audio. Is there something wrong here? The code prepends 0.5s of silence before the analysis, but the resulting audio is NOT 0.5s longer than the source audio. For example, this file is 7.34s in duration, but the synthesized one is 8.04s, i.e. 0.70s longer.
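To make the mismatch concrete, here is a minimal sketch of the arithmetic, using only the durations quoted above. The function name `unexplained_extra_seconds` is hypothetical and not part of the repository; it just subtracts the source duration and the prepended silence from the synthesized duration.

```python
def unexplained_extra_seconds(source_s: float, synthesized_s: float, pad_s: float) -> float:
    """Return how much longer the output is than source + padding, in seconds.

    Hypothetical helper for illustration only; rounds to milliseconds to
    hide floating-point noise.
    """
    return round(synthesized_s - (source_s + pad_s), 3)


# Durations reported in this issue: 7.34s source, 8.04s output, 0.5s prepended silence.
extra = unexplained_extra_seconds(source_s=7.34, synthesized_s=8.04, pad_s=0.5)
print(f"unexplained extra duration: {extra:.2f}s")
```

So even after allowing for the prepended silence, roughly 0.2s of extra audio remains that I cannot explain.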
There is also an error message, and I'm not sure whether it has something to do with the change in duration:
Error(s) in loading state_dict for StagedVQVAE:
Unexpected key(s) in state_dict: "mel_spectrogram.mel_stft.mel_scale.fb", "mel_spectrogram.mel_stft.spectrogram.window"