A tool that integrates text-to-speech (TTS), automatic speech recognition (ASR), and voice conversion (VC).
Chinese, English, German, Japanese
cd monotonic_align
mkdir monotonic_align
python setup.py build_ext --inplace
The first time the script runs with a given model, it downloads that model and stores it (on Windows) under C:\Users\<username>\.cache\whisper\<model>. Once downloaded, the model does not need to be downloaded again.
whisper japanese.wav --language Japanese
whisper japanese.wav --language Japanese --task translate
C:\Users\<user>\.cache\whisper
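To check which models are already in that cache before running the script again, a small helper like the one below may be useful. It is only a sketch: it assumes Whisper's default download location (the `.cache\whisper` folder in the user's home directory, or under `XDG_CACHE_HOME` if that variable is set) and assumes checkpoints are stored as `.pt` files. The function names `whisper_cache_dir` and `list_cached_models` are made up for this example.

```python
import os
from pathlib import Path
from typing import List, Optional


def whisper_cache_dir() -> Path:
    """Return the folder where Whisper is assumed to cache models by default."""
    default_cache = os.path.join(os.path.expanduser("~"), ".cache")
    # Assumption: XDG_CACHE_HOME, when set, overrides the ~/.cache default.
    return Path(os.getenv("XDG_CACHE_HOME", default_cache)) / "whisper"


def list_cached_models(cache_dir: Optional[str] = None) -> List[str]:
    """List model names (file stems of *.pt files) found in the cache folder."""
    directory = Path(cache_dir) if cache_dir is not None else whisper_cache_dir()
    if not directory.is_dir():
        # Nothing has been downloaded yet, or the cache lives elsewhere.
        return []
    return sorted(p.stem for p in directory.glob("*.pt"))


if __name__ == "__main__":
    print("Cache folder:", whisper_cache_dir())
    print("Cached models:", list_cached_models())
```

If the list is empty, the next run of the script will download the requested model first.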
- Open Control Panel.
- Select Hardware and Sound.
- Manage Audio Devices.
- Playback (headphones) tab.
- Select the device to be used.
- Click Properties.
- Select the Advanced tab.
conda config --set proxy_servers.http http://127.0.0.1:7890
conda config --set proxy_servers.https https://127.0.0.1:7890
conda config --remove-key proxy_servers.http
conda config --remove-key proxy_servers.https
1. UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at ..\aten\src\ATen\native\SpectralOps.cpp:639.)
Solution: pass return_complex=False explicitly as the last argument of torch.stft:
spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[str(y.device)], center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=False)
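2. title1|test_data/3752-4944-0027.wav|test_data/p225_001.wav
Each line has three fields separated by "|". The first field is the name of the converted result, the second is the source audio that supplies the spoken content (the text), and the third is the reference audio that supplies the voice (the tone of the utterance).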
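The three-field format can be checked before running a conversion. The snippet below is a minimal sketch of such a check, not part of the tool itself; `ConversionEntry` and `parse_conversion_line` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class ConversionEntry(NamedTuple):
    title: str       # name of the converted result
    source_wav: str  # audio that supplies the spoken content
    target_wav: str  # audio that supplies the voice


def parse_conversion_line(line: str) -> ConversionEntry:
    """Split one 'title|source|target' line into its three fields."""
    parts = [part.strip() for part in line.strip().split("|")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'title|source|target', got: {line!r}")
    return ConversionEntry(*parts)


if __name__ == "__main__":
    entry = parse_conversion_line(
        "title1|test_data/3752-4944-0027.wav|test_data/p225_001.wav"
    )
    print(entry.title, entry.source_wav, entry.target_wav)
```

A line with a missing or empty field raises ValueError, which makes a malformed list easier to spot than a failure later in the conversion.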