A PyTorch implementation of QuartzNet, an end-to-end ASR model, trained on the LJSpeech dataset.
Set your preferred configuration in config.py
and run ./run_docker.sh
(don't forget to set the correct volume option).
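As a rough sketch, config.py could expose fields like the ones below; the field names here are illustrative assumptions (only path_to_file and from_pretrained are mentioned in this README), not the project's actual settings.

```python
# Hypothetical sketch of config.py; names other than path_to_file and
# from_pretrained are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class Config:
    batch_size: int = 32            # training batch size (assumed)
    num_epochs: int = 100           # total training epochs (assumed)
    learning_rate: float = 3e-4     # optimizer step size (assumed)
    from_pretrained: bool = False   # load a pretrained checkpoint for inference
    path_to_file: str = ""          # .wav file to transcribe during inference

config = Config()
```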
wget https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2 # download data
tar xjf LJSpeech-1.1.tar.bz2 # extract data
python train.py
You will need to log in to your wandb.ai account
to monitor training logs.
Every 10th checkpoint after the 40th epoch is saved as model{epoch}.pth.
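The saving schedule above can be sketched as follows; this assumes "every 10th checkpoint after the 40th epoch" means epochs 40, 50, 60, ..., which is one plausible reading, not a statement of the actual training loop.

```python
# Sketch of the assumed checkpoint schedule: save at epochs 40, 50, 60, ...
def should_save_checkpoint(epoch: int) -> bool:
    return epoch >= 40 and epoch % 10 == 0

# Epochs that would produce a model{epoch}.pth file over a 100-epoch run
saved_epochs = [e for e in range(1, 101) if should_save_checkpoint(e)]
```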
To run inference, set path_to_file in config to a .wav file,
set from_pretrained=True,
then run
python inference.py
The result will be saved to path_to_file.txt.
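One plausible reading of the output path, sketched below, is that the transcription is written next to the input audio with the .wav suffix replaced by .txt; this is an assumption about how path_to_file.txt is derived, not confirmed by the repo.

```python
# Assumed mapping from the input .wav path to the transcription .txt path.
from pathlib import Path

def output_path(path_to_file: str) -> Path:
    # Replace the audio suffix with .txt (assumption for illustration)
    return Path(path_to_file).with_suffix(".txt")
```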