Chore/small changes to wav2vec2 finetuning #54
Conversation
…characters_to_keep
Awesome results! Good job getting a much nicer ASR framework to give good results :D
Alright, ready for review now. I'm currently optimising the training hyperparameters, and I'll leave the config change for another PR.
LGTM!
This PR implements the following:
With this setup, we get quite close to reproducing the original results. More precisely, we arrive at around 16 WER on Common Voice 9.0, which is not too far from the SOTA of 11 WER, and the remaining gap seems to be mainly due to overfitting. Here are the training plots:
Notably, I sample NST-da and CV9.0 equally, so we go through a lot of Common Voice samples in 120k steps! The sampling ratios should probably be changed, and the masking probabilities should probably be increased.
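To make those two knobs concrete, here is a minimal sketch of what I mean; the dataset IDs, sampling ratios and masking probabilities below are illustrative assumptions, not the values used in this PR:

```python
# Hypothetical sketch: weighted dataset sampling + higher masking probabilities.
from datasets import interleave_datasets, load_dataset
from transformers import Wav2Vec2ForCTC

# Weighted (rather than equal) sampling between NST-da and Common Voice 9.0.
# The dataset IDs here are assumptions, not necessarily the ones used in the repo.
nst_da = load_dataset("alexandrainst/nst-da", split="train", streaming=True)
cv9 = load_dataset("mozilla-foundation/common_voice_9_0", "da", split="train", streaming=True)
train = interleave_datasets([nst_da, cv9], probabilities=[0.8, 0.2], seed=4242)

# Higher masking probabilities for the SpecAugment-style masking in wav2vec 2.0.
# The values are examples; mask_time_prob defaults to 0.05 and mask_feature_prob to 0.0.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    mask_time_prob=0.5,
    mask_feature_prob=0.25,
)
```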
In any case, I think we're probably close enough to a reproduction to trust the framework and start training models on more data.
Oh, and I tried freezing the parameters and also increasing the warmup time. Freezing the parameters didn't help at all, while the increased warmup might have helped a little bit.
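For reference, a rough sketch of those two experiments; whether "the parameters" means the feature encoder or the whole base model is my interpretation, and the step counts and learning rate are just examples:

```python
# Hypothetical sketch: parameter freezing + longer warmup.
from transformers import TrainingArguments, Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-xls-r-300m")

# Freezing experiment: freeze the convolutional feature encoder, and optionally
# the whole base model, so that only the CTC head is trained.
model.freeze_feature_encoder()
for param in model.wav2vec2.parameters():
    param.requires_grad = False  # full base-model freeze

# Longer warmup: raise warmup_steps relative to the total number of training steps.
training_args = TrainingArguments(
    output_dir="wav2vec2-finetuned",
    max_steps=120_000,
    warmup_steps=12_000,  # e.g. 10% warmup instead of a shorter default
    learning_rate=3e-4,
)
```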