Speech-to-text and audio denoising with a pretrained Wav2Vec2 model. Denoising uses a Dual-signal Transformation LSTM Network (DTLN). The Wav2Vec2 model is also fine-tuned.
The project follows these steps:
- Data preparation
- Data preprocessing
- Modeling with Wav2Vec2 model
- Modeling after denoising
- Fine-tuning Wav2Vec for multi-language ASR
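
The modeling step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the Hugging Face `transformers` and `torch` packages are installed, and the `transcribe` helper and the `facebook/wav2vec2-base-960h` checkpoint are choices made for this example only. The input is expected to be a mono waveform (for example a 1-D NumPy array) sampled at 16 kHz.

```python
def transcribe(waveform, sampling_rate=16000,
               model_name="facebook/wav2vec2-base-960h"):
    """Transcribe one mono waveform with a pretrained Wav2Vec2 CTC model.

    `model_name` is an assumed checkpoint for illustration; the project
    may use a different one.
    """
    # Imported lazily so the heavy dependencies load only when needed.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)
    model.eval()

    # Normalize the raw audio and convert it to a batch of PyTorch tensors.
    inputs = processor(waveform, sampling_rate=sampling_rate,
                       return_tensors="pt")

    # Forward pass without gradients: logits have shape (batch, frames, vocab).
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy CTC decoding: take the most likely token per frame, then let the
    # processor collapse repeats and remove blank tokens.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

To compare results before and after denoising, the same function could be called once on the raw waveform and once on the denoised waveform.
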
Results from Wec2Vec2_Denoise.ipynb:
| Levenshtein metrics | Mean | Median |
|---|---|---|
| Word Error Rate | 0.26 | 0.20 |
| Match Error Rate | 0.25 | 0.2 |
| Word Information Lost | 0.40 | 0.36 |
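
For reference, the three metrics in the table can be computed from a word-level Levenshtein alignment. The sketch below uses the standard definitions, with hits H, substitutions S, deletions D and insertions I: WER = (S + D + I) / (H + S + D), MER = (S + D + I) / (H + S + D + I), and WIL = 1 - H² / (N_ref · N_hyp). It is a plain-Python illustration, not the notebook's implementation. The function names and the sample sentences are made up for this example, and when several alignments have the same cost, the MER and WIL values can differ slightly from those of other libraries.

```python
from statistics import mean, median


def alignment_counts(reference, hypothesis):
    """Return (hits, substitutions, deletions, insertions) for two sentences."""
    ref, hyp = reference.split(), hypothesis.split()
    # Each cell holds (cost, hits, substitutions, deletions, insertions).
    prev = [(j, 0, 0, 0, j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, start=1):
        curr = [(i, 0, 0, i, 0)]
        for j, h in enumerate(hyp, start=1):
            c, H, S, D, I = prev[j - 1]
            diag = (c, H + 1, S, D, I) if r == h else (c + 1, H, S + 1, D, I)
            c, H, S, D, I = prev[j]
            delete = (c + 1, H, S, D + 1, I)
            c, H, S, D, I = curr[j - 1]
            insert = (c + 1, H, S, D, I + 1)
            # Keep the cheapest option; ties prefer the diagonal move.
            curr.append(min(diag, delete, insert, key=lambda cell: cell[0]))
        prev = curr
    return prev[-1][1:]


def error_metrics(reference, hypothesis):
    """Compute WER, MER and WIL for one reference/hypothesis pair."""
    hits, subs, dels, ins = alignment_counts(reference, hypothesis)
    n_ref = hits + subs + dels   # number of reference words
    n_hyp = hits + subs + ins    # number of hypothesis words
    if n_ref == 0 or n_hyp == 0:
        raise ValueError("reference and hypothesis must both be non-empty")
    errors = subs + dels + ins
    return {
        "wer": errors / n_ref,
        "mer": errors / (hits + errors),
        "wil": 1 - (hits * hits) / (n_ref * n_hyp),
    }


# Hypothetical transcripts, only to show how mean and median are aggregated.
pairs = [
    ("the cat sat on the mat", "the cat sit on mat"),
    ("hello world", "hello world"),
]
scores = [error_metrics(ref, hyp) for ref, hyp in pairs]
for name in ("wer", "mer", "wil"):
    values = [s[name] for s in scores]
    print(name, round(mean(values), 2), round(median(values), 2))
```

The mean and median are taken over per-utterance scores, which matches the two columns of the table.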