Authors: Kip McCharen, Pavan Kumar Bondalapati, Siddarth Surapaneni
SYS 6016: Deep Learning
University of Virginia
School of Data Science
May 13, 2021
Adapted from SpeechBrain.
Significant work has been done on automatic speech recognition (ASR), including fairly successful implementations such as Siri and Alexa. ASR, however, is a different task from automatic phone recognition (APR), which involves consistently identifying not words but the unique and irreducible sounds from which words are formed. In recent years, phone recognition has proven useful for tasks such as transcribing poorly documented languages (e.g. Inuktitut), tracking children’s exposure to word diversity, and automating the detection of certain speech and voice disorders. In this paper, we describe our process for minimizing the phone error rate (PER) using several deep learning models.
This folder contains the scripts to train a seq2seq RNN-based system on TIMIT, a speech dataset available from the Linguistic Data Consortium at the University of Pennsylvania.
Run this command to train the model:
python train.py train/train.yaml
Release | hyperparams file | Val. PER | Test PER | Model link | GPUs |
---|---|---|---|---|---|
21-04-08 | train_with_wav2vec2.yaml | 7.11 | 8.04 | https://drive.google.com/drive/folders/1-IbO7hldwrRh4rwz9xAYzKeeMe57YIiq?usp=sharing | 1xV100 32GB |
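
For readers unfamiliar with the metric, the PER columns above can be read as an edit-distance ratio. The sketch below assumes the usual definition: substitutions, deletions, and insertions needed to turn the predicted phone sequence into the reference, divided by the total number of reference phones and expressed as a percentage. The function names `edit_distance` and `phone_error_rate` are hypothetical and are not taken from this repository or from SpeechBrain, whose own scoring may differ in details such as phone-set mapping.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of labels)."""
    # prev[j] holds the distance between the first i-1 reference phones
    # and the first j hypothesis phones.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # matching i reference phones to an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference phone missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra phone in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(refs, hyps):
    """Corpus-level PER in percent, assuming the standard edit-distance definition."""
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference set contains no phones")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_edits / total_phones


if __name__ == "__main__":
    # Illustrative phone labels only; not drawn from TIMIT transcripts.
    reference = ["sil", "k", "ae", "t", "sil"]
    hypothesis = ["sil", "k", "ah", "t"]
    print(phone_error_rate([reference], [hypothesis]))
```

In this toy case one phone is substituted and one is dropped, so two edits over five reference phones gives a PER of 40. Under the same reading, the test PER of 8.04 in the table corresponds to roughly eight edits per hundred reference phones.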
-
!pip install speechbrain
-
!pip install transformers
-
!git clone https://github.com/kipmccharen/sys6016_DL_project
-
%cd ..
-
!gdown --id '1EIfBmwiT0RF3-U81-Qu5K4J27N31BdB5' ## --output /content/speechbrain_s2s_wav2vec_ckpt.zip
-
!unzip speechbrain_s2s_wav2vec_ckpt.zip
-
!rm speechbrain_s2s_wav2vec_ckpt.zip
-
%cd /content/data/trainwav2vec/save/
-
!gdown --id '1oZunuiwhMLfwtMeKAYJwr4DMjvE1LUIN' --output label_encoder.txt
-
%cd /content/
-
!python sys6016_DL_project/train_with_wav2vec2.py sys6016_DL_project/hparams/train_with_wav2vec2.yaml --data_folder /content/data/ --output_folder /content/data/trainwav2vec/ --new_json /content/sys6016_DL_project/data/new_train.json