Automatic Speech Recognition (ASR) is a natural language processing task that consists of transcribing the speech in an audio clip into text.
With the advent of deep learning, significant advances have been made in speech recognition.
In this repository, we implement the models behind these advances for the Wolof language.
To achieve this, we implement two models from the following papers (a sketch of the typical audio front end they consume follows the list):
- [2015/12] Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- [2015/08] Listen, Attend and Spell
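Both models operate on time-frequency features rather than raw waveforms. Below is a minimal sketch of such a front end, assuming librosa and log-mel spectrogram features; the sampling rate, number of mel bands, hop length, and example clip path are illustrative assumptions, not necessarily what this repository uses.

```python
# Illustrative front end: turn an audio clip into log-mel spectrogram frames.
# The sampling rate, mel-band count, and hop length are assumptions for
# illustration; the repository's own preprocessing may differ.
import librosa
import numpy as np

def extract_features(audio_path, sr=16000, n_mels=80, hop_length=160):
    # Load the clip and resample it to a fixed rate.
    signal, _ = librosa.load(audio_path, sr=sr)
    # Mel-scaled power spectrogram: shape (n_mels, n_frames).
    mel = librosa.feature.melspectrogram(
        y=signal, sr=sr, n_mels=n_mels, hop_length=hop_length)
    # Convert power to decibels to compress the dynamic range.
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Return frames in (time, features) order, as sequence models expect.
    return log_mel.T

features = extract_features("./input/clips/example.wav")  # hypothetical clip name
print(features.shape)  # (n_frames, 80)
```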
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Install the dependencies for this project by running the following command in your terminal:
pip install -r requirements.txt
Train the DeepSpeech2 model by running the following command in your terminal:
python deep-speech2/src/train.py --train_file="./input/Train.csv" \
    --dev_file="./input/Test.csv" \
    --audio_dir="./input/clips" \
    --n_filters=256 \
    --conv_stride=2 \
    --conv_border='valid' \
    --n_lstm_units=256 \
    --n_dense_units=42 \
    --epochs=10 \
    --batch_size=32 \
    --output_dir="./output"
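For orientation, here is a minimal sketch of the DeepSpeech2-style network these flags describe, assuming a TensorFlow/Keras implementation: a convolutional front end (`--n_filters`, `--conv_stride`, `--conv_border`), a stack of bidirectional LSTMs (`--n_lstm_units`), and a softmax output whose size (`--n_dense_units`) presumably covers the Wolof character set plus the CTC blank. The input feature dimension, kernel size, and number of recurrent layers below are assumptions, not necessarily the repository's values.

```python
# Minimal sketch of a DeepSpeech2-style acoustic model (not the repo's exact code).
import tensorflow as tf

def build_model(n_filters=256, conv_stride=2, conv_border="valid",
                n_lstm_units=256, n_dense_units=42, n_features=80):
    # Input: a variable-length sequence of spectrogram frames (time, features).
    inputs = tf.keras.Input(shape=(None, n_features))
    # Convolutional front end that downsamples the time axis by conv_stride.
    x = tf.keras.layers.Conv1D(n_filters, kernel_size=11, strides=conv_stride,
                               padding=conv_border, activation="relu")(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    # Recurrent stack; three layers is an assumption for illustration.
    for _ in range(3):
        x = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(n_lstm_units, return_sequences=True))(x)
    # Per-frame distribution over output symbols (characters + CTC blank),
    # to be trained with a CTC loss such as tf.nn.ctc_loss.
    outputs = tf.keras.layers.Dense(n_dense_units, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_model()
model.summary()
```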
Here are some useful papers for automatic speech recognition:
- [2012/11] Sequence Transduction with Recurrent Neural Networks
- [2014/11] Voice Recognition Using MFCC Algorithm
- [2014/12] Deep Speech: Scaling up end-to-end speech recognition
- [2015/08] Listen, Attend and Spell
- [2015/12] Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- [2017/06] Advances in Joint CTC-Attention based E2E ASR with a Deep CNN Encoder and RNN-LM
- [2017/07] Attention Is All You Need
- [2017/12] State-of-the-art Speech Recognition with Sequence-to-Sequence Models
- [2017/12] An Analysis Of Incorporating An External Language Model Into A Sequence-to-Sequence Model
- [2018/04] Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition
- [2019/02] On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition
- [2019/04] SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
- [2019/04] wav2vec: Unsupervised Pre-training for Speech Recognition
- [2019/08] Korean Grapheme Unit-based Speech Recognition Using Attention-CTC Ensemble Network
- [2019/08] Jasper: An End-to-End Convolutional Neural Acoustic Model
- [2019/11] End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures
- [2019/12] SpecAugment on Large Scale Datasets
- [2020/04] ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for ASR of Contact Centers
- [2020/05] ContextNet: Improving Convolutional Neural Networks for ASR with Global Context
- [2020/05] Conformer: Convolution-augmented Transformer for Speech Recognition
- [2020/06] wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations