arabic-speech-to-text

This repository contains the code for training the QuartzNet ASR model (NeMo) on the QCRI-AL Jazeera Corpus.

Data preprocessing

Download the QCRI-AL Jazeera Corpus. The script a_preprocess_xml.py extracts the text segments from the XML files. The script b_filter_ds.py removes segments that contain Latin script or numerals. The script c_split_ds.py splits the segments into a training set and a test set.
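
As a rough illustration of the filtering and splitting steps, here is a minimal sketch. It is not the code from b_filter_ds.py or c_split_ds.py. The function names, the regular expression, the treatment of Arabic-Indic digits as numerals, the 10% test fraction, and the fixed seed are all assumptions made for this example.

```python
import random
import re

# Assumed pattern: ASCII Latin letters, ASCII digits, and Arabic-Indic digits.
LATIN_OR_DIGIT = re.compile(r"[A-Za-z0-9\u0660-\u0669]")


def filter_segments(segments):
    """Keep only segments that contain no Latin letters or numerals."""
    return [s for s in segments if not LATIN_OR_DIGIT.search(s)]


def split_segments(segments, test_fraction=0.1, seed=0):
    """Shuffle a copy of the segments and return (train, test)."""
    shuffled = list(segments)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    segments = ["مرحبا بكم", "قناة BBC", "عام 2020", "شكرا"]
    clean = filter_segments(segments)
    train, test = split_segments(clean, test_fraction=0.5)
    print(len(clean), len(train), len(test))
```

A fixed seed keeps the split reproducible between runs, which matters if the test set is reused to compare checkpoints.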

TODO

Upload pretrained model
...

Folders and files

configs
data
pretrained
utils
.gitignore
README.md
a_preprocess_xml.py
b_filter_ds.py
c_split_ds.py
d_prepare_ckpt.py
infer.py
test.ipynb
test.py
train.py

nipponjo/arabic-speech-to-text
