Automatic speech recognition for people with dysarthria
This repository is under heavy research and development, so parts of this README may be outdated. Sorry!
I deployed a web page so you can use a model in your browser: https://asr-dysarthria-preliminary.pages.dev/
Use the Jupyter notebook `wav2vec2-large-xls-r-300m-dysarthria-big-dataset.ipynb` to train your own model.
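The notebook follows the XLS-R fine-tuning recipe credited at the bottom of this README. Below is a minimal sketch of the core setup, assuming a character-level `vocab.json` built from the training transcripts and 16 kHz audio; the notebook itself holds the actual data preparation and hyperparameters.

```python
# Minimal fine-tuning sketch (assumed to mirror the notebook; not verbatim).
# "vocab.json" is a character vocabulary built from the training transcripts.
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
    TrainingArguments,
    Trainer,
)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16_000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Start from the multilingual XLS-R checkpoint and add a freshly initialized CTC head
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()  # keep the convolutional feature encoder frozen

training_args = TrainingArguments(
    output_dir="wav2vec2-large-xls-r-300m-dysarthria",
    per_device_train_batch_size=16,
    num_train_epochs=30,
    learning_rate=3e-4,
    fp16=True,
)

# trainer = Trainer(model=model, args=training_args, data_collator=...,
#                   train_dataset=..., eval_dataset=...,
#                   tokenizer=processor.feature_extractor)
# trainer.train()
```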
Prerequisites:
- Python >= 3.10
- Anaconda
Steps:
Install the dependencies: `conda install --file requirements.txt`
In the `cli-app` directory:
- Run the `model.safetensors` model: `python -m run`
- Run the ONNX model: `python -m onnx_run`
Adjust these scripts if needed; by default they transcribe a `file.wav` file in the `cli-app` folder (a sketch of what they do follows below).
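A minimal sketch of this kind of transcription, assuming the published `wav2vec2-large-xls-r-300m-dysarthria-big-dataset` checkpoint and a 16 kHz mono `file.wav`; the actual `run` / `onnx_run` scripts in `cli-app` are the authoritative versions.

```python
# Minimal transcription sketch (approximation of cli-app/run.py, not the actual script)
import torch
import soundfile as sf
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

speech, sample_rate = sf.read("file.wav")  # expected: 16 kHz mono audio
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

predicted_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding
print(processor.batch_decode(predicted_ids)[0])
```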
Download and convert the trained model (the `model.safetensors` file):
`mkdir models`
`python scripts/convert_model.py --url https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset/resolve/main/model.safetensors --output models`
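The exact conversion is defined in `scripts/convert_model.py`; assuming the target format is ONNX (the format the `onnx_run` script and the browser demo can consume), the core of such an export would look roughly like this:

```python
# Hypothetical ONNX export sketch; scripts/convert_model.py is the authoritative script
import torch
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained(
    "jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset"
)
model.eval()

dummy_input = torch.randn(1, 16_000)  # one second of 16 kHz audio
torch.onnx.export(
    model,
    dummy_input,
    "models/model.onnx",
    input_names=["input_values"],
    output_names=["logits"],
    dynamic_axes={
        "input_values": {0: "batch", 1: "samples"},
        "logits": {0: "batch", 1: "frames"},
    },
    opset_version=14,
)
```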
Serve it:
`cd web-app`
`python -m http.server`
Trained models:
- [Recommended] Loss: 0.0864, WER: 0.182 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria-big-dataset
- Loss: 0.0615, WER: 0.1764 https://huggingface.co/jmaczan/wav2vec2-large-xls-r-300m-dysarthria
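WER is the word error rate (lower is better). For reference, a WER value like the ones above can be computed with the `evaluate` library; this is an illustrative example, and which tooling the notebook actually uses is an assumption.

```python
# Illustrative WER computation with the `evaluate` library
import evaluate

wer_metric = evaluate.load("wer")
wer = wer_metric.compute(
    predictions=["please call stella"],
    references=["please call stella as soon as possible"],
)
print(f"WER: {wer:.3f}")  # 0.571: 4 deletions over 7 reference words
```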
Datasets:
- UASpeech: https://huggingface.co/datasets/Vinotha/uaspeechall
- TORGO: https://huggingface.co/datasets/jmaczan/TORGO
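Both datasets are hosted on the Hugging Face Hub and can be loaded with the `datasets` library; split and column names are whatever the Hub repositories define, so inspect the loaded object first.

```python
# Load the TORGO dataset from the Hub and inspect its splits and columns
from datasets import load_dataset

torgo = load_dataset("jmaczan/TORGO")
print(torgo)  # shows the available splits and column names
```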
The code here is based on Patrick von Platen's article and notebook https://huggingface.co/blog/fine-tune-xlsr-wav2vec2
Further resources:
- https://ar5iv.labs.arxiv.org/html/2204.00770 (https://arxiv.org/abs/2204.00770)
- https://www.isca-speech.org/archive/pdfs/interspeech_2022/baskar22b_interspeech.pdf
- https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10225595
- https://www.sciencedirect.com/science/article/pii/S2405959521000874
- https://www.isca-speech.org/archive/pdfs/interspeech_2021/green21_interspeech.pdf
- https://arxiv.org/pdf/2006.11477.pdf
- https://arxiv.org/pdf/2211.00089.pdf
- https://www.sciencedirect.com/science/article/abs/pii/S0957417423002981
- https://huggingface.co/blog/fine-tune-wav2vec2-english
- http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html
- https://huggingface.co/datasets/jmaczan/TORGO
- https://huggingface.co/datasets/jmaczan/TORGO-very-small
- https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/
- https://pytorch.org/audio/stable/tutorials/speech_recognition_pipeline_tutorial.html
- https://huggingface.co/docs/datasets/v2.16.1/audio_dataset
- https://ai.meta.com/blog/self-supervision-and-building-more-robust-speech-recognition-systems/
If you use this repository in your research, please use the following citation:
@misc{Maczan_ASR_Dysarthria_2024,
  title = {Research on Automatic Speech Recognition for dysarthric speech},
  author = {Maczan, Jędrzej Paweł},
  howpublished = {\url{https://github.com/jmaczan/asr-dysarthria}},
  year = {2024},
  publisher = {GitHub}
}
MIT License
Jędrzej Paweł Maczan
https://huggingface.co/jmaczan | jed@maczan.pl | https://github.com/jmaczan