CSALT @ IITB

Accented Speech Recognition With Accent-specific Codebooks

Empirical Methods in Natural Language Processing(EMNLP) 2023

About The Repository

This repository hosts the artefacts pertaining to our paper Accented Speech Recognition With Accent-specific Codebooks accepted to the main conference of EMNLP 2023.

The main contributions of our paper are as follows:

  • A new accent adaptation technique that uses a set of learnable codebooks and a new beam-search decoding algorithm to achieve significant performance improvements on both seen and unseen accents.
  • Reproducible splits of the Common Voice dataset for an accented ASR setup, to facilitate fair comparisons across existing and new accent adaptation techniques.

Getting Started

The repository contains two folders:

  • data 📁 - Contains the train, dev and test splits used for all our experiments. The folder also contains the scripts used to generate those splits. More details can be found here.
  • espnet_code 📁 - Contains code to run our experiments on the ESPnet toolkit. Detailed instructions on how to run our experiments can be found here.

Prerequisites and Installation

  • ESPnet installation: Follow the instructions here.
  • Clone the repository containing our code and dataset.
git clone https://github.com/csalt-research/accented-codebooks-asr.git
  • Additionally, to run the dataset creation script, run the following:
pip install -r accented-codebooks-asr/data/requirements.txt

Training

  1. Extract the CSVs from the tar file in the data folder
tar -xvzf accented-codebooks-asr/data/dataset.tar.gz
  2. Copy the files from espnet_code into the ESPnet egs directory
cp -r accented-codebooks-asr/espnet_code/* <espnet_root_folder>/egs/commonvoice/asr1
  3. Enter the path to the directory hosting our splits in run.sh
csvdir=  # Path to the directory hosting all our csvs.
  4. Run the script
./run.sh
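
Before launching run.sh, it can help to confirm that the directory assigned to csvdir really contains the extracted CSVs. The following is a minimal sketch, not part of the repository: the function name summarize_csvs is hypothetical, and it assumes only that the splits are ordinary CSV files with a header row, located somewhere under the given directory.

```python
import csv
from pathlib import Path


def summarize_csvs(csvdir):
    """Return {relative CSV path: number of data rows} for every CSV under csvdir."""
    root = Path(csvdir)
    if not root.is_dir():
        raise FileNotFoundError(f"csvdir does not exist: {root}")
    counts = {}
    # Search recursively, since the layout inside the tarball is not assumed here.
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            rows = sum(1 for _ in csv.reader(handle))
        # Assumption: the first line of each CSV is a header row.
        counts[str(path.relative_to(root))] = max(rows - 1, 0)
    return counts
```

Calling summarize_csvs with the same path you set for csvdir returns one entry per CSV file found. An empty result suggests the path is wrong or the archive was extracted elsewhere.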

Dataset Statistics

The statistics of train, dev and test splits used in our experiments are as follows:

| Accent | Train 100h (hours) | Train (hours) | Dev (hours) | Test (hours) |
| Australia | 6.95 | 45.36 | 4.33 | 0.46 |
| Canada | 6.79 | 41.13 | 1.16 | 1.21 |
| England | 19.51 | 119.9 | 3.22 | 1.65 |
| Scotland | 2.69 | 16.21 | 0.23 | 0.16 |
| US | 64.12 | 400.1 | 8.32 | 4.87 |
| Africa | - | - | - | 1.71 |
| Hongkong | - | - | - | 0.52 |
| India | - | - | - | 0.58 |
| Ireland | - | - | - | 1.94 |
| Malaysia | - | - | - | 0.39 |
| Newzealand | - | - | - | 2.11 |
| Philippines | - | - | - | 0.90 |
| Singapore | - | - | - | 0.64 |
| Wales | - | - | - | 0.27 |
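
As a quick check on the table, the per-split totals can be computed directly from the listed hours. The sketch below copies the figures from the table. It treats the accents with "-" in the training columns as the unseen accents, which is a reading of the table rather than something this README states outright, and the names HOURS and split_total are illustrative only.

```python
# Hours per accent, copied from the table: (train_100h, train, dev, test).
# None marks a split in which the accent does not appear ("-" in the table).
HOURS = {
    "Australia": (6.95, 45.36, 4.33, 0.46),
    "Canada": (6.79, 41.13, 1.16, 1.21),
    "England": (19.51, 119.9, 3.22, 1.65),
    "Scotland": (2.69, 16.21, 0.23, 0.16),
    "US": (64.12, 400.1, 8.32, 4.87),
    "Africa": (None, None, None, 1.71),
    "Hongkong": (None, None, None, 0.52),
    "India": (None, None, None, 0.58),
    "Ireland": (None, None, None, 1.94),
    "Malaysia": (None, None, None, 0.39),
    "Newzealand": (None, None, None, 2.11),
    "Philippines": (None, None, None, 0.90),
    "Singapore": (None, None, None, 0.64),
    "Wales": (None, None, None, 0.27),
}

TRAIN_100H, TRAIN, DEV, TEST = range(4)

# Assumption: an accent with training data is "seen"; a test-only accent is "unseen".
seen = [accent for accent, hours in HOURS.items() if hours[TRAIN] is not None]
unseen = [accent for accent, hours in HOURS.items() if hours[TRAIN] is None]


def split_total(column, accents):
    """Sum the hours in one column over the given accents, rounded to 2 decimals."""
    return round(sum(HOURS[accent][column] for accent in accents), 2)


totals = {
    "train_100h": split_total(TRAIN_100H, seen),
    "train": split_total(TRAIN, seen),
    "dev": split_total(DEV, seen),
    "test_seen": split_total(TEST, seen),
    "test_unseen": split_total(TEST, unseen),
}
```

With the figures above, the Train 100h column sums to 100.06 hours and the full Train column to 622.7 hours. The test set splits into 8.35 hours over the five accents with training data and 9.06 hours over the nine test-only accents.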

Roadmap

See the open issues for a list of proposed features (and known issues) relevant to this work. For ESPnet-related features or issues, check out their GitHub repository.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  • If you have suggestions for adding or removing projects, feel free to open an issue to discuss it, or directly create a pull request after you edit the README.md file with necessary changes.
  • Please open an individual PR for each suggestion.

Creating A Pull Request

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/NewFeature)
  3. Commit your Changes (git commit -m 'Add appropriate commit message'). The correct way to write your commit message can be found here.
  4. Push to the Branch (git push origin feature/NewFeature)
  5. Open a Pull Request

Authors

Citation

If you use this code for your research, please consider citing our work.

@misc{prabhu2023accented,
      title={Accented Speech Recognition With Accent-specific Codebooks}, 
      author={Darshan Prabhu and Preethi Jyothi and Sriram Ganapathy and Vinit Unni},
      year={2023},
      eprint={2310.15970},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License

Distributed under the MIT License. See LICENSE for more information.