Empirical Methods in Natural Language Processing (EMNLP) 2023
- About The Repository
- Getting Started
- Roadmap
- Dataset Statistics
- Contributing
- Authors
- Citation
- License
This repository hosts the artefacts for our paper "Accented Speech Recognition With Accent-specific Codebooks", accepted to the main conference of EMNLP 2023.
The main contributions of our paper are as follows:
🔎 A new accent adaptation technique that uses a set of learnable codebooks and a new beam-search decoding algorithm to achieve significant performance improvements on both seen and unseen accents.
✅ Reproducible splits of the Common Voice dataset for the accented ASR setup, to facilitate fair comparisons across existing and new accent adaptation techniques.
The repository contains two folders:
- data 📁 - Contains the train, dev and test splits used in all our experiments, along with the scripts used to generate them. More details can be found here.
- espnet_code 📁 - Contains the code to run our experiments with the ESPnet toolkit. Detailed instructions on how to run our experiments can be found here.
- ESPnet installation: Follow the instructions here.
- Clone the repository containing our code and dataset.
git clone https://github.com/csalt-research/accented-codebooks-asr.git
- To run the dataset creation script, additionally install its dependencies:
pip install -r accented-codebooks-asr/data/requirements.txt
- Extract the CSVs from the tar file in the data folder:
tar -xvzf accented-codebooks-asr/data/dataset.tar.gz
- Copy the files from espnet_code into the ESPnet Common Voice recipe directory:
cp -r accented-codebooks-asr/espnet_code/* <espnet_root_folder>/egs/commonvoice/asr1
- Set csvdir in run.sh to the path of the directory hosting our splits:
csvdir= # Path to the directory hosting all our csvs.
- Run the script from <espnet_root_folder>/egs/commonvoice/asr1:
./run.sh
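If you prefer not to edit run.sh by hand, the csvdir step can be scripted. The sketch below is a hypothetical helper (set_csvdir is not part of the repository). It assumes the assignment in run.sh starts at the beginning of a line with csvdir=, as in the snippet above, and that the CSV path contains no |, & or \ characters, which sed would interpret specially.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): point csvdir in an
# ESPnet run.sh at the directory holding the extracted CSVs.
# Assumes the assignment sits at the start of a line ("csvdir=...").
set_csvdir() {
    run_sh=$1     # path to run.sh
    csv_path=$2   # directory containing the train/dev/test CSVs

    # Refuse to touch files that have no csvdir= line at all.
    if ! grep -q '^csvdir=' "$run_sh"; then
        echo "no line starting with csvdir= in $run_sh" >&2
        return 1
    fi

    # Rewrite every line starting with "csvdir=" and keep the original as
    # run.sh.bak. "|" is the sed delimiter so "/" in paths needs no escaping;
    # paths containing "|", "&" or "\" would still need escaping.
    sed -i.bak "s|^csvdir=.*|csvdir=${csv_path} # Path to the directory hosting all our csvs.|" "$run_sh"
}
```

For example, from <espnet_root_folder>/egs/commonvoice/asr1 you could run set_csvdir run.sh "$(cd /path/to/extracted/csvs && pwd)". The cd/pwd call turns the path into an absolute one, so it does not depend on the directory run.sh is launched from.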
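The durations of the train, dev and test splits used in our experiments are listed below. Train 100h is a smaller training set of roughly 100 hours. Accents marked "-" have no train or dev data and appear only in the test set (unseen accents).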
Accent | Train 100h (in hours) | Train (in hours) | Dev (in hours) | Test (in hours) |
---|---|---|---|---|
Australia | 6.95 | 45.36 | 4.33 | 0.46 |
Canada | 6.79 | 41.13 | 1.16 | 1.21 |
England | 19.51 | 119.9 | 3.22 | 1.65 |
Scotland | 2.69 | 16.21 | 0.23 | 0.16 |
US | 64.12 | 400.1 | 8.32 | 4.87 |
Africa | - | - | - | 1.71 |
Hongkong | - | - | - | 0.52 |
India | - | - | - | 0.58 |
Ireland | - | - | - | 1.94 |
Malaysia | - | - | - | 0.39 |
Newzealand | - | - | - | 2.11 |
Philippines | - | - | - | 0.90 |
Singapore | - | - | - | 0.64 |
Wales | - | - | - | 0.27 |
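As a quick sanity check on the splits, the per-accent hours can be totalled per column. The sketch below hard-codes the numbers from the table above rather than reading the released CSVs. It uses awk to sum each column and keeps test hours for accents with training data (seen) separate from accents that appear only in the test set (unseen).

```shell
#!/bin/sh
# Per-accent durations in hours, copied from the Dataset Statistics table.
# Columns: accent, train-100h, train, dev, test. "-" = no data in that split
# (accents that appear only in the test set).
stats='Australia 6.95 45.36 4.33 0.46
Canada 6.79 41.13 1.16 1.21
England 19.51 119.9 3.22 1.65
Scotland 2.69 16.21 0.23 0.16
US 64.12 400.1 8.32 4.87
Africa - - - 1.71
Hongkong - - - 0.52
India - - - 0.58
Ireland - - - 1.94
Malaysia - - - 0.39
Newzealand - - - 2.11
Philippines - - - 0.90
Singapore - - - 0.64
Wales - - - 0.27'

# Sum each column. Rows with training data count as seen accents; the
# remaining rows contribute only to the unseen-accent test total.
totals=$(printf '%s\n' "$stats" | awk '
    $2 != "-" { t100 += $2; train += $3; dev += $4; seen += $5; next }
              { unseen += $5 }
    END { printf "%.2f %.2f %.2f %.2f %.2f %.2f\n",
                 t100, train, dev, seen, unseen, seen + unseen }')

# Split the six numbers into named variables (POSIX sh, no arrays).
set -- $totals
train100=$1 train=$2 dev=$3 test_seen=$4 test_unseen=$5 test_all=$6

echo "Train 100h: $train100 h  Train: $train h  Dev: $dev h"
echo "Test: $test_seen h seen + $test_unseen h unseen = $test_all h"
```

Running it shows that the Train 100h column sums to 100.06 hours and the full train split to 622.70 hours. The test set has 8.35 hours of seen accents and 9.06 hours of unseen accents.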
See the open issues for a list of proposed features (and known issues) relevant to this work. For ESPnet-related features and issues, check out their GitHub repository.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- If you have suggestions for adding or removing projects, feel free to open an issue to discuss it, or directly create a pull request after you edit the README.md file with the necessary changes.
- Please open an individual PR for each suggestion.
- Fork the Project
- Create your Feature Branch:
git checkout -b feature/NewFeature
- Commit your Changes (the correct way to write your commit message can be found here):
git commit -m 'Add appropriate commit message'
- Push to the Branch:
git push origin feature/NewFeature
- Open a Pull Request
- Darshan Prabhu - M.Tech, CSE, IIT Bombay
- Preethi Jyothi - Associate Professor, CSE, IIT Bombay
- Sriram Ganapathy - Associate Professor, EE, IISc Bangalore
- Vinit Unni - Ph.D, CSE, IIT Bombay
If you use this code for your research, please consider citing our work.
@misc{prabhu2023accented,
title={Accented Speech Recognition With Accent-specific Codebooks},
author={Darshan Prabhu and Preethi Jyothi and Sriram Ganapathy and Vinit Unni},
year={2023},
eprint={2310.15970},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Distributed under the MIT License. See LICENSE for more information.