A curated list of papers and resources for children's automatic speech recognition.
- My Science Tutor (MyST)
- Freely available for non-commercial use
- Age: Grade 3-5
- 400 hours, 230k utterances, conversational speech
- 100k utterances have been transcribed
- CSLU Kids' Speech Corpus (OGI)
- Age: K0 - G11
- PF-STAR
- Age: 4-14 years old
- The CMU Kids Corpus
- Wikipedia Page
- Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
- Shared-Adapters: A Novel Transformer-based Parameter Efficient Transfer Learning Approach For Children’s Automatic Speech Recognition
- Mixed Children/Adult/Childrenized Fine-Tuning for Children’s ASR: How to Reduce Age Mismatch and Speaking Style Mismatch
- Improving child speech recognition with augmented child-like speech
- Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
- Children’s Speech Recognition through Discrete Token Enhancement
- Automatic Speech Recognition Tuned for Child Speech in the Classroom
- Improved Children’s Automatic Speech Recognition Combining Adapters and Synthetic Data Augmentation
- Build a 50+ Hours Chinese Mandarin Corpus for Children’s Speech Recognition
- Exploring Adapters with Conformers for Children’s Automatic Speech Recognition
- Sparsely Shared LoRA on Whisper for Child Speech Recognition
- SASB workshop - Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
- SASB workshop - SOA: Reducing domain mismatch in SSL Pipeline by Speech Only Adaptation for low resource ASR
- Arxiv 2024 - Evaluation of state-of-the-art ASR Models in Child-Adult Interactions
- Journal of Electrical Systems 2024 - SVCGAN: Speaker Voice Conversion Generative Adversarial Network for Children's Speech Conversion and Recognition
- JASA 2024 - ChildAugment: Data Augmentation Methods for Zero-Resource Children's Speaker Verification
- Interspeech 2023 - Data augmentation for children ASR and child-adult speaker classification using voice conversion methods
- ICASSP 2023 - Using Modified Adult Speech as Data Augmentation for Child Speech Recognition
- Interspeech 2022 - Spectral Modification Based Data Augmentation for Improving End-to-End ASR for Children’s Speech
- ICASSP 2022 - LPC Augment: An LPC-Based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects
- Speech Communication 2021 - Fundamental frequency feature warping for frequency normalization and data augmentation in child automatic speech recognition
- ICASSP 2021 - Fundamental Frequency Feature Normalization and Data Augmentation for Child Speech Recognition
- Interspeech 2020 - Data Augmentation Using Prosody and False Starts to Recognize Non-native Children’s Speech
- Interspeech 2020 - Voice Conversion Based Data Augmentation to Improve Children’s Speech Recognition in Limited Data Scenario
- ASRU 2019 - Data Augmentation Based on Vowel Stretch for Improving Children's Speech Recognition
- ASRU 2019 - GANs for Children: A Generative Data Augmentation Strategy for Children Speech Recognition
- Interspeech 2019 - A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of fo in Vowel Perception
- IEEE SPL 2019 - Significance of Pitch-Based Spectral Normalization for Children’s Speech Recognition
- Interspeech 2016 - Improving Children’s Speech Recognition through Out-of-Domain Data Augmentation
- Arxiv 2024 - Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings
- IEEE/ACM TASLP 2024 - Effect of Modeling Glottal Activity Parameters on Zero-Shot Children's ASR
- IEEE Access 2024 - Exploring Native and Non-Native English Child Speech Recognition With Whisper
- Arxiv 2023 - Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults
- Interspeech 2023 - Adaptation of Whisper models to child speech recognition
- Under-review Speech Communication 2022 - Improving Children's Speech Recognition by Fine-tuning Self-supervised Adult Speech Representations
- IEEE JSTSP 2022 - Towards Better Domain Adaptation for Self-supervised Models: A Case Study of Child ASR
- Interspeech 2022 - DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR
- Interspeech 2022 - Transfer Learning for Robust Low-Resource Children's Speech ASR with Transformers and Source-Filter Warping
- ICASSP 2021 - Bi-APC: Bidirectional Autoregressive Predictive Coding for Unsupervised Pre-training and Its Application to Children's ASR
- Computer Speech & Language 2020 - Transfer Learning from Adult to Children for Speech Recognition: Evaluation, Analysis and Recommendations
- Interspeech 2019 - Improving ASR Systems for Children with Autism and Language Impairment Using Domain-Focused DNN Transfer Techniques
- WOCCI 2016 - Improving DNN-Based Automatic Recognition of Non-native Children's Speech with Adult Speech
- Augmentative and Alternative Communication 2024 - The development of synthetic child speech in three South African languages
- Arxiv 2024 - Data Efficient Child-Adult Speaker Diarization with Simulated Conversations
- Speech Communication 2024 - Automatic speaker and age identification of children from raw speech using sincNet over ERB scale
- Arxiv 2024 - Personalized Speech Recognition for Children with Test-Time Adaptation
- SN Computer Science 2024 - Investigating Lattice-Free Acoustic Modeling for Children Automatic Speech Recognition in Low-Resource Settings Under Mismatched Conditions
- ASRU 2023 - No Pitch Left Behind: Addressing Gender Unbalance In Automatic Speech Recognition Through Pitch Manipulation
- SLT 2022 - A Zero-Shot Approach to Identifying Children's Speech in Automatic Gender Classification
- ICASSP 2022 - Towards Better Meta-Initialization with Task Augmentation for Kindergarten-aged Speech Recognition
- Interspeech 2021 - Age-Invariant Training for End-to-End Child Speech Recognition using Adversarial Multi-Task Learning
- ICASSP 2020 - Learning Domain Invariant Representations for Child-Adult Classification from Speech
- Interspeech 2019 - Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network
- ISCSLP 2018 - A Study on Acoustic Modeling for Child Speech Based on Multi-Task Learning
- ICASSP 2019 - Improving Children Speech Recognition Through Feature Learning from Raw Speech Signal
- Connecting Speech science and Speech technology for Children’s Speech
- Interspeech 2023, Interspeech 2024
- MERLIon CCS Challenge: Language Identification on Code-Switched Child-Directed Speech
- Interspeech 2023
- ETLT 2021: Shared Task on ASR for Non-Native Children's Speech
- Interspeech 2021
- CSRC: Children Speech Recognition Challenge
- SLT 2021
- Spoken Language Processing for Children's Speech
- Interspeech 2019
This is an active repository and your contributions are always welcome!