Isolating Child Speech using Convolutional Neural Networks

Replicating Experiments

Setup

!git clone https://github.com/davidwmcdevitt/pedzstar-speech-pipeline

%cd pedzstar-speech-pipeline

!pip install -r requirements.txt

Preparing Data Directories

Experiments are initialized from the raw audio files of the Speech Exemplars and Evaluation Database (SEED). Access to the SEED dataset can be requested at https://pedzstarlab.soc.northwestern.edu/

The pipeline expects the following directory structure, where [experiment_name] is a name you choose:

!mkdir ./experiments/

!mkdir ./experiments/[experiment_name]

!mkdir ./experiments/[experiment_name]/data

!mkdir ./experiments/[experiment_name]/data/children

!mkdir ./experiments/[experiment_name]/data/adults

!mkdir ./experiments/[experiment_name]/data/mixed

!mkdir ./experiments/[experiment_name]/data/transitions
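Outside a notebook, the same layout can be created from Python. This is a minimal standard-library sketch; `make_experiment_dirs` is a hypothetical helper, not part of the repository, and it only mirrors the `mkdir` commands above.

```python
from pathlib import Path

# Subfolders listed above under ./experiments/[experiment_name]/data
DATA_SUBDIRS = ("children", "adults", "mixed", "transitions")


def make_experiment_dirs(experiment_name: str, root: str = "./experiments") -> Path:
    """Create the experiment directory tree and return its data directory.

    Hypothetical helper: parents=True creates missing intermediate folders,
    and exist_ok=True makes reruns harmless (unlike a plain mkdir).
    """
    data_dir = Path(root) / experiment_name / "data"
    for sub in DATA_SUBDIRS:
        (data_dir / sub).mkdir(parents=True, exist_ok=True)
    return data_dir
```

For example, `make_experiment_dirs("my_experiment")` creates `./experiments/my_experiment/data/children` and its three siblings, and can be called again without raising an error.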

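Unzip the SEED archives into the experiment's data directory (the source paths below point to a Google Drive folder; adjust them to match where the archives are stored):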

!unzip -o /content/drive/MyDrive/PEDZSTAR/input_files/seed_children.zip -d ./experiments/[experiment_name]/data

!unzip -o /content/drive/MyDrive/PEDZSTAR/input_files/seed_adults.zip -d ./experiments/[experiment_name]/data
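Where `unzip` is unavailable, Python's `zipfile` module can perform the same extraction. A minimal sketch, assuming the archives have already been downloaded locally; `extract_archives` is a hypothetical helper, not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_archives(archives, dest):
    """Extract each .zip archive into dest and return the member names.

    Rough equivalent of `unzip -o ARCHIVE -d DEST`: ZipFile.extractall
    overwrites files that already exist, like unzip's -o flag.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            extracted.extend(zf.namelist())
    return extracted
```

Usage: `extract_archives(["seed_children.zip", "seed_adults.zip"], "./experiments/my_experiment/data")`.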

Each experiment is launched with one of the following commands, depending on the number of output classes:

2-Class

!python train.py --repo_path [destination of git clone] ./experiments/[experiment_name]/data --model_name [experiment_name] --num_epochs 999 --train_split 0.9 --break_count 15 --batch_size 16 --clip_size 1 --noise_level 0.2 --class_weights --log_training --seed_only

3-Class (Overlap)

!python train.py --repo_path [destination of git clone] ./experiments/[experiment_name]/data --model_name [experiment_name] --num_epochs 999 --train_split 0.9 --break_count 15 --batch_size 16 --clip_size 1 --noise_level 0.2 --class_weights --log_training --seed_only --overlap_class

3-Class (Transition)

!python train.py --repo_path [destination of git clone] ./experiments/[experiment_name]/data --model_name [experiment_name] --num_epochs 999 --train_split 0.9 --break_count 15 --batch_size 16 --clip_size 1 --noise_level 0.2 --class_weights --log_training --seed_only --transition_class

4-Class

!python train.py --repo_path [destination of git clone] ./experiments/[experiment_name]/data --model_name [experiment_name] --num_epochs 999 --train_split 0.9 --break_count 15 --batch_size 16 --clip_size 1 --noise_level 0.2 --class_weights --log_training --seed_only --overlap_class --transition_class
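The four commands differ only in the `--overlap_class` and `--transition_class` flags. The sketch below assembles them from one shared flag list, which can help avoid copy-paste drift when launching all variants from a script. `build_train_command` and the variant keys are hypothetical; the flags and values are copied verbatim from the commands above, including the data path that follows the `--repo_path` value.

```python
import shlex

# Flags shared by every experiment command above.
COMMON_FLAGS = [
    "--num_epochs", "999",
    "--train_split", "0.9",
    "--break_count", "15",
    "--batch_size", "16",
    "--clip_size", "1",
    "--noise_level", "0.2",
    "--class_weights",
    "--log_training",
    "--seed_only",
]

# Extra flags that distinguish the four class configurations.
VARIANT_FLAGS = {
    "2-class": [],
    "3-class-overlap": ["--overlap_class"],
    "3-class-transition": ["--transition_class"],
    "4-class": ["--overlap_class", "--transition_class"],
}


def build_train_command(variant, repo_path, experiment_name):
    """Return the argv list for train.py (hypothetical helper, not in the repo)."""
    data_dir = f"./experiments/{experiment_name}/data"
    return (
        ["python", "train.py", "--repo_path", repo_path, data_dir,
         "--model_name", experiment_name]
        + COMMON_FLAGS
        + VARIANT_FLAGS[variant]
    )


# Print the 4-class command as a shell-quoted string.
print(shlex.join(build_train_command("4-class", ".", "exp4")))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository directory; passing a list rather than a single string avoids shell-quoting issues with bracketed or space-containing paths.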

Background

Gathering targeted child speech data in research and classroom settings typically also captures non-targeted adult speech, often as prompts for the child speaker or during conversational turn-taking tasks ¹. Before this data can be reused, researchers must manually identify and isolate the child speech, a labor-intensive process that consumes resources and limits the growth of large child speech datasets for speech pathology research.

Deep learning toolkits have enabled rapid advances in audio and speech applications. While open-source speech tools such as OpenAI's Whisper ² (speech recognition) and speaker diarization pipelines together address the "who said what and when" problem, they struggle to differentiate between child and adult speakers. This limitation stems from intrinsic factors, such as the higher pitch and greater acoustic variability of children's voices, as well as from extrinsic factors affecting child speech.

We propose an alternative approach: a convolutional neural network (CNN) architecture designed specifically for isolating child speech.

Methodology

We use CNNs, which have proven effective in audio classification tasks. Our architecture comprises six convolutional blocks, each containing a 2D convolutional layer. The model classifies audio segments as child or adult speech and can also handle more complex cases, such as segments in which a child and an adult speak at the same time.
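The README does not specify the input representation or layer hyperparameters, so the following is only a worked illustration under assumed settings: 1-second clips (our reading of `--clip_size 1`), 16 kHz audio, a 64-bin mel spectrogram with a 160-sample hop, and blocks that each apply a 3×3 convolution (padding 1) followed by 2×2 max pooling. It traces how six such blocks shrink the 2D feature map; none of these values are taken from `model.py`.

```python
def conv2d_out(size, kernel=3, stride=1, padding=1):
    # Output length along one axis of a 2D convolution.
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    # Output length along one axis of an unpadded max-pooling layer.
    return (size - kernel) // stride + 1


def feature_map_shapes(sample_rate=16_000, clip_seconds=1.0, hop_length=160,
                       n_mels=64, n_blocks=6):
    """Trace (mel_bins, frames) through n_blocks of conv -> pool.

    All defaults are ASSUMED illustration values, not read from model.py.
    """
    n_samples = int(sample_rate * clip_seconds)
    frames = 1 + n_samples // hop_length  # frame count of a centered STFT
    shape = (n_mels, frames)
    shapes = [shape]
    for _ in range(n_blocks):
        shape = tuple(pool_out(conv2d_out(dim)) for dim in shape)
        shapes.append(shape)
    return shapes


shapes = feature_map_shapes()
print("input:", shapes[0])
for i, shape in enumerate(shapes[1:], start=1):
    print(f"after block {i}:", shape)
```

With these settings the map shrinks from (64, 101) to (1, 1). With a 256-sample hop the clip yields only 63 frames, and the sixth pooling step reduces the time axis to 0, so clip length, hop size, and network depth have to be chosen together.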

For this study, we use the Speech Exemplars and Evaluation Database (SEED), which contains 17,000 utterances from 69 child and 33 adult speakers collected in clinical and classroom settings ³.
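SEED's speaker counts are unequal (69 child vs. 33 adult), so class frequencies among the training clips may be unequal too. The training commands pass `--class_weights`, whose exact formula is defined in `train.py` and not documented here. One common scheme, sketched below as an assumed stand-in, is inverse-frequency ("balanced") weighting; `inverse_frequency_weights` is a hypothetical helper and the labels are toy values, not SEED statistics.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This is the 'balanced' heuristic used by scikit-learn's
    compute_class_weight; it is an assumption, not necessarily what
    train.py does for --class_weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in sorted(counts.items())}


# Toy labels for illustration only.
toy = ["child"] * 3 + ["adult"] * 1
print(inverse_frequency_weights(toy))
```

Here the single adult example gets weight 2.0 and each child example about 0.67, so both classes contribute equally to the total loss. If training uses PyTorch, such weights would typically be passed through the `weight` argument of `torch.nn.CrossEntropyLoss`.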
