DSegKNN: Unsupervised Word Segmentation using K Nearest Neighbors

Tzeviya Sylvia Fuchs (fuchstz@cs.biu.ac.il)
Yedid Hoshen (yedid.hoshen@mail.huji.ac.il)
Joseph Keshet (joseph.keshet@cs.biu.ac.il)

DSegKNN is an unsupervised kNN-based approach for word segmentation in speech utterances. The method relies on self-supervised pre-trained speech representations and compares each audio segment of a given utterance to its K nearest neighbors within the training set.
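To make the kNN comparison concrete, here is a minimal NumPy sketch that scores each query vector by its mean distance to its K nearest neighbors in a bank of training vectors. It is an illustration only and not the code from this repository: the function name `knn_mean_distance`, the Euclidean metric, the use of the mean, and the random stand-in embeddings are all assumptions.

```python
import numpy as np


def knn_mean_distance(queries, bank, k):
    """Mean Euclidean distance from each query vector to its k nearest bank vectors.

    Hypothetical helper for illustration; not taken from knn_segmenter.py.
    """
    # Pairwise squared distances via ||q||^2 - 2 q.b + ||b||^2, shape (n_queries, n_bank).
    sq = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ bank.T
        + (bank ** 2).sum(axis=1)
    )
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from rounding
    # Keep the k smallest squared distances per query (their order does not matter).
    nearest = np.partition(sq, k - 1, axis=1)[:, :k]
    return np.sqrt(nearest).mean(axis=1)


rng = np.random.default_rng(0)
bank = rng.normal(size=(500, 16))            # stand-in for training-set embeddings
typical = bank[:5] + 0.01                    # queries that resemble the training data
unusual = rng.normal(loc=6.0, size=(5, 16))  # queries far from anything in the bank

print(knn_mean_distance(typical, bank, k=10))
print(knn_mean_distance(unusual, bank, k=10))
```

Queries that resemble the training data get small scores and unfamiliar ones get large scores. For large training sets, a library such as faiss (listed in the requirements below) would typically replace the brute-force distance matrix.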

The paper can be found here.

If you find our work useful, please cite:

@inproceedings{fuchs22_interspeech,
  author={Tzeviya Fuchs and Yedid Hoshen and Yossi Keshet},
  title={{Unsupervised Word Segmentation using K Nearest Neighbors}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={4646--4650},
  doi={10.21437/Interspeech.2022-11474}
}

Installation instructions

Python 3.8+
PyTorch 1.10.0
torchaudio 0.10.0
numpy
scipy
faiss
soundfile

Download the code:

git clone https://github.com/MLSpeech/DSegKNN.git

How to use

In this example, we will demonstrate how to run DSegKNN on the Buckeye corpus.

We use the same experimental setup as in "Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation" (INTERSPEECH 2020) (Paper, Code; script based on the one by Felix Kreuk):
- split long wavs into smaller chunks (cut during silences)
- leave 0.2 seconds of silence in the beginning and end
- there are no non-speech utterances

Run the script as follows:

python buckeye_preprocess.py --spkr --source buckeye/speech/ --target datasets/buckeye_split/ --min_phonemes 20 --max_phonemes 50

This should create train, val and test folders in your chosen target directory (here, datasets/buckeye_split/). Each folder contains the cut .wav files, with corresponding .word and .phn files containing the start and end times of the words/phonemes within each .wav file.
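A quick way to sanity-check the preprocessing output is to confirm that every .wav file has both annotation files next to it. The sketch below assumes that the annotation files share the .wav file's stem and sit in the same folder; the helper name `find_incomplete` is hypothetical and not part of this repository.

```python
from pathlib import Path


def find_incomplete(split_dir, required=(".word", ".phn")):
    """Return names of .wav files under split_dir that lack a required annotation file."""
    missing = []
    # rglob also covers the case where files are nested in sub-folders.
    for wav in sorted(Path(split_dir).rglob("*.wav")):
        if not all(wav.with_suffix(ext).exists() for ext in required):
            missing.append(wav.name)
    return missing


if __name__ == "__main__":
    for split in ("train", "val", "test"):
        folder = Path("datasets/buckeye_split") / split
        if folder.is_dir():
            print(split, find_incomplete(folder))
```

An empty list for each split means every audio chunk has its word and phoneme annotations.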

Run knn_segmenter.py with the following options:

 python knn_segmenter.py --win [number of frames to concatenate] \
     --train_n [number of training examples to use] \
     --eval_n [number of evaluation examples to use] \
     --layer [index of output layer of embedding architecture] \
     --knn [number of nearest neighbors to compare to] \
     --arc [architecture name: BASE || LARGE || LARGE_LV60K || XLSR53 || HUBERT_BASE || HUBERT_LARGE || HUBERT_XLARGE] \
     --width [parameter for scipy.signal's find_peaks] \
     --distance [parameter for scipy.signal's find_peaks] \
     --prominence [parameter for scipy.signal's find_peaks] \
     --train_dir [path to training directory] \
     --val_dir [path to validation directory]
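The --width, --distance and --prominence options are described above as parameters for scipy.signal's find_peaks. The sketch below shows what those three parameters do on a toy per-frame score curve. The curve is synthetic, and treating boundaries as peaks of such a curve is an assumption made here for illustration; how knn_segmenter.py builds its curve is not shown.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic score curve: three broad bumps (frames 20, 50, 80) and one small ripple (frame 35).
frames = np.arange(100)
scores = (
    5.0 * np.exp(-0.5 * ((frames - 20) / 3.0) ** 2)
    + 6.0 * np.exp(-0.5 * ((frames - 50) / 3.0) ** 2)
    + 5.5 * np.exp(-0.5 * ((frames - 80) / 3.0) ** 2)
    + 0.5 * np.exp(-0.5 * ((frames - 35) / 1.0) ** 2)
)

# width: minimum peak width in samples; distance: minimum spacing between peaks;
# prominence: minimum height of a peak above its surrounding baseline.
peaks, properties = find_peaks(scores, width=2, distance=4, prominence=4)
print(peaks)

# With no constraints, the small ripple is also reported as a peak.
all_peaks, _ = find_peaks(scores)
print(all_peaks)
```

Raising prominence or width suppresses small, narrow fluctuations (fewer predicted boundaries), while lowering them has the opposite effect.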

For example:

 python knn_segmenter.py --win 10 \
     --train_n 200 \
     --eval_n -1 \
     --layer 13 \
     --knn 20 \
     --arc HUBERT_LARGE \
     --width 2 \
     --distance 4 \
     --prominence 4 \
     --train_dir datasets/buckeye_split/train/ \
     --val_dir datasets/buckeye_split/val/

This should produce:

 Final result: 31.015404643089606 32.232243517474635 31.612118531623173 3.923337091319068 40.71275576844716

These values are, in order, the precision, recall, F-score, over-segmentation (OS), and R-value, all in percent.

(Results may differ slightly because the 200 training examples are drawn at random.)
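For readers unfamiliar with these metrics, the sketch below derives F-score, OS and R-value from precision and recall using the definitions commonly used in the segmentation literature. That eval_segmentation.py uses exactly these formulas is an assumption, although the numbers printed in the example above are consistent with them. The function names are hypothetical.

```python
import math


def f_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    return 2 * precision * recall / (precision + recall)


def over_segmentation(precision, recall):
    """OS = recall / precision - 1; positive when too many boundaries are predicted."""
    return recall / precision - 1


def r_value(precision, recall):
    """R-value: combines recall and OS so that over-segmentation is penalized."""
    os_ = over_segmentation(precision, recall)
    r1 = math.sqrt((1 - recall) ** 2 + os_ ** 2)
    r2 = (-os_ + recall - 1) / math.sqrt(2)
    return 1 - (abs(r1) + abs(r2)) / 2


# Precision and recall from the example output above, converted to fractions.
precision = 31.015404643089606 / 100
recall = 32.232243517474635 / 100

print(100 * f_score(precision, recall))
print(100 * over_segmentation(precision, recall))
print(100 * r_value(precision, recall))
```

Precision is the fraction of predicted boundaries that match a reference boundary, and recall is the fraction of reference boundaries that are matched.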

To keep results comparable, the evaluation script eval_segmentation.py used here is the one by Herman Kamper.
