Speaker Identification

inzva AI Projects #5 - Speaker Identification

Project Description

In this project, we address speaker identification: the task of recognizing a person from a voice utterance. We implemented the methods proposed in the paper "Deep CNNs With Self-Attention for Speaker Identification" in both TensorFlow-Keras and PyTorch.

Dataset

We used the VCTK and VoxCeleb datasets.

The VCTK dataset is easy to work with: no license agreement is required, and it can be used directly after download.

For the VoxCeleb datasets, we recommend visiting the official website to sign up and obtain the download and conversion scripts.

You will also need the data split text file for the identification task.
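As an illustration, the sketch below groups utterances by split. It assumes each line of the split file has the form `<split_code> <speaker_id>/<video_id>/<utterance>.wav`, with 1 = train, 2 = validation and 3 = test (our understanding of VoxCeleb1's identification split layout; verify against your copy). The function name `parse_split_file` and the example paths are hypothetical and not part of this repository.

```python
# Assumed split codes; check them against the split file you downloaded
SPLIT_NAMES = {"1": "train", "2": "val", "3": "test"}

def parse_split_file(lines):
    """Group (relative_path, speaker_id) pairs by split.

    Assumes the speaker ID is the first component of each relative path.
    """
    splits = {name: [] for name in SPLIT_NAMES.values()}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        code, rel_path = line.split(maxsplit=1)
        speaker = rel_path.split("/")[0]
        splits[SPLIT_NAMES[code]].append((rel_path, speaker))
    return splits

# Hypothetical lines in the assumed format
example = [
    "1 id10001/videoA/00001.wav",
    "2 id10001/videoA/00002.wav",
    "3 id10002/videoB/00001.wav",
]
splits = parse_split_file(example)
print({name: len(items) for name, items in splits.items()})
```

In practice you would pass an open file handle (e.g. `parse_split_file(open(path))`) instead of the example list.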

The files under dataloaders load the data, using data generators in Keras and DataLoaders in PyTorch. The scripts can either generate file paths at runtime or read them directly from a txt file; we recommend generating txt files. Check this notebook to generate such a file.
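A minimal sketch of writing and reading such a txt file is shown below. It assumes a directory layout where the first folder under the root is the speaker ID (e.g. `id10001/<video>/<utt>.wav`) and a hypothetical line format of `relative/path label`; the repository's own notebook may use a different format. The helpers `write_file_list` and `read_file_list` are illustrative names, not functions from this project.

```python
import tempfile
from pathlib import Path

def write_file_list(root, out_txt, ext=".wav"):
    """Write one 'relative/path label' line per audio file under `root`.

    Assumes the speaker ID is the first directory below `root`.
    """
    root = Path(root)
    paths = sorted(p for p in root.rglob(f"*{ext}") if p.is_file())
    with open(out_txt, "w") as f:
        for p in paths:
            rel = p.relative_to(root).as_posix()
            f.write(f"{rel} {rel.split('/')[0]}\n")
    return len(paths)

def read_file_list(txt_path):
    """Read (relative_path, label) pairs back; the label is the last field."""
    with open(txt_path) as f:
        return [tuple(line.rstrip("\n").rsplit(" ", 1)) for line in f if line.strip()]

# Build a tiny fake dataset of empty files to demonstrate the round trip
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["id10001/v1/00001.wav", "id10002/v7/00003.wav"]:
        p = Path(tmp, "audio", rel)
        p.parent.mkdir(parents=True, exist_ok=True)
        p.touch()
    n = write_file_list(Path(tmp, "audio"), Path(tmp, "files.txt"))
    pairs = read_file_list(Path(tmp, "files.txt"))

print(n, pairs)
```

Reading paths from a precomputed txt file avoids walking a large directory tree every time a data loader is created, which is one reason to prefer it over runtime path generation.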

We also recommend extracting audio features once, saving them as pickle files and loading those during training; our data loaders support this as well. Check out the scripts under the utils folder to create such files.
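The following sketch shows the general idea of caching features with Python's `pickle` module. The dictionary layout (`features` and `labels` keys), the helper names and the random 40-dimensional features are hypothetical; the scripts under utils may store data differently.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

def save_features(features, labels, out_path):
    """Pickle per-utterance feature matrices together with their labels."""
    with open(out_path, "wb") as f:
        pickle.dump({"features": features, "labels": labels}, f)

def load_features(in_path):
    """Load the (features, labels) pair written by save_features."""
    with open(in_path, "rb") as f:
        data = pickle.load(f)
    return data["features"], data["labels"]

rng = np.random.default_rng(0)
# Hypothetical features: (num_frames, num_filters) matrices of varying length
feats = [rng.standard_normal((n, 40)).astype(np.float32) for n in (120, 95)]
labels = ["id10001", "id10002"]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "train_feats.pkl")
    save_features(feats, labels, path)
    loaded_feats, loaded_labels = load_features(path)

print([f.shape for f in loaded_feats], loaded_labels)
```

Storing a list rather than a single stacked array keeps utterances of different lengths intact; padding or cropping can then happen in the data loader.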

Preprocess

Before feeding the audio files into our models, we extract filter bank coefficients from them. The complete process is described here. Our implementation is in utils/preprocessed_feature_extraction.py.
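For reference, here is a NumPy-only sketch of the standard log mel filter bank recipe: pre-emphasis, framing, Hamming windowing, power spectrum, triangular mel filters and a log. The parameter values (16 kHz audio, 25 ms frames, 10 ms step, 512-point FFT, 40 filters, pre-emphasis 0.97) are common defaults that we assume here; they are not necessarily the settings used in utils/preprocessed_feature_extraction.py.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def log_filterbanks(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
                    n_fft=512, n_filters=40, pre_emphasis=0.97):
    """Return log mel filter bank energies with shape (num_frames, n_filters)."""
    # Pre-emphasis boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])

    # Split into overlapping frames, zero-padding the tail
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    num_frames = 1 + int(np.ceil(max(len(emphasized) - flen, 0) / fstep))
    pad_len = (num_frames - 1) * fstep + flen
    padded = np.append(emphasized, np.zeros(pad_len - len(emphasized)))
    idx = np.arange(flen)[None, :] + fstep * np.arange(num_frames)[:, None]
    frames = padded[idx] * np.hamming(flen)

    # Power spectrum of each frame (n_fft // 2 + 1 bins)
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft

    # Triangular filters spaced evenly on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)

    energies = power @ fbank.T
    # Floor at machine epsilon to avoid log(0)
    return np.log(np.maximum(energies, np.finfo(float).eps))

# One second of a 440 Hz tone at 16 kHz
t = np.arange(16000) / 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
feats = log_filterbanks(tone)
print(feats.shape)  # (99, 40)
```

Librosa, which is among the project dependencies, offers comparable functionality (e.g. `librosa.feature.melspectrogram`), though its defaults differ from the values above.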

Models

We implemented the following architectures:

Results

We achieved

Nearest Neighbor Search

After training, we extracted embeddings with the trained model and used the k-nearest neighbors (kNN) algorithm to find the closest neighbors of each extracted embedding. Such a system can return the closest voice utterances, and their class labels, for a given audio signal.
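A minimal sketch of the kNN step over precomputed embeddings is given below. It assumes cosine similarity as the distance measure and a majority vote over the k neighbors; extract_embeds.py and closest_celeb.py may use a different metric or library. The function `knn_predict` and the synthetic 128-dimensional embeddings are hypothetical.

```python
from collections import Counter

import numpy as np

def knn_predict(query, gallery, gallery_labels, k=5):
    """Return the majority label among the k gallery embeddings most similar
    to `query` (cosine similarity), plus the indices of those neighbors."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = g @ q                      # cosine similarity to every gallery item
    nearest = np.argsort(-sims)[:k]   # indices of the k most similar items
    votes = Counter(gallery_labels[i] for i in nearest)
    label, _ = votes.most_common(1)[0]
    return label, nearest

# Synthetic embeddings: two speakers, ten noisy samples each
rng = np.random.default_rng(0)
centers = {"spk_a": rng.standard_normal(128), "spk_b": rng.standard_normal(128)}
gallery, labels = [], []
for name, c in centers.items():
    for _ in range(10):
        gallery.append(c + 0.1 * rng.standard_normal(128))
        labels.append(name)
gallery = np.stack(gallery)

query = centers["spk_b"] + 0.1 * rng.standard_normal(128)
label, idx = knn_predict(query, gallery, labels, k=5)
print(label)  # spk_b
```

For large galleries, a brute-force matrix product like this becomes slow, and an approximate nearest neighbor index is usually preferable.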

Check out the extract_embeds.py and closest_celeb.py scripts for the implementation of this method.

Project Dependencies

Keras
PyTorch
Matplotlib
TensorFlow
Pickle
NumPy
Librosa

inzva/inzpeech

Folders and files

Latest commit

History

Repository files navigation

Speaker Identificition

Project Description

Dataset

Preprocess

Models

Results

Nearest Neighbor Search

Project Dependencies

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages