V-SAD - Visual-Speech Activity Detection.

Detecting speech activity with neural networks from facial images. The purpose of the network is to identify whether a given facial image corresponds to speech or non-speech.

The networks are developed in Anaconda and Spyder using Python 3.6.12, Keras (GPU) 2.2.4, and TensorFlow (GPU) 1.12.
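A minimal sketch of such a binary speech/non-speech classifier in Keras is shown below. The input size, layer widths, and function name are illustrative assumptions, not taken from the repository.

```python
# Sketch of a binary speech/non-speech CNN in Keras 2.2.4.
# Input shape (64x64 grayscale) and layer sizes are assumptions.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def build_vsad_model(input_shape=(64, 64, 1)):
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(1, activation='sigmoid'),  # 1 = speech, 0 = non-speech
    ])
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
```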

The networks are trained with Train, Validation, and Test sets split in a 70:15:15 ratio, where the Train and Validation sets are used to train the network and the Test set is used to evaluate it. The split is speaker-dependent, i.e. data from each individual appears in every set, as opposed to a speaker-independent split where the sets contain disjoint individuals. A hedged sketch of such a split follows.
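The sketch below illustrates a 70:15:15 speaker-dependent split as described above, shuffling each speaker's frames and distributing them across all three sets. The function and variable names are illustrative, not from the repository.

```python
# Sketch of a speaker-dependent 70:15:15 split: every speaker
# contributes frames to train, validation, and test.
import numpy as np

def speaker_dependent_split(frames, labels, speakers, seed=0):
    rng = np.random.RandomState(seed)
    train_idx, val_idx, test_idx = [], [], []
    for spk in np.unique(speakers):
        idx = np.where(speakers == spk)[0]
        rng.shuffle(idx)
        n_train = int(0.70 * len(idx))
        n_val = int(0.15 * len(idx))
        train_idx.extend(idx[:n_train])
        val_idx.extend(idx[n_train:n_train + n_val])
        test_idx.extend(idx[n_train + n_val:])
    take = lambda ids: (frames[np.array(ids)], labels[np.array(ids)])
    return take(train_idx), take(val_idx), take(test_idx)
```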

Dataset: VidTIMIT - https://conradsanderson.id.au/vidtimit/
C. Sanderson and B.C. Lovell, "Multi-Region Probabilistic Histograms for Robust and Scalable Identity Inference," Lecture Notes in Computer Science (LNCS), Vol. 5558, pp. 199-208, 2009.
