Automatic Lip Reading using deep learning techniques

"Fifteen speakers (five men and ten women) positioned in the frustum of a MS Kinect sensor and utter ten times a set of ten words and ten phrases (see the table below). Each instance of the dataset consists of a synchronized sequence of color and depth images (both of 640x480 pixels). The MIRACL-VC1 dataset contains a total number of 3000 instances."

We have limited the scope of the project to only predicting the words.
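For scale, the word portion of the dataset (15 speakers x 10 words x 10 repetitions) can be enumerated with a short script. The folder layout below is an assumption for illustration; adjust the names to match the actual MIRACL-VC1 download.

```python
from pathlib import Path

# Hypothetical MIRACL-VC1 layout (folder names are assumptions):
#   MIRACL-VC1/<speaker>/words/<word_id>/<instance_id>/...
SPEAKERS = [f"F{i:02d}" for i in range(1, 11)] + [f"M{i:02d}" for i in range(1, 6)]
WORDS = [f"{i:02d}" for i in range(1, 11)]
INSTANCES = [f"{i:02d}" for i in range(1, 11)]

def instance_dirs(root="MIRACL-VC1"):
    """Yield one directory path per (speaker, word, repetition) instance."""
    for speaker in SPEAKERS:
        for word in WORDS:
            for inst in INSTANCES:
                yield Path(root) / speaker / "words" / word / inst

paths = list(instance_dirs())
print(len(paths))  # 15 speakers x 10 words x 10 repetitions = 1500
```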

Modules

The main code cells are in the files ./data_generator.ipynb and ./architectures/3d_cnn.ipynb

data_generator.ipynb : Crops lips from face images and stores them in the same folder structure as the original.

Extracted features: (figure omitted)
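Lip cropping is typically done by detecting facial landmarks and taking a padded bounding box around the mouth points (in dlib's 68-point scheme these are points 48-67). The helper below sketches only the bounding-box arithmetic with plain numpy; the landmark detector itself is assumed and the margin value is illustrative.

```python
import numpy as np

def lip_bbox(mouth_pts, margin=10, img_w=640, img_h=480):
    """Padded bounding box around mouth landmarks, clipped to the image."""
    pts = np.asarray(mouth_pts)
    x0, y0 = pts.min(axis=0) - margin
    x1, y1 = pts.max(axis=0) + margin
    x0, y0 = max(int(x0), 0), max(int(y0), 0)
    x1, y1 = min(int(x1), img_w), min(int(y1), img_h)
    return x0, y0, x1, y1

# Toy landmarks (the real ones come from a face-landmark detector):
mouth = [(300, 350), (340, 345), (320, 370)]
print(lip_bbox(mouth))  # (290, 335, 350, 380)
```

The crop `image[y0:y1, x0:x1]` would then be saved into a mirrored folder tree, matching the behavior described above.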

training_model.ipynb : Builds and trains the 3D CNN model. (model-architecture figure omitted)

Results: (results figure omitted)

At epoch 45 (the final epoch), the model's validation accuracy was 0.5850. This is expected: a simple 3D CNN has no memory retention across the sequence, unlike recurrent architectures such as RNNs.
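For intuition about what the 3D CNN does capture: it convolves across time as well as space, so each output activation summarizes a short spatiotemporal window of the lip sequence, but no state is carried beyond the kernel's temporal extent. A naive single-channel 3D convolution (illustrative only, not the trained model) looks like:

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive single-channel 3D convolution (valid padding, stride 1)."""
    T, H, W = clip.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(clip[i:i+t, j:j+h, k:k+w] * kernel)
    return out

clip = np.random.rand(10, 32, 32)        # 10 frames of a 32x32 lip crop
kernel = np.random.rand(3, 3, 3)         # 3x3x3 spatiotemporal filter
print(conv3d_valid(clip, kernel).shape)  # (8, 30, 30)
```

Each filter only "sees" 3 consecutive frames at a time, which is why adding a recurrent layer (or another sequence model) on top is a natural next step for longer-range temporal context.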