Automatic Lip Reading using deep learning techniques

"Fifteen speakers (five men and ten women) positioned in the frustum of a MS Kinect sensor and utter ten times a set of ten words and ten phrases (see the table below). Each instance of the dataset consists of a synchronized sequence of color and depth images (both of 640x480 pixels). The MIRACL-VC1 dataset contains a total number of 3000 instances."

We have limited the scope of the project to only predicting the words.
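For scale, the word portion of the dataset (15 speakers x 10 words x 10 repetitions) can be enumerated with a short script. The folder layout below is an assumption for illustration; adjust the names to match the actual MIRACL-VC1 download.

```python
from pathlib import Path

# Hypothetical MIRACL-VC1 layout (folder names are assumptions):
#   MIRACL-VC1/<speaker>/words/<word_id>/<instance_id>/...
SPEAKERS = [f"F{i:02d}" for i in range(1, 11)] + [f"M{i:02d}" for i in range(1, 6)]
WORDS = [f"{i:02d}" for i in range(1, 11)]
INSTANCES = [f"{i:02d}" for i in range(1, 11)]

def instance_dirs(root="MIRACL-VC1"):
    """Yield one directory path per (speaker, word, repetition) instance."""
    for speaker in SPEAKERS:
        for word in WORDS:
            for inst in INSTANCES:
                yield Path(root) / speaker / "words" / word / inst

paths = list(instance_dirs())
print(len(paths))  # 15 speakers x 10 words x 10 repetitions = 1500
```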

Modules

The main code cells are in the files ./data_generator.ipynb and ./architectures/3d_cnn.ipynb

data_generator.ipynb : Crops lips from face images and stores them in the same folder structure as the original.

Extracted features: (figure omitted)
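Lip cropping is typically done by detecting facial landmarks and taking a padded bounding box around the mouth points (in dlib's 68-point scheme these are points 48-67). The helper below sketches only the bounding-box arithmetic with plain numpy; the landmark detector itself is assumed and the margin value is illustrative.

```python
import numpy as np

def lip_bbox(mouth_pts, margin=10, img_w=640, img_h=480):
    """Padded bounding box around mouth landmarks, clipped to the image."""
    pts = np.asarray(mouth_pts)
    x0, y0 = pts.min(axis=0) - margin
    x1, y1 = pts.max(axis=0) + margin
    x0, y0 = max(int(x0), 0), max(int(y0), 0)
    x1, y1 = min(int(x1), img_w), min(int(y1), img_h)
    return x0, y0, x1, y1

# Toy landmarks (the real ones come from a face-landmark detector):
mouth = [(300, 350), (340, 345), (320, 370)]
print(lip_bbox(mouth))  # (290, 335, 350, 380)
```

The crop `image[y0:y1, x0:x1]` would then be saved into a mirrored folder tree, matching the behavior described above.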

training_model.ipynb : Builds and trains the 3D CNN model. (model-architecture figure omitted)

Results: (results figure omitted)

At epoch 45 (the final epoch), the model's validation accuracy was 0.5850. This is expected: a simple 3D CNN has no memory retention across the sequence, unlike recurrent architectures such as RNNs.
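For intuition about what the 3D CNN does capture: it convolves across time as well as space, so each output activation summarizes a short spatiotemporal window of the lip sequence, but no state is carried beyond the kernel's temporal extent. A naive single-channel 3D convolution (illustrative only, not the trained model) looks like:

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive single-channel 3D convolution (valid padding, stride 1)."""
    T, H, W = clip.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(clip[i:i+t, j:j+h, k:k+w] * kernel)
    return out

clip = np.random.rand(10, 32, 32)        # 10 frames of a 32x32 lip crop
kernel = np.random.rand(3, 3, 3)         # 3x3x3 spatiotemporal filter
print(conv3d_valid(clip, kernel).shape)  # (8, 30, 30)
```

Each filter only "sees" 3 consecutive frames at a time, which is why adding a recurrent layer (or another sequence model) on top is a natural next step for longer-range temporal context.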