
Lip-reading neural network leveraging 3D convolutional layers and bidirectional LSTMs to transcribe speech from video sequences based on lip movements. Utilizes Connectionist Temporal Classification (CTC) loss for alignment-free character sequence prediction.


grohan1130/lip-reader-cv


lip-reader-cv

lip-reader-cv is a deep learning model that transcribes speech from video sequences by analyzing lip movements. The repository implements the full lip-reading pipeline: data preprocessing, model training, and prediction. The model uses 3D convolutional layers for spatiotemporal feature extraction, bidirectional LSTMs for sequence learning, and a Connectionist Temporal Classification (CTC) loss function for alignment-free text prediction.

Features

  • Video Processing: Preprocesses video data by extracting grayscale frames and normalizing them.
  • Text Alignments: Pairs each video's frames with its text alignment, which serves as the label for supervised learning.
  • Neural Network Architecture: Combines 3D convolutional layers and bidirectional LSTMs to process spatial and temporal information.
  • CTC Loss: Lets the model learn from frame sequences and character sequences of different lengths without frame-level alignment.
  • Per-Epoch Evaluation: Generates sample predictions after every training epoch to monitor model performance.
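
The preprocessing and label-mapping steps listed above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the vocabulary, the function names, and the choice of zero-mean, unit-variance standardization are all assumptions, and a real pipeline would typically use OpenCV and TensorFlow tensors rather than nested lists.

```python
from statistics import mean, pstdev

# Hypothetical character set; the repository's actual vocabulary is not shown here.
VOCAB = "abcdefghijklmnopqrstuvwxyz'?! "
# Index 0 is left unused so it can serve as a padding or blank symbol.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(VOCAB)}
INDEX_TO_CHAR = {i: ch for ch, i in CHAR_TO_INDEX.items()}


def to_grayscale(frame_rgb):
    """Convert one RGB frame (rows of (r, g, b) tuples) to luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in frame_rgb]


def normalize(frames):
    """Standardize every pixel of a clip to zero mean and unit variance."""
    pixels = [p for frame in frames for row in frame for p in row]
    mu = mean(pixels)
    sigma = pstdev(pixels) or 1.0  # avoid dividing by zero on a constant clip
    return [[[(p - mu) / sigma for p in row] for row in frame]
            for frame in frames]


def encode_text(text):
    """Map a transcript to integer labels, skipping unknown characters."""
    return [CHAR_TO_INDEX[ch] for ch in text.lower() if ch in CHAR_TO_INDEX]


def decode_indices(indices):
    """Map integer labels back to text, ignoring index 0 and unknown indices."""
    return "".join(INDEX_TO_CHAR.get(i, "") for i in indices)
```

With this mapping, `encode_text("bin blue")` yields one integer per character, and `decode_indices` inverts it.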

Model Architecture

3D Convolutional Layers:

  • Extract spatiotemporal features from video frames.
  • Three Conv3D layers, each followed by a ReLU activation and a max-pooling layer.
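
To see how three convolution and pooling stages reshape a clip, the arithmetic can be traced by hand. The sketch below assumes values the README does not state: an input of 75 grayscale frames at 46×140 pixels, "same"-padded convolutions, filter counts of 128, 256, and 75, and pooling of size (1, 2, 2) that halves height and width while leaving the time axis untouched. Keeping one feature vector per frame is what a downstream CTC loss needs.

```python
def conv_pool_output_shape(shape, filters, pool=(1, 2, 2)):
    """Shape after a 'same'-padded Conv3D followed by max pooling with `pool`.

    `shape` is (frames, height, width, channels). A 'same'-padded convolution
    keeps the first three dimensions; pooling with stride equal to the pool
    size floor-divides each of them.
    """
    frames, height, width, _ = shape
    return (frames // pool[0], height // pool[1], width // pool[2], filters)


# Assumed input: 75 frames of 46x140 grayscale pixels (one channel).
shape = (75, 46, 140, 1)
for filters in (128, 256, 75):  # assumed filter counts
    shape = conv_pool_output_shape(shape, filters)

print(shape)  # (75, 5, 17, 75)
```

Under these assumptions each of the 75 time steps ends up with 5 × 17 × 75 = 6375 features, which are flattened before being passed to the recurrent layers.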

Bidirectional LSTMs:

  • Process sequential data to capture temporal dependencies.
  • Two layers of bidirectional LSTMs with dropout regularization.

Dense Output Layer:

  • Classifies characters using softmax activation.
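
The three stages described above can be assembled in Keras roughly as follows. This is a sketch under stated assumptions rather than the repository's actual model: the input size, filter counts, kernel size, LSTM width, dropout rate, and vocabulary size are all hypothetical, and `build_lip_reader` is an illustrative name.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_lip_reader(frames=75, height=46, width=140, vocab_size=40):
    """Conv3D feature extractor -> two BiLSTM layers -> per-frame softmax."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(frames, height, width, 1)),
        # Three Conv3D + ReLU + max-pooling stages (assumed filter counts).
        layers.Conv3D(128, 3, padding="same", activation="relu"),
        layers.MaxPool3D((1, 2, 2)),  # pool space only, keep every time step
        layers.Conv3D(256, 3, padding="same", activation="relu"),
        layers.MaxPool3D((1, 2, 2)),
        layers.Conv3D(75, 3, padding="same", activation="relu"),
        layers.MaxPool3D((1, 2, 2)),
        # Flatten the spatial and channel axes into one feature vector per frame.
        layers.Reshape((frames, -1)),
        # Two bidirectional LSTM layers with dropout (assumed width and rate).
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dropout(0.5),
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dropout(0.5),
        # One extra output class for the CTC blank symbol.
        layers.Dense(vocab_size + 1, activation="softmax"),
    ])
```

Calling `build_lip_reader().summary()` prints the layer shapes; the output has one probability distribution over characters per input frame.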

Loss Function

CTC Loss:

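  • Scores a transcript by summing the probabilities of all frame-level alignments that produce it, so training requires no frame-by-frame character labels.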
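
A small pure-Python sketch can make the CTC idea concrete. It shows the collapse rule (merge repeated symbols, then drop blanks) and the standard forward recursion that sums over every frame-level path collapsing to a given label. The blank index of 0 and the function names are assumptions for illustration; in TensorFlow, `tf.nn.ctc_loss` provides a batched, numerically stable version, and this README does not say which implementation the repository uses.

```python
import math

BLANK = 0  # assumed index of the CTC blank symbol


def collapse(path):
    """Apply the CTC collapse rule: merge repeats, then drop blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return out


def ctc_probability(probs, label):
    """Total probability of all paths that collapse to `label`.

    probs[t][k] is the softmax probability of symbol k at frame t.
    """
    # Interleave blanks: "ab" becomes [blank, a, blank, b, blank].
    ext = [BLANK]
    for symbol in label:
        ext += [symbol, BLANK]

    alpha = [0.0] * len(ext)
    alpha[0] = probs[0][ext[0]]
    if len(ext) > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * len(ext)
        for s, symbol in enumerate(ext):
            total = alpha[s]  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]  # advance by one position
            # Skipping a blank is allowed only between two different characters.
            if s >= 2 and symbol != BLANK and symbol != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][symbol]
        alpha = new

    # Valid paths end on the last character or the trailing blank.
    return alpha[-1] + (alpha[-2] if len(ext) > 1 else 0.0)


def ctc_loss(probs, label):
    """Negative log-likelihood of `label` under the per-frame distributions."""
    return -math.log(ctc_probability(probs, label))
```

For two frames with uniform probabilities over {blank, a}, the paths (a, a), (a, blank), and (blank, a) all collapse to "a", so `ctc_probability([[0.5, 0.5], [0.5, 0.5]], [1])` is 0.75.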

Dependencies

Install the required libraries using pip:

    pip install tensorflow opencv-python matplotlib gdown imageio
