This project focuses on translating Ukrainian Sign Language into text using machine learning models. The real-time interface processes a live camera feed and produces textual representations of Ukrainian Sign Language gestures (in our case, letters).
- Total Videos: 638
- Duration: Each video is 2-3 seconds long
- Participants: 8 different individuals
- Letters: І, Ї, А, В, Д, И, К, Л, М, Н, О, Р, С, Т, У, Ю.
- Conditions: Various dynamic backgrounds
- Note: Dataset is not attached.
- Frame Extraction: Extract the first 60 frames (2 seconds) from each video.
- Frame Sampling: Downsample the 60 extracted frames to 12 to reduce the data size.
- Rescaling: Crop frames from 16:9 to 1:1 and lower the resolution to 360×360.
- Tensor Formation: Create tensors of shape (12, 360, 360, 3) representing (Time, Height, Width, Channels).
- Data Augmentation: Augment the training data with the Kornia library.
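The preprocessing steps above can be sketched as a single NumPy function. Nearest-neighbour resizing via index striding keeps the example dependency-free (a real pipeline would more likely use OpenCV or PIL), and the Kornia augmentation stage is omitted:

```python
import numpy as np

FRAMES_KEPT = 60      # first 2 seconds of each clip
SAMPLED_FRAMES = 12   # frames kept after downsampling
TARGET_SIZE = 360     # final 1:1 resolution

def preprocess_clip(frames: np.ndarray) -> np.ndarray:
    """Turn raw (T, H, W, 3) video frames into a (12, 360, 360, 3) tensor."""
    # 1. Frame extraction: keep the first 60 frames.
    frames = frames[:FRAMES_KEPT]
    # 2. Frame sampling: keep every fifth frame -> 12 frames.
    frames = frames[::FRAMES_KEPT // SAMPLED_FRAMES]
    # 3. Crop 16:9 -> 1:1 around the center.
    t, h, w, c = frames.shape
    side = min(h, w)
    y0, x0 = (h - side) // 2, (w - side) // 2
    frames = frames[:, y0:y0 + side, x0:x0 + side, :]
    # 4. Rescale to 360x360 (nearest-neighbour via integer index mapping).
    idx = (np.arange(TARGET_SIZE) * side / TARGET_SIZE).astype(int)
    frames = frames[:, idx][:, :, idx]
    return frames

# Toy 75-frame 720p clip of random pixels in place of a real video.
clip = np.random.randint(0, 256, (75, 720, 1280, 3), dtype=np.uint8)
tensor = preprocess_clip(clip)
print(tensor.shape)  # (12, 360, 360, 3)
```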
- Models Tried:
  - CNN-LSTM
  - (2+1)D CNN
  - Long-term Recurrent Convolutional Networks (LRCN)
Challenges: The initial models achieved low accuracy.
- Architecture: ResNet50V2 base with additional 3D convolutional and pooling layers, followed by fully connected layers.
- Training: Stratified K-Fold cross-validation (3 folds) with the Adam optimizer.
- Improvements: Added L2 regularization and dropout to reduce overfitting.
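A minimal Keras sketch of this architecture is shown below. Only the overall shape follows the description above (ResNet50V2 base, 3D convolution and pooling, fully connected head with L2 regularization and dropout); the filter counts, dense width, dropout rate, L2 strength, and learning rate are illustrative assumptions, and pretrained weights are omitted for brevity:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

NUM_CLASSES = 16  # the 16 letters listed above

# Per-frame ResNet50V2 feature extractor (weights=None here; the real
# model may well start from ImageNet weights).
backbone = tf.keras.applications.ResNet50V2(
    include_top=False, weights=None, input_shape=(360, 360, 3))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(12, 360, 360, 3)),     # (Time, H, W, C)
    layers.TimeDistributed(backbone),            # spatial features per frame
    layers.Conv3D(64, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling3D(pool_size=2),
    layers.GlobalAveragePooling3D(),
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 regularization
    layers.Dropout(0.5),                                     # dropout
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```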
Final Test Accuracy: 85.93%
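The stratified 3-fold scheme can be sketched with scikit-learn. The labels below are randomly generated stand-ins for the real per-clip letter labels, and the actual training call is omitted:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Stand-in labels: 638 clips over 16 letter classes (the real labels
# come from the dataset described above).
rng = np.random.default_rng(0)
labels = rng.integers(0, 16, size=638)
clip_ids = np.arange(638)

# Stratification keeps each letter's proportion roughly equal across folds.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(clip_ids, labels)):
    # model.fit(...) with the Adam optimizer would go here.
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} validation clips")
```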
- Download the trained model from here.
- Download and run `real_time_interface.py`. Note: specify the correct path to the model inside the file.
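Inside the interface, the model's softmax output has to be mapped back to a letter. A minimal sketch follows; the class-index order and the confidence threshold are assumptions for illustration, not taken from `real_time_interface.py`:

```python
import numpy as np

# The 16 letters the model distinguishes, in an assumed class-index order.
LETTERS = ["І", "Ї", "А", "В", "Д", "И", "К", "Л", "М", "Н",
           "О", "Р", "С", "Т", "У", "Ю"]

def decode_prediction(probs, threshold=0.5):
    """Map a softmax vector to a letter, or None if the model is unsure."""
    idx = int(np.argmax(probs))
    return LETTERS[idx] if probs[idx] >= threshold else None

probs = np.zeros(16)
probs[2] = 0.9                   # toy output: confident prediction of 'А'
print(decode_prediction(probs))  # prints А
```

Thresholding like this keeps the on-screen text stable by suppressing low-confidence frames between gestures.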
- Mykhailo Humeniuk
- Andrii Pletinka

