Deep Learning and Sign Language Interpretation

Sign languages are language systems that allow people who are deaf or hard of hearing to communicate. Extracting the complex head and hand movements, together with their constantly changing shapes, needed to recognize sign language is considered a difficult problem in computer vision. This paper proposes recognizing American Sign Language (ASL) video gestures using a Convolutional Neural Network (CNN) to extract high-level features and a Long Short-Term Memory (LSTM) network to generate captions, as well as a deeper network such as MobileNetV2. The model is trained from scratch on the 12,000 videos in Kaggle's World Level American Sign Language (WLASL) dataset, where each sign occupies 40 frames of video. The dataset is split into 60% for training, 20% for validation, and 20% for testing. The final architecture has only 9 layers, including dropout layers, which raised training accuracy to 100%, validation accuracy to 80%, and testing accuracy to 94%.
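The description above suggests per-frame CNN features (for example from a MobileNetV2 backbone) fed into an LSTM over the 40 frames of each clip. Below is a minimal sketch of such an architecture, assuming PyTorch/torchvision; the hidden size, dropout rate, number of classes, and the class and variable names are illustrative assumptions, not taken from this repository.

```python
# Sketch only: a MobileNetV2-per-frame + LSTM classifier for 40-frame sign clips.
# Layer sizes, dropout rate, and num_classes are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision import models


class SignClassifier(nn.Module):
    """MobileNetV2 feature extractor applied to each frame, followed by an LSTM."""

    def __init__(self, num_classes: int, hidden_size: int = 256):
        super().__init__()
        backbone = models.mobilenet_v2(weights=None)    # trained from scratch, no pretrained weights
        self.features = backbone.features               # convolutional feature extractor (1280 channels)
        self.pool = nn.AdaptiveAvgPool2d(1)             # pool each frame's feature map to a 1280-d vector
        self.lstm = nn.LSTM(1280, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(0.5)                  # dropout, as mentioned in the abstract
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, 40 frames, 3, height, width)
        b, t, c, h, w = clips.shape
        x = self.features(clips.view(b * t, c, h, w))   # run MobileNetV2 on every frame
        x = self.pool(x).flatten(1).view(b, t, -1)      # (batch, 40, 1280) sequence of frame features
        out, _ = self.lstm(x)                           # temporal modeling over the 40 frames
        logits = self.fc(self.dropout(out[:, -1]))      # classify from the last time step
        return logits


model = SignClassifier(num_classes=100)                 # hypothetical number of sign classes
dummy = torch.randn(2, 40, 3, 224, 224)                 # two 40-frame clips
print(model(dummy).shape)                               # torch.Size([2, 100])
```

Pooling the per-frame MobileNetV2 feature map and taking the LSTM's final hidden state is one common way to turn frame-level features into a single gesture prediction; the repository's exact layer count, caption-generation head, and 60/20/20 data split are not reproduced here.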