TLDR; This project uses Google's Mediapipe Hand Landmarks solution to detect hand keypoints and creates a "skeleton" by joining those keypoints on a white background image. These skeleton images are then used to train a CNN with VGG16 as the base model, since the dataset is created by the user and is too small to train a CNN from scratch.
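The skeleton-generation step might look roughly like the sketch below. This is not the project's exact code; the function name `skeleton_from_frame` and the 224x224 canvas size are assumptions, but the MediaPipe Hands and drawing-utils calls are the standard ones.

```python
# Minimal sketch: detect hand landmarks and draw the skeleton on a white canvas.
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

def skeleton_from_frame(frame_bgr, size=224):
    """Return a white image with the detected hand skeleton drawn on it."""
    white = np.full((size, size, 3), 255, dtype=np.uint8)
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            # draw_landmarks scales the normalized coordinates to the canvas size
            mp_draw.draw_landmarks(
                white,
                result.multi_hand_landmarks[0],
                mp_hands.HAND_CONNECTIONS,
            )
    return white
```

For the transfer-learning part, a hedged sketch of a frozen VGG16 base with a small classification head is shown below; the class count, layer sizes, and optimizer are placeholders, not the project's actual values.

```python
# Sketch: VGG16 as a frozen base model for a small, user-created dataset.
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

NUM_CLASSES = 26  # assumption: one class per ASL alphabet letter

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pretrained features; the dataset is small

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```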
The saved model is then used to predict hand gestures from the hand-landmark "skeleton". That prediction is passed to a second function that explicitly compares the landmark coordinates against the expected ASL alphabet poses to verify that the prediction is correct.
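An illustrative sketch of this two-stage prediction is given below. The file name, label order, and the coordinate rule for "A" are all assumptions used purely for illustration; the real project's verification rules may differ.

```python
# Sketch: run the saved CNN on the skeleton image, then sanity-check the
# predicted letter against simple landmark-coordinate rules before accepting it.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("asl_model.h5")   # assumed file name
LABELS = [chr(ord("A") + i) for i in range(26)]       # assumed label order

def passes_landmark_rules(letter, landmarks):
    """Toy check: for 'A' (a closed fist), the four fingertips should sit
    below their middle joints. `landmarks` is a MediaPipe landmark list,
    e.g. result.multi_hand_landmarks[0].landmark."""
    if letter == "A":
        tips, pips = (8, 12, 16, 20), (6, 10, 14, 18)
        return all(landmarks[t].y > landmarks[p].y for t, p in zip(tips, pips))
    return True  # other letters: accept the CNN prediction in this sketch

def predict_letter(skeleton_img, landmarks):
    x = skeleton_img[np.newaxis].astype("float32") / 255.0
    letter = LABELS[int(np.argmax(model.predict(x, verbose=0)))]
    return letter if passes_landmark_rules(letter, landmarks) else None
```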
PyQt6 is used for the GUI that displays the result. The user can also press a "Speak" button to pronounce the character, which is implemented with the pyttsx3 library.
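A minimal, self-contained sketch of the "Speak" button idea is shown below: a PyQt6 window that displays a predicted character and speaks it with pyttsx3 when clicked. The widget names and layout are assumptions, not the project's actual GUI code.

```python
# Sketch: PyQt6 window with a "Speak" button backed by pyttsx3.
import sys
import pyttsx3
from PyQt6.QtWidgets import QApplication, QWidget, QVBoxLayout, QLabel, QPushButton

class ResultWindow(QWidget):
    def __init__(self, predicted_char="A"):
        super().__init__()
        self.predicted_char = predicted_char
        self.engine = pyttsx3.init()
        layout = QVBoxLayout(self)
        self.label = QLabel(f"Predicted: {predicted_char}")
        speak_btn = QPushButton("Speak")
        speak_btn.clicked.connect(self.speak)
        layout.addWidget(self.label)
        layout.addWidget(speak_btn)

    def speak(self):
        # Pronounce the currently displayed character.
        self.engine.say(self.predicted_char)
        self.engine.runAndWait()

if __name__ == "__main__":
    app = QApplication(sys.argv)
    win = ResultWindow()
    win.show()
    sys.exit(app.exec())
```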
A few bugs still remain to be smoothed out!