Sign Language Accessibility for TikTok Creators and Audience
2024 TikTok TechJam
View Demo · Report Bug · Request Feature
Introducing AiSL, an AI-powered tool that turns sign language videos into inclusive, engaging videos with auto-generated captions, voice-over, and emoji captions.
AiSL makes it easy for Deaf TikTok creators to create and understand accessible, inclusive sign language content through AI-powered generation.
Here is a quick demo of the app. We hope you enjoy it.
Liked it? Please give a ⭐️ to AiSL.
AiSL comes with 3 key AI features:
Sign-Language-to-Text converts sign language to text captions as sign language appears in the video.
Sign-Language-to-Speech converts sign language to a voiceover that plays over the video as the sign language appears in the video.
Sign-Language-to-Emoji converts sign language to emoji text captions as the sign language appears in the video.
- The user's original video (.mp4) is passed as input to a fine-tuned MediaPipe Gesture Recognizer model, which outputs captions with their timestamps.
- The timestamped captions are processed algorithmically before being passed to the next stage.
- The captions are passed to the Text-to-Speech model (Google Translate text-to-speech API), which produces an audio file.
- At the same time, the captions are turned into embeddings using sentence-transformers/all-MiniLM-L6-v2, which are used for RAG retrieval from a vector store containing emojis and their descriptions. The retrieved documents (context) are passed in the prompt, together with the original generated captions, to the Gemini Pro model for text-to-emoji translation.
- The generated captions, audio file, and emoji captions are combined with the original video to produce an edited video using the Python cv2 package.
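The steps above do not spell out what the algorithmic processing of timestamped captions involves. One plausible step, sketched here purely as an assumption, is collapsing per-frame gesture labels into timed caption segments and dropping very short ones as noise. The `Caption` class, the `merge_frame_labels` helper, the `(timestamp, label)` input shape, and the `min_duration` threshold are all hypothetical and not taken from the AiSL codebase.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Caption:
    text: str
    start: float  # seconds
    end: float    # seconds


def merge_frame_labels(
    frames: Iterable[Tuple[float, str]],
    min_duration: float = 0.25,
) -> List[Caption]:
    """Collapse consecutive identical per-frame labels into timed captions.

    frames: (timestamp_seconds, label) pairs in ascending time order.
    An empty label means no gesture was recognized in that frame
    (hypothetical convention). Segments shorter than min_duration are dropped.
    """
    captions: List[Caption] = []
    current: Optional[Caption] = None
    for ts, label in frames:
        if current is not None and label == current.text:
            current.end = ts  # same gesture continues, extend the segment
            continue
        if current is not None:
            captions.append(current)  # label changed, close the open segment
        current = Caption(label, ts, ts) if label else None
    if current is not None:
        captions.append(current)
    return [c for c in captions if c.end - c.start >= min_duration]
```

Filtering by duration is only one possible way to suppress single-frame misclassifications; smoothing over a sliding window would be another.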
- Get a copy of this repository by opening your terminal and running:
git clone https://github.com/Vinny0712/AiSL.git
- Install dependencies
In the frontend/ directory, run
yarn
- Set up Environment Variables
Create a .env file in the frontend/ directory with all the environment variables listed in .env.example.
# .env file with all your environment variables
NEXT_PUBLIC_PRODUCTION_SERVER_URL=
- Start up the application
yarn dev
And you are ready to start using the frontend! The web application is running on http://localhost:3000/.
- In the backend/ directory, create a Python virtual environment and activate it.
python -m venv .venv
. .venv/Scripts/activate # Windows (Git Bash); on macOS/Linux use: . .venv/bin/activate
- Install the required packages.
pip install -r requirements.txt
- Set up Environment Variables
Create a .env file in the backend/ directory with all the environment variables listed in .env.example.
# .env file with all your environment variables
HUGGINGFACE_TOKEN=
GOOGLE_API_KEY=
PRODUCTION_CLIENT_URL=
- In the backend/app directory, start the application.
cd app
uvicorn main:app --reload
And you are ready to start using the backend! The server application is running on http://127.0.0.1:8000/.
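If the server does not start, one thing worth checking is whether the backend environment variables are actually set. The sketch below is a hypothetical helper, not part of the repository: it inspects the process environment only, and it assumes all three variables from the backend .env are required. How the backend itself loads the .env file is not covered here.

```python
import os
from typing import List, Mapping, Sequence

# Names taken from the backend .env listing; treating all of them
# as mandatory is an assumption.
REQUIRED_BACKEND_VARS = (
    "HUGGINGFACE_TOKEN",
    "GOOGLE_API_KEY",
    "PRODUCTION_CLIENT_URL",
)


def missing_vars(
    env: Mapping[str, str],
    required: Sequence[str] = REQUIRED_BACKEND_VARS,
) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All backend environment variables are set.")
```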
Script for quick startup:
cd backend
. .venv/Scripts/activate
cd app
uvicorn main:app --reload
Congratulations, you have successfully created your own copy of AiSL.
Frontend
- Next.js (Deployed on Vercel)
Backend
- FastAPI (Deployed on Google Cloud Run)
Video Editing
- cv2 Python package (OpenCV)
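As a rough illustration of how captions could be drawn onto frames with cv2, here is a minimal sketch. The `Segment` tuple shape and the `caption_at` and `burn_captions` helpers are hypothetical; the text position, font, and the `mp4v` codec are arbitrary choices, not AiSL's actual settings.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def caption_at(segments: List[Segment], t: float) -> Optional[str]:
    """Return the caption active at time t, or None if there is none."""
    for start, end, text in segments:
        if start <= t <= end:
            return text
    return None


def burn_captions(src: str, dst: str, segments: List[Segment]) -> None:
    """Write a copy of src with the active caption drawn on every frame."""
    import cv2  # imported here so caption_at can be used without OpenCV

    cap = cv2.VideoCapture(src)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unknown
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(dst, fourcc, fps, (width, height))

    frame_index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of video
        text = caption_at(segments, frame_index / fps)
        if text:
            cv2.putText(frame, text, (20, height - 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0,
                        (255, 255, 255), 2, cv2.LINE_AA)
        writer.write(frame)
        frame_index += 1

    cap.release()
    writer.release()
```

Note that OpenCV's VideoWriter writes video frames only, so the generated voice-over would have to be muxed in by a separate step, and the built-in Hershey fonts used by `cv2.putText` cannot render emoji glyphs.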
AI Models
- Sign Language to Text
  - Model: MediaPipe Gesture Recognizer (fine-tuned)
  - Fine-tuning dataset: WLASL Video (https://www.kaggle.com/datasets/risangbaskoro/wlasl-processed)
- Text to Speech
  - Model: Google Translate text-to-speech API
- Text to Emoji
  - Vector store (RAG) with emoji.csv as the data source
  - Embeddings: sentence-transformers/all-MiniLM-L6-v2
  - LLM: gemini-pro
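To make the Text to Emoji flow more concrete, the following sketch shows the retrieval and prompt-building steps in plain Python. It is an illustration under stated assumptions: the 3-dimensional vectors are toy stand-ins for real all-MiniLM-L6-v2 embeddings (which are 384-dimensional), the in-memory list stands in for the vector store, and the `retrieve` and `build_prompt` helpers and the prompt wording are hypothetical. The call to gemini-pro is left out.

```python
import math
from typing import List, Sequence, Tuple

# (emoji, description, embedding); toy vectors, not real model output.
Row = Tuple[str, str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], store: List[Row], k: int = 2) -> List[Tuple[str, str]]:
    """Return the k (emoji, description) pairs most similar to the query."""
    ranked = sorted(store, key=lambda row: cosine(query_vec, row[2]), reverse=True)
    return [(emoji, desc) for emoji, desc, _ in ranked[:k]]


def build_prompt(caption: str, context: List[Tuple[str, str]]) -> str:
    """Combine retrieved emoji descriptions and the caption into one prompt."""
    candidates = "\n".join(f"{emoji}: {desc}" for emoji, desc in context)
    return (
        "Translate the caption into emojis using the candidates below.\n"
        f"Candidates:\n{candidates}\n"
        f"Caption: {caption}\n"
        "Emojis:"
    )


STORE: List[Row] = [
    ("👋", "waving hand, hello or goodbye", [0.9, 0.1, 0.0]),
    ("🙏", "folded hands, thank you or please", [0.1, 0.9, 0.1]),
    ("🍎", "red apple, fruit", [0.0, 0.1, 0.9]),
]

if __name__ == "__main__":
    context = retrieve([0.8, 0.2, 0.0], STORE, k=2)
    print(build_prompt("hello", context))
```

In a real pipeline the query vector would come from embedding the caption with the same model used to index emoji.csv, since similarity scores are only meaningful within one embedding space.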
Datasets
- Fine-tuning dataset: WLASL Video (https://www.kaggle.com/datasets/risangbaskoro/wlasl-processed)
- RAG dataset: Emoji.csv (~500 emoji records with descriptions generated by OpenAI ChatGPT)
APIs used
- HuggingFace API
  - Embeddings: sentence-transformers/all-MiniLM-L6-v2
- Google API
  - LLM Model: gemini-pro
  - Text to Speech: Google Translate text-to-speech API
Have an idea or an improvement to suggest? Create an issue or open a pull request!