AiSL

Sign Language Accessibility for TikTok Creators and Audience

2024 TikTok TechJam


View Demo · Report Bug · Request Feature

👋🏻 Introducing AiSL

AiSL landing page

Introducing AiSL, an AI-powered tool that turns videos containing sign language into inclusive, engaging videos with auto-generated captions, voice-over, and emoji captions.

AiSL makes it easy for Deaf TikTok creators to produce accessible, inclusive sign-language content, and for audiences to understand it, through AiSL's AI-powered generation.


🚀 Demo

Here is a quick demo of the app. We hope you enjoy it.

Liked it? Please give a ⭐️ to AiSL.


🔥 Features

AiSL comes with 3 key AI features:

Feature 1: Sign-Language-to-Text 📑

Sign-Language-to-Text converts sign language into text captions as the sign language appears in the video.

Feature 2: Sign-Language-to-Speech 🔊

Sign-Language-to-Speech converts sign language to a voiceover that plays over the video as the sign language appears in the video.

Feature 3: Sign-Language-to-Emoji 👋🏻

Sign-Language-to-Emoji converts sign language to emoji text captions as the sign language appears in the video.


AI Architecture

AiSL AI architecture diagram

  1. The user's original video (.mp4) is passed to the MediaPipe Gesture Recognizer model that we have fine-tuned, which outputs captions with the corresponding timestamps.
  2. The captions and timestamps are processed algorithmically before being passed to the next stage.
  3. The processed captions are passed to the Text-to-Speech model (Google Translate text-to-speech API), which outputs an audio file.
  4. In parallel, the captions are turned into embeddings with sentence-transformers/all-MiniLM-L6-v2 for RAG retrieval from a vector store containing documents of emojis and their descriptions. The retrieved documents (context) are combined with the generated captions into a prompt for the Gemini Pro model, which performs text-to-emoji translation.
  5. The generated captions, audio file, and emoji captions are composited with the original video into an edited video using the Python cv2 (OpenCV) package.
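Step 2 above is deliberately vague about how raw recognizer output becomes caption spans; a minimal sketch of one plausible approach, assuming the recognizer emits one (label, timestamp) pair per sampled frame (the repo's actual algorithm may differ):

```python
# Minimal sketch of caption post-processing (step 2). Assumes the gesture
# recognizer emits one (label, timestamp) pair per sampled frame; consecutive
# identical labels are merged into (label, start_time, end_time) spans.

def merge_predictions(frames):
    """frames: list of (label, timestamp) pairs, sorted by timestamp."""
    spans = []
    for label, t in frames:
        if spans and spans[-1][0] == label:
            # Same sign continues: extend the current span's end time.
            spans[-1][2] = t
        else:
            # New sign detected: open a new caption span.
            spans.append([label, t, t])
    return [tuple(s) for s in spans]

frames = [("hello", 0.0), ("hello", 0.5), ("thanks", 1.0), ("thanks", 1.5)]
print(merge_predictions(frames))
# [('hello', 0.0, 0.5), ('thanks', 1.0, 1.5)]
```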

💪🏻 Try Yourself

  1. Get a copy of this repository by opening your terminal and running:
git clone https://github.com/Vinny0712/AiSL.git

Frontend Setup Instructions

  1. Install dependencies

In the frontend/ directory, run

yarn

  2. Set up Environment Variables

Create a .env file in the frontend/ directory with all the environment variables listed in .env.example.

# .env file with all your environment variables

NEXT_PUBLIC_PRODUCTION_SERVER_URL=

  3. Start up the application

yarn dev

And you are ready to start using the frontend! The web application is running on http://localhost:3000/.


Backend Setup Instructions

  1. In the backend/ directory, create a Python virtual environment and activate it.

python -m venv .venv
. .venv/bin/activate # macOS/Linux; on Windows, run .venv\Scripts\activate

  2. Install the required packages.

pip install -r requirements.txt

  3. Set up Environment Variables

Create a .env file in the backend/ directory with all the environment variables listed in .env.example.

# .env file with all your environment variables

HUGGINGFACE_TOKEN=
GOOGLE_API_KEY=
PRODUCTION_CLIENT_URL=

  4. In the app/ directory, start the application.

cd app
uvicorn main:app --reload

And you are ready to start using the backend! The server application is running on http://127.0.0.1:8000/.

Script for quick startup:

cd backend
. .venv/bin/activate # macOS/Linux; on Windows, run .venv\Scripts\activate
cd app
uvicorn main:app --reload

Congratulations, you have successfully created your own copy of AiSL.


🏗️ Tech Stack

Frontend

  • Next.js (Deployed on Vercel)

Backend

  • FastAPI (Deployed on Google Cloud Run)

Video Editing

  • cv2 (OpenCV) Python package
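Compositing the captions onto the video requires knowing which caption span is active at each frame; a minimal pure-Python sketch of that lookup, assuming spans of (text, start, end) in seconds and a fixed frame rate (the repo's actual cv2-based editing code may differ):

```python
# Sketch: map each video frame to the caption that should be drawn on it.
# Assumes caption spans as (text, start_sec, end_sec) and a fixed fps.

def caption_at(spans, frame_index, fps=30.0):
    """Return the caption text active at the given frame, or None."""
    t = frame_index / fps
    for text, start, end in spans:
        if start <= t <= end:
            return text
    return None

spans = [("hello", 0.0, 1.0), ("thanks", 1.5, 2.5)]
print(caption_at(spans, 15))  # frame at t=0.5s -> 'hello'
print(caption_at(spans, 40))  # frame at t~1.33s -> None (between captions)
```

In the real pipeline this per-frame text would then be drawn onto the frame (e.g. with cv2.putText) before writing the edited video.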

AI Models

  1. Sign Language to Text
    • Model: Fine-tuned MediaPipe Gesture Recognizer
  2. Text to Speech
    • Model: Google Translate text-to-speech API
  3. Text to Emoji
    • Vectorstore (RAG) with emoji.csv as datasource
    • Embeddings: sentence-transformers/all-MiniLM-L6-v2
    • LLM Model: gemini-pro
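The Text-to-Emoji step retrieves the closest emoji descriptions by embedding similarity before prompting the LLM. A toy sketch of that retrieval idea, with a bag-of-words vector standing in for the real all-MiniLM-L6-v2 embeddings (the emoji documents and their schema here are illustrative, not the repo's actual emoji.csv):

```python
# Toy illustration of the RAG retrieval in the text-to-emoji step: rank emoji
# descriptions by cosine similarity to the generated caption. In the real
# pipeline the vectors come from sentence-transformers/all-MiniLM-L6-v2 and
# live in a vector store built from emoji.csv; a bag-of-words vector stands
# in here so the sketch stays self-contained.
import math
from collections import Counter

def embed(text):
    # Stand-in embedding: word-count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical emoji "documents" (emoji -> description).
docs = {
    "👋": "waving hand hello greeting",
    "🙏": "folded hands thank you please",
    "❤️": "red heart love",
}

def retrieve(caption, k=1):
    q = embed(caption)
    ranked = sorted(docs, key=lambda e: cosine(q, embed(docs[e])), reverse=True)
    return ranked[:k]

print(retrieve("hello and greeting"))  # ['👋']
```

The retrieved emoji descriptions are then passed as context, together with the captions, in the prompt to gemini-pro.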

Datasets

APIs used

  1. HuggingFace API
    • Embeddings: sentence-transformers/all-MiniLM-L6-v2
  2. Google API
    • LLM Model: gemini-pro
    • Text to Speech: Google Translate text-to-speech API

✨ Contributors


💡 Contributing

Have an idea or improvement to make? Create an issue and make a pull request!