This is a full-stack, production-ready AI Voice Call Translation Application that provides real-time, duplex translation between Urdu and English speakers.
- Real-time Duplex Communication: User A speaks in Urdu, and User B hears it in English. User B speaks in English, and User A hears it in Urdu.
- AI-Powered: Leverages the Gemini API for high-quality Speech-to-Text, Translation, and Text-to-Speech.
- Web-Based: Built with modern web technologies for accessibility and ease of use.
- Frontend: React, TypeScript, TailwindCSS, WebRTC
- Backend: Node.js, Express, WebSocket (ws), Gemini API
- Deployment: Ready for Vercel (Frontend) and Render (Backend).
- One-Click Calling: Simple "Start Call" button to initiate communication.
- Live Waveform Animation: Visual feedback when a user is speaking.
- Translation Subtitles: Displays the translated text for both users in real-time.
- Language Toggle: Easily switch the source and target languages.
- Robust Error Handling: Manages microphone permissions, network issues, and API errors gracefully.
- Loading States: Clear visual indicators during translation and processing.
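The language toggle above amounts to swapping the source and target of the active language pair. A minimal sketch (the pair shape and the `swapLanguages` name are illustrative, not taken from the codebase):

```javascript
// Illustrative helper: swapping source/target implements the language toggle.
// The { source, target } object shape is an assumption, not the app's actual state model.
function swapLanguages(pair) {
  return { source: pair.target, target: pair.source };
}

// e.g. toggling an Urdu→English call direction:
// swapLanguages({ source: 'ur', target: 'en' })  // → { source: 'en', target: 'ur' }
```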
- Node.js (v18 or later)
- npm or yarn
- A valid Gemini API Key
1. Clone the repository:

   ```bash
   git clone https://github.com/AyeshaNasirWebDeveloper/AI-Calling-Translator.git
   ```

2. Navigate to the backend directory:

   ```bash
   cd backend
   ```

3. Install dependencies:

   ```bash
   npm install
   ```

4. Create an environment file: add a `.env` file in the `backend` directory with your Gemini API key:

   ```
   GEMINI_API_KEY=your_gemini_api_key_here
   ```

5. Run the backend server:

   ```bash
   npm start
   ```

   The backend server will start on http://localhost:3001.
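One way to consume the key on the backend is to fail fast at startup if it is missing, rather than erroring mid-call. A sketch under assumptions: something like `dotenv` has already loaded `.env`, and `requireEnv` is an illustrative helper, not part of the project:

```javascript
// Fail fast if a required variable is missing, so configuration errors
// surface at server startup instead of during a live call.
function requireEnv(name) {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// At server startup:
// const apiKey = requireEnv('GEMINI_API_KEY');
```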
1. Navigate to the frontend directory:

   ```bash
   cd ../frontend
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Run the frontend development server:

   ```bash
   npm run dev
   ```

   The frontend application will be accessible at http://localhost:5173 (or another port if 5173 is busy).
- Signaling: When a user starts the app, it connects to the backend WebSocket server. The backend manages the WebRTC signaling process to establish a peer-to-peer connection between the two users.
- Voice Streaming: Once connected, each user's microphone audio is streamed directly to the other peer over WebRTC. The audio is also captured and sent to the backend via WebSocket for processing.
- AI Processing Pipeline:
- The backend receives the audio chunk.
- Speech-to-Text: The audio is converted to text using the Gemini API.
- Translation: The resulting text is translated to the target language (Urdu or English).
- Text-to-Speech: The translated text is converted back into audio.
- Receiving Translation: The processed audio and subtitle text are sent back to the receiving client, which then plays the audio and displays the subtitle.
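The pipeline above can be sketched as one async function per audio chunk. The three stage functions here are injected so the real Gemini-backed implementations (or test stubs) can be supplied by the caller; none of these names come from the actual Gemini SDK:

```javascript
// Per-chunk pipeline: Speech-to-Text → Translation → Text-to-Speech.
// Stage functions are parameters (illustrative names, not Gemini SDK calls).
async function processAudioChunk(audioChunk, targetLang, { speechToText, translate, textToSpeech }) {
  const transcript = await speechToText(audioChunk);          // Speech-to-Text
  const translated = await translate(transcript, targetLang); // Translation
  const audio = await textToSpeech(translated, targetLang);   // Text-to-Speech
  // Both the synthesized audio and the subtitle text go back to the client.
  return { subtitle: translated, audio };
}
```

Running the stages sequentially keeps the subtitle and the synthesized audio in sync: each stage depends on the previous one's output, so there is nothing to parallelize within a single chunk.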
- Push your code to a GitHub repository.
- Create a new "Web Service" on Render and connect your repository.
- Set the "Start Command" to `npm start`.
- Add your `GEMINI_API_KEY` as an environment variable in the Render dashboard.
- Deploy!
- Push your code to a GitHub repository.
- Create a new project on Vercel and connect your repository.
- Vercel will automatically detect that it's a React/Vite project.
- Before deploying, set the `VITE_BACKEND_URL` environment variable in the Vercel project settings to your deployed Render backend URL.
- Deploy!
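A common pattern for wiring `VITE_BACKEND_URL` is a small helper that falls back to the local dev backend when the variable is unset. The helper name and fallback URL here are assumptions, not taken from the project:

```javascript
// Vite exposes VITE_-prefixed variables on import.meta.env at build time.
// Taking the env object as a parameter keeps this testable outside Vite.
function resolveBackendUrl(env) {
  // Fall back to the local backend from the development setup.
  return env.VITE_BACKEND_URL || 'ws://localhost:3001';
}

// In the app: const url = resolveBackendUrl(import.meta.env);
```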
This project is designed to be a complete, deployable solution. The code is modular and commented to be beginner-friendly.