UniLingoStream is a real-time translation tool designed to break down language barriers while you watch video streams on your computer. It captures your system's audio output, transcribes the spoken language, translates it using AI, and displays the translated text on your screen.
Features:
- Real-time audio capture from your system
- Transcription of spoken language using Google Cloud Speech-to-Text API
- Translation of transcribed text using Google Cloud Translation API
- On-screen display of translated text

Prerequisites:
- Python 3.7 or higher
- Google Cloud account with Speech-to-Text and Translation APIs enabled
- macOS with BlackHole installed for audio capture
First, ensure you have pip installed, then install the required Python packages:
    pip install -r requirements.txt

- Go to the Google Cloud Console.
- Enable the "Speech-to-Text API" and "Cloud Translation API".
- Navigate to the IAM & Admin > Service accounts page.
- Click "Create Service Account".
- Follow the prompts to create a new service account and download the JSON key file.
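
Google Cloud client libraries commonly locate a service account key through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The sketch below is not part of the project; it is a small pre-flight check you could run after downloading the key. The helper name `check_service_account_key` is hypothetical, and whether UniLingoStream itself reads the key through this variable is an assumption.

```python
import json
import os
from pathlib import Path

# Fields that a downloaded service account key file is expected to contain.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found in a key file (empty list means OK)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"file not found: {key_path}"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


if __name__ == "__main__":
    # Assumption: the key path is exported in GOOGLE_APPLICATION_CREDENTIALS.
    key_file = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_file:
        print("GOOGLE_APPLICATION_CREDENTIALS is not set")
    else:
        issues = check_service_account_key(key_file)
        print("key file looks OK" if not issues else "\n".join(issues))
```

An empty result only means the file has the expected shape; it does not confirm that the APIs are enabled or that the account has the right permissions.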
To ensure that UniLingoStream captures the audio output of your computer correctly, you need to configure your system's audio settings. Follow these steps:
- Install BlackHole

  If you haven't installed BlackHole yet, follow the BlackHole installation guide.

- Set Up Audio MIDI Setup
  - Open `Audio MIDI Setup` from the `Applications > Utilities` folder.
  - Click the `+` button in the bottom left corner and select `Create Multi-Output Device`.
  - In the Multi-Output Device, check the boxes for your primary audio output (e.g., built-in speakers or external headphones) and `BlackHole 16ch`.

- Configure Sound Settings
  - Open `System Preferences` and go to `Sound`.
  - In the `Output` tab, select the Multi-Output Device you just created.
  - In the `Input` tab, select `BlackHole 16ch` as the input device.
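
After this configuration, a capture script needs to pick the BlackHole device as its input. The following is a minimal sketch of that lookup, assuming the devices are available as a list of dictionaries with `name` and `max_input_channels` keys (the shape returned by, for example, `sounddevice.query_devices()`). The function name `find_input_device` is hypothetical, and the audio library UniLingoStream actually uses is not stated here.

```python
def find_input_device(devices, name_fragment="BlackHole"):
    """Return the index of the first input-capable device whose name contains
    name_fragment (case-insensitive), or None if there is no match."""
    wanted = name_fragment.lower()
    for index, device in enumerate(devices):
        has_input = device.get("max_input_channels", 0) > 0
        if has_input and wanted in device.get("name", "").lower():
            return index
    return None


if __name__ == "__main__":
    # Hypothetical device list for illustration only.
    sample_devices = [
        {"name": "MacBook Pro Speakers", "max_input_channels": 0},
        {"name": "BlackHole 16ch", "max_input_channels": 16},
    ]
    print(find_input_device(sample_devices))
```

A result of `None` suggests that BlackHole is not installed or is not visible as an input device, which is worth checking before starting the application.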
    git clone https://github.com/ChingEnLin/UniLingoSream
    cd UniLingoSream

Ensure your system's audio output is routed through BlackHole, then run the main application:

    python main.py

The application will start capturing audio, transcribing, translating, and displaying the translated text on your screen.
- `audio_capturer.py`: Handles real-time audio capturing.
- `transcriber.py`: Manages transcription and translation of audio.
- `display.py`: Manages the display of translated text using Tkinter.
- `main.py`: Main script to initialize and run the application.
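
To illustrate how components like these can fit together, here is a minimal sketch of a capture, transcribe, translate, display pipeline built on a queue and a worker thread. It is not the project's actual code: `run_pipeline` and its callback parameters are hypothetical, and the stand-in callbacks in the demo replace the real Google Cloud calls and the Tkinter display.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker to shut down


def run_pipeline(chunks, transcribe, translate, show):
    """Feed audio chunks through transcribe -> translate -> show on a worker thread."""
    pending = queue.Queue()

    def worker():
        while True:
            chunk = pending.get()
            if chunk is _STOP:
                break
            text = transcribe(chunk)
            if text:  # skip chunks that produced no transcript
                show(translate(text))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    for chunk in chunks:  # stands in for the audio capturer
        pending.put(chunk)
    pending.put(_STOP)
    thread.join()


if __name__ == "__main__":
    # Stand-in callbacks: decode bytes as the "transcript", upper-case as the "translation".
    run_pipeline(
        chunks=[b"hola", b"", b"mundo"],
        transcribe=lambda chunk: chunk.decode("utf-8"),
        translate=str.upper,
        show=print,
    )
```

The queue decouples capture from the slower network calls, so audio can keep arriving while earlier chunks are still being processed. In a Tkinter application, the `show` callback would typically pass text to the main thread rather than update widgets from the worker.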
Contributions are welcome! Please submit a pull request or open an issue to discuss improvements or bug fixes.
This project is licensed under the MIT License.