A practical tool for interacting with Google's AI through voice, camera, screen, and text. It lets you communicate with an AI assistant using whichever of these input methods suits the task.
- Talk to the AI using your microphone
- Type messages and get both text and voice responses
- Share your screen for visual assistance
- Use your camera for visual queries
- Works quickly with minimal delay
- Easily adjust settings to your preferences
- Keeps track of conversations for reference
You'll need:
- A computer running Windows, Mac, or Linux
- Python 3.8 or newer
- Internet connection
- Microphone (for voice features)
- Google API credentials
Follow these steps to get started:
- Download the code
  git clone https://github.com/yourusername/gemini-voice-assistant.git
- Go to the project folder
  cd gemini-voice-assistant
- Create a separate environment (recommended)
  python -m venv env
  On Windows:
  env\Scripts\activate
  On macOS or Linux:
  source env/bin/activate
- Install required packages
  pip install -r requirements.txt
- Set up your credentials
  - Create a file named .env in the main folder
  - Add your Google API key:
    GOOGLE_API_KEY=your_api_key_here
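As a rough illustration of the `KEY=value` format used in `.env`, here is a minimal standard-library sketch that reads such a file into the process environment. The function name `load_env_file` is hypothetical, and the project itself may rely on a dedicated package for this instead; treat it as a sketch, not as the project's actual loader.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines and copy them into os.environ.

    Returns the parsed pairs as a dict. A missing file yields an empty dict.
    """
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate spaces around "=" and optional surrounding quotes.
        values[key.strip()] = value.strip().strip("\"'")
    # Variables already set in the real environment take precedence.
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Calling `load_env_file()` from the project folder and then reading `os.environ.get("GOOGLE_API_KEY")` should return the key you saved, provided the file sits next to `main.py`.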
In main.py, comment out all of the mode examples and uncomment only the one you want to run:
# Examples:
# To run audio mode:
main(input_mode=INPUT_MODE_AUDIO)
# To run text mode:
# main(input_mode=INPUT_MODE_TEXT)
# To run camera mode:
# main(input_mode=INPUT_MODE_CAMERA)
# To run screen mode with monitor index:
# main(input_mode=INPUT_MODE_SCREEN, monitor_index=DEFAULT_MONITOR_INDEX)
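If editing `main.py` for every mode switch becomes tedious, one possible alternative is to pick the mode from the command line. The sketch below is an assumption, not part of the project: the `--mode` and `--monitor` flags, the helper names, and the string values assigned to the `INPUT_MODE_*` constants and `DEFAULT_MONITOR_INDEX` are all hypothetical stand-ins for whatever the real code defines.

```python
import argparse

# Hypothetical values: the project's real constants may differ.
INPUT_MODE_AUDIO = "audio"
INPUT_MODE_TEXT = "text"
INPUT_MODE_CAMERA = "camera"
INPUT_MODE_SCREEN = "screen"
DEFAULT_MONITOR_INDEX = 1

ALL_MODES = [INPUT_MODE_AUDIO, INPUT_MODE_TEXT, INPUT_MODE_CAMERA, INPUT_MODE_SCREEN]


def parse_args(argv=None):
    """Parse the (hypothetical) --mode and --monitor flags."""
    parser = argparse.ArgumentParser(description="Choose an input mode")
    parser.add_argument("--mode", choices=ALL_MODES, default=INPUT_MODE_AUDIO)
    parser.add_argument("--monitor", type=int, default=DEFAULT_MONITOR_INDEX)
    return parser.parse_args(argv)


def build_main_kwargs(args):
    """Translate parsed arguments into keyword arguments for main()."""
    kwargs = {"input_mode": args.mode}
    # monitor_index is only meaningful when sharing the screen.
    if args.mode == INPUT_MODE_SCREEN:
        kwargs["monitor_index"] = args.monitor
    return kwargs
```

With this in place, `main(**build_main_kwargs(parse_args()))` would replace the hand-edited call, and a command such as `python main.py --mode screen --monitor 2` would select screen mode on the second monitor.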
To talk with the assistant:
python main.py
gemini-real-time/
├── .env
├── .gitignore
├── main.py
├── requirements.txt
└── src/
    ├── config.py
    ├── handlers/
    │   ├── audio_handler.py
    │   ├── camera_handler.py
    │   ├── screen_handler.py
    │   └── text_handler.py
    ├── logs/
    │   └── app.log
    └── utils/
        └── logger.py
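The layout pairs a `logger.py` utility with an `app.log` file. The actual contents of `logger.py` are not shown here, so the following is only a guess at the general shape: a small helper built on Python's standard `logging` module that writes to a file. The function name `get_logger`, its parameters, and the default path are assumptions.

```python
import logging
from pathlib import Path


def get_logger(name="assistant", log_file="src/logs/app.log"):
    """Return a logger that appends formatted records to log_file."""
    logger = logging.getLogger(name)
    # Reuse the existing logger so repeated calls do not add duplicate handlers.
    if logger.handlers:
        return logger
    logger.setLevel(logging.INFO)
    path = Path(log_file)
    # Create the logs folder on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

A handler module could then call `get_logger(__name__).info("session started")` to leave a timestamped line in the log file.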
You can change how the assistant works by editing the src/config.py file:
- Change the AI model version
- Adjust audio quality settings
- Modify input/output preferences
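To make the kinds of settings listed above concrete, here is a hypothetical sketch of how such a configuration could be organized. None of these field names or default values come from the project's real `src/config.py`; the model name is a placeholder, and the sample rates are merely common choices for speech audio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssistantConfig:
    """Hypothetical settings object; every field here is an assumption."""

    model_name: str = "your-model-name"  # placeholder, not a real model ID
    input_sample_rate: int = 16000       # microphone capture rate in Hz
    output_sample_rate: int = 24000      # playback rate in Hz
    audio_channels: int = 1              # 1 = mono
    default_input_mode: str = "audio"
    default_monitor_index: int = 1

    def validate(self):
        """Raise ValueError on obviously invalid values, else return self."""
        if self.input_sample_rate <= 0 or self.output_sample_rate <= 0:
            raise ValueError("sample rates must be positive")
        if self.audio_channels < 1:
            raise ValueError("audio_channels must be at least 1")
        if not self.model_name:
            raise ValueError("model_name must not be empty")
        return self
```

Because the dataclass is frozen, a variation is made with `dataclasses.replace(config, input_sample_rate=8000)` rather than by mutating fields, which keeps the defaults in one place.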
- Audio not working: Check your microphone connections and system permissions
- Missing packages: Run pip install -r requirements.txt again
- API errors: Verify your Google API key is correct and has proper permissions
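Some of these checks can be automated before launching the assistant. The sketch below is a hedged example using only the standard library. The function names are hypothetical, and the list of module names to check is left to the caller because the contents of `requirements.txt` are not shown here.

```python
import importlib.util
import os
import sys


def python_is_supported(min_version=(3, 8)):
    """True when the running interpreter is at least min_version."""
    return sys.version_info >= min_version


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def api_key_present(env=None):
    """True when GOOGLE_API_KEY is set to a non-blank value."""
    env = os.environ if env is None else env
    return bool(env.get("GOOGLE_API_KEY", "").strip())


def run_checks(modules=(), env=None):
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if not python_is_supported():
        problems.append("Python 3.8 or newer is required")
    for name in missing_modules(modules):
        problems.append(f"Missing package: {name}")
    if not api_key_present(env):
        problems.append("GOOGLE_API_KEY is not set")
    return problems
```

Printing the result of `run_checks(modules=[...])` with the import names of your installed packages gives a quick summary of what still needs fixing. It cannot tell whether the key itself is valid or has the right permissions; only a real API call can confirm that.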