This project is a sample AI assistant that uses OpenAI and Google Generative AI models to provide responses based on user prompts and webcam images. The assistant can also convert text responses to speech.
- API Keys: You need an OPENAI_API_KEY and a GOOGLE_API_KEY to run this code. Store them in a .env file in the root directory of the project, or set them as environment variables.
- Apple Silicon Users: If you are running the code on Apple Silicon, install portaudio by running the following command:
  brew install portaudio
- Create a Virtual Environment:
  python3 -m venv .venv
- Activate the Virtual Environment:
  source .venv/bin/activate
- Update pip and Install Required Packages:
  pip install -U pip
  pip install -r requirements.txt
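Before starting the assistant, it can help to confirm that both API keys are visible. The following is a minimal sketch, not part of the project: the helper names `read_env_file` and `missing_keys` are hypothetical, and it assumes the .env file uses plain KEY=VALUE lines. It uses only the Python standard library and reports which of the two keys is set neither in the environment nor in .env.

```python
import os
from pathlib import Path

# The two keys named in the requirements above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY")


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper).

    Returns an empty dict if the file does not exist. Blank lines,
    comment lines, and lines without '=' are skipped.
    """
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_file=".env", environ=None):
    """Return the required keys found in neither the environment nor the file."""
    environ = os.environ if environ is None else environ
    file_values = read_env_file(env_file)
    return [k for k in REQUIRED_KEYS
            if not (environ.get(k) or file_values.get(k))]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Both API keys found.")
```

Run it from the project root so that the relative .env path resolves to the file described above.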
To start the assistant, run the following command:
python3 assistant.py
- Webcam Stream: The assistant uses your webcam to capture images.
- Voice Input: Speak into your microphone to provide prompts.
- Text-to-Speech: The assistant will respond with synthesized speech.
To stop the assistant, close the webcam window or press Esc or q.
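As an illustration of the stop keys, here is a small sketch of how an Esc-or-q check is commonly written. It is an assumption that the webcam window is an OpenCV window and that key codes come from `cv2.waitKey`; the function name `should_stop` is hypothetical and is not taken from assistant.py.

```python
ESC_KEY = 27  # ASCII code for the Esc key


def should_stop(key_code):
    """Return True if the key code corresponds to Esc or q.

    Assumes key_code looks like the return value of cv2.waitKey, which is
    -1 when no key was pressed. Masking with 0xFF keeps only the low byte,
    so -1 becomes 255 and is treated as "no stop key".
    """
    key = key_code & 0xFF
    return key in (ESC_KEY, ord("q"))
```

In a capture loop this would typically be called once per frame, for example `if should_stop(cv2.waitKey(1)): break`.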
This project is licensed under the MIT License.