Voice-GPT

Overview

Voice-GPT is a Python application that acts as a virtual assistant and holds voice conversations with users. It combines speech recognition, text-to-speech conversion, and the OpenAI GPT-3 model to turn spoken input into spoken replies.

The project consists of several components, including:

User Interface (UI): A graphical user interface built using the wxPython library, where users can initiate and participate in conversations with the virtual assistant.
Audio Recorder: This component records audio input from the user, which is then converted into text for further processing.
Speech-to-Text Converter: Converts recorded audio into text using a machine learning model, enabling the virtual assistant to understand spoken language.
Text-to-Speech Converter: Converts text responses generated by the virtual assistant into audio for playback to the user.
Conversation Engine: Uses the OpenAI GPT-3 model to generate responses to user queries and prompts, making the virtual assistant capable of meaningful interactions.
Audio Player: Plays audio responses generated by the Text-to-Speech Converter to provide a spoken response to the user.
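
As a rough illustration of how the components above hand data to one another, here is a minimal sketch of a single conversational turn. It is not the project's actual code: the names Turn, run_turn, transcribe, generate_reply, and synthesize are hypothetical, and the three stages are passed in as plain callables so the sketch runs without a microphone or an API key.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    """Result of one conversational exchange (hypothetical container)."""
    user_text: str
    reply_text: str
    reply_audio_path: str


def run_turn(
    audio_path: str,
    transcribe: Callable[[str], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], str],
) -> Turn:
    """Run one exchange: recorded audio -> text -> reply -> reply audio."""
    user_text = transcribe(audio_path).strip()   # Speech-to-Text Converter
    reply_text = generate_reply(user_text)       # Conversation Engine
    reply_audio_path = synthesize(reply_text)    # Text-to-Speech Converter
    return Turn(user_text, reply_text, reply_audio_path)


if __name__ == "__main__":
    # Stand-in callables; a real setup would wrap the actual libraries.
    turn = run_turn(
        "recording.wav",
        transcribe=lambda path: " What time is it? ",
        generate_reply=lambda text: f"You asked: {text}",
        synthesize=lambda text: "reply.mp3",
    )
    print(turn)
```

In the application itself, these callables would presumably wrap the speech-to-text model, the OpenAI API, and the text-to-speech library. Keeping them as parameters simply makes each stage easy to swap or test on its own.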

Getting Started

Prerequisites

Before running Voice-GPT, make sure the following dependencies are installed:

Python 3.x
wxPython
pydub
whisper
gtts
pyaudio
OpenAI GPT-3 API (You will need an API key for this)
Other dependencies mentioned in the code comments

Installation

Clone this repository to your local machine:

git clone https://github.com/kristo-godari/voice-gpt.git

Install the required Python libraries using pip:

pip install -r requirements.txt

Obtain an OpenAI GPT-3 API key by signing up on the OpenAI website: https://beta.openai.com/signup/

Create a configuration file named role-play-conversation.properties in the config directory and populate it with the following information:

[text]
initial-prompt = Hello, how can I assist you today?
text-to-speach-language = en
openai-api-key = YOUR_OPENAI_API_KEY

Replace YOUR_OPENAI_API_KEY with the API key you obtained from OpenAI.
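
To check that the file is well formed before launching the application, a small script along these lines may help. It is a sketch that assumes the file can be read with Python's standard configparser module; how the application itself loads the file is not shown here, and parse_config and REQUIRED_KEYS are hypothetical names.

```python
import configparser

# Keys copied from the sample configuration above, spelled exactly as shown.
REQUIRED_KEYS = ("initial-prompt", "text-to-speach-language", "openai-api-key")


def parse_config(text: str) -> dict:
    """Parse the properties text and return the [text] section as a dict."""
    # interpolation=None keeps a literal '%' in a prompt or key from raising.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["text"]  # raises KeyError if the section is absent
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError(f"missing config keys: {', '.join(missing)}")
    return {key: section[key] for key in REQUIRED_KEYS}


if __name__ == "__main__":
    sample = (
        "[text]\n"
        "initial-prompt = Hello, how can I assist you today?\n"
        "text-to-speach-language = en\n"
        "openai-api-key = YOUR_OPENAI_API_KEY\n"
    )
    print(parse_config(sample))
```

To validate the real file, read config/role-play-conversation.properties and pass its contents to parse_config.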

Usage

Run Voice-GPT using the following command:

python main.py

The application will launch the graphical user interface, allowing you to interact with the virtual assistant. Follow these steps to use the application:

Click the "Reply" button to start recording your voice.
Speak your message or question to the virtual assistant.
Click the "Stop recording and send reply" button to stop recording.
The virtual assistant will process your query and provide a text and audio response.
The conversation continues, and you can ask additional questions or provide instructions.

Features

Voice input: Users can speak to the virtual assistant, which converts their speech to text for processing.
Text input: Users can also type text directly into the application to interact with the virtual assistant.
Natural language understanding: The application utilizes the OpenAI GPT-3 model to understand and generate human-like responses.
Text-to-speech conversion: Responses from the virtual assistant are converted to audio for a more natural conversational experience.
Multithreading: The application uses multithreading to handle audio recording and processing simultaneously, ensuring a smooth user experience.
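
The multithreading point can be illustrated with a minimal sketch: slow work runs on a background thread and hands its result back through a queue, so the interface thread never blocks. This is an assumed pattern, not the project's actual implementation, and process_in_background and handle are hypothetical names.

```python
import queue
import threading
from typing import Callable


def process_in_background(
    audio_path: str,
    handle: Callable[[str], str],
    results: "queue.Queue[str]",
) -> threading.Thread:
    """Start a daemon thread that runs handle(audio_path) and queues the result."""
    def worker() -> None:
        results.put(handle(audio_path))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    results: "queue.Queue[str]" = queue.Queue()
    thread = process_in_background(
        "recording.wav", lambda path: f"processed {path}", results
    )
    # A GUI would poll the queue from its event loop instead of blocking here.
    print(results.get(timeout=5))
    thread.join()
```

In a wxPython application, the worker would typically pass its result back to the UI thread (for example with wx.CallAfter) rather than touching widgets directly.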

Troubleshooting

If you encounter any issues with the application, check the following:

All required dependencies are installed.
The role-play-conversation.properties file is configured correctly.
Your system's microphone is correctly configured and working.
Your internet connection is stable, as the OpenAI GPT-3 model requires an internet connection to generate responses.
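
A quick way to confirm the dependencies are importable is a short check like the one below. It is a sketch only: the import names in MODULES are assumptions (a pip package name can differ from its module name), and find_missing is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the prerequisites; adjust if your install differs.
MODULES = ("wx", "pydub", "whisper", "gtts", "pyaudio", "openai")


def find_missing(modules=MODULES) -> list:
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    print("All modules found." if not missing else "Missing: " + ", ".join(missing))
```

Any name reported as missing can then be installed with pip, or by re-running the requirements.txt install step.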

Future Improvements

Implement additional conversational features and expand the capabilities of the virtual assistant.
Add user authentication and personalization to tailor responses based on user preferences.
Enhance error handling and provide more informative feedback to users.
Improve the graphical user interface for a more user-friendly experience.

Contributions

Contributions to this project are welcome! Feel free to open issues or submit pull requests to help improve the project.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
