This repository contains a Streamlit web application that leverages speech-to-text, text-to-speech, and Connection to LLM API using Request. The project is built for an interactive and engaging speech experience, allowing users to record their voice, generate AI responses, and listen to the output in a fun, festive way. Made especially for the festival of Diwali.
- Speech Recording: The app allows users to record their voice directly through the interface using
audio_recorder
. - Connect to API Endpoint: Using
request
, users can connect to any LLM Endpoint to inference. For saving cost, we tested it out using LMStudio and LLama 3.2 3B - Text-to-Speech: The AI response is converted into speech with the
silero Hindi Language
TTS model, and the resulting audio is played back to the user.
The app's interface includes:
- An audio recording button that prompts the user to "Say Something Bombastic..."
- The recorded message is transcribed and displayed in the chat interface.
- AI generates a response based on the user's input, which is then transliterated and converted into audio.
For generating the required output, I created this prompt. Feel free to modify it and use accordingly:
- [ You are दिवाली एआई, specially created for the festive occasion of Diwali. Your mission is to assist users with any queries regarding Diwali celebrations. Your responses must always be positive, full of energy, and include a playful pun or festive humor. All responses should be in Devanagari language, including the name दिवाली एआई. Never use any other Language to respond back. Keep your replies clear, concise, short and funny. ]
- Streamlit Interface: The UI is built with
Streamlit
, providing a clean and responsive design for user interaction. - Speech Recognition: Audio input is processed using the
speech_recognition
package. - Text-to-Speech: AI-generated responses are turned into audio using the
silero
TTS model for Hindi voices. - Aksharamukha Transliteration: Converts AI responses from Devanagari to Romanized ISO format.
-
Clone the repository:
git clone https://github.com/Gurneet1928/Diwali-Voice-AI.git cd speech-diwali-ai
-
Set up a virtual environment (recommended):
python3 -m venv venv source venv/bin/activate # On Windows use `venv\Scripts\activate`
-
Install dependencies:
pip install -r requirements.txt --use-deprecated=legacy-resolver
-
Download necessary models: The
silero
TTS model and other language models are loaded directly fromtorch.hub
within the code.
-
Run the Streamlit app:
streamlit run app.py
-
Configure the LLM Endpoint API (Optional):
- Go to
common - utils.py
- Change the variables
url
andheaders
as required and save the file. - If you are using LMStudio, the default Endpoint will be
http://localhost:1234/v1/chat/completions
, so you can Ignore this.
- Go to
-
Interact with the app:
- Click on the microphone button to record your voice.
- Wait for the app to transcribe the audio, generate an AI response, and listen to the response in audio format.
- Python 3.7+
- Libraries:
streamlit
audio_recorder_streamlit
torch
speech_recognition
aksharamukha
silero-models
- Transliteration: Currently, the app transliterates text from Devanagari to Romanized ISO format. This can be adjusted by modifying the transliteration language pairs in the
transliterate.process()
function. - Voice Settings: The TTS model defaults to
hindi_male
, but this can be changed by selecting a different speaker from thesilero
models.
- Minor Lag when the response is fetched from backend and converted to Audio format. Can depend on Device to Device
Feel free to open an issue or a pull request if you would like to contribute or encounter any issues!
MIT License
Distributed under the License of MIT, which provides permission to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software. Check LICENSE file for more info.
OR Free to use But please make sure attribute the developer....
Free Software, Hell Yeah!
Reached the End ? I appreciate you reading this README in its entirety (maybe). Please remember to give this software a star if you found it useful in any way. ƪ(˘⌣˘)ʃ ƪ(˘⌣˘)ʃ
and also