
ConversaVoice

AI-Powered Voice Assistant with Emotional Intelligence

Features · Quick Start · Architecture · API Usage · Tech Stack

Python FastAPI Streamlit Redis


Overview

ConversaVoice is a context-aware voice assistant that understands emotions and responds with appropriate tone and empathy. It combines cutting-edge AI technologies to create natural, emotionally intelligent conversations.

🎤 You speak → 🧠 AI understands → 💬 Smart response → 🔊 Natural voice

Why ConversaVoice?

| Traditional Assistants | ConversaVoice                       |
|------------------------|-------------------------------------|
| Monotone responses     | Emotional, expressive speech        |
| Forgets context        | Remembers conversation history      |
| Generic replies        | Personalized, context-aware replies |
| Robotic voice          | Natural, human-like tone            |

Features

🎙️ Voice Input & Output

  • Speech-to-Text: Groq Whisper API for fast, accurate transcription
  • Text-to-Speech: Azure Neural TTS with emotional expressiveness
  • Real-time: Low-latency streaming pipeline

🧠 Intelligent Responses

  • LLM-Powered: Groq API with Llama 3.3 70B for smart replies
  • Context-Aware: Remembers conversation history
  • Emotion Detection: Adapts tone based on user sentiment

💭 Emotional Intelligence

  • Sentiment Analysis: Detects frustration, happiness, confusion
  • Adaptive Prosody: Changes pitch, rate, and tone dynamically
  • Empathetic Responses: De-escalation when user is frustrated

🔄 Conversation Memory

  • Redis-Backed: Persistent session storage
  • Repetition Detection: Knows when user repeats themselves
  • Preference Tracking: Remembers user preferences
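
The session-memory idea can be sketched with an in-memory dict standing in for Redis (a simplified illustration; the real implementation lives in src/memory/redis_client.py, and repetition detection there uses sentence embeddings rather than exact matching):

```python
# Illustrative stand-in for the Redis-backed session store.
# A real deployment would use redis-py with a similar key layout.

class SessionMemory:
    def __init__(self, max_turns: int = 20):
        self._store: dict[str, list[dict]] = {}
        self.max_turns = max_turns

    def append(self, session_id: str, role: str, text: str) -> None:
        history = self._store.setdefault(session_id, [])
        history.append({"role": role, "text": text})
        # Keep only the most recent turns, like a capped Redis list.
        del history[:-self.max_turns]

    def history(self, session_id: str) -> list[dict]:
        return list(self._store.get(session_id, []))

    def is_repetition(self, session_id: str, text: str) -> bool:
        # Naive exact-match check; the project uses embeddings instead.
        return any(turn["text"].strip().lower() == text.strip().lower()
                   for turn in self._store.get(session_id, []))
```

Capping the stored turns keeps the LLM context window bounded while preserving recent conversation state across requests.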

🎨 Expressive Speech (SSML)

  • 30+ Emotion Styles: Cheerful, empathetic, calm, excited...
  • Word Emphasis: Stress important words naturally
  • Prosody Control: Fine-tune pitch, rate, and volume

🔒 Reliability

  • Fallback System: Auto-switch to local models if cloud fails
  • Ollama Backup: Local LLM fallback
  • Piper TTS Backup: Local voice synthesis
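
The fallback pattern (cloud first, then local) can be sketched as a generic chain; function and backend names here are illustrative, mirroring src/fallback.py in spirit:

```python
# Illustrative fallback chain: try the cloud backend first, then local ones.
# Backend names mirror the README (Groq -> Ollama).

from typing import Callable

def with_fallback(backends: list[tuple[str, Callable[[str], str]]],
                  prompt: str) -> tuple[str, str]:
    """Return (backend_name, reply) from the first backend that succeeds."""
    last_error = None
    for name, generate in backends:
        try:
            return name, generate(prompt)
        except Exception as exc:  # e.g. network error, rate limit
            last_error = exc
    raise RuntimeError("all backends failed") from last_error
```

The same wrapper works for TTS (Azure Neural TTS falling back to Piper) by swapping in synthesis callables.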

Quick Start

Prerequisites

  • Python 3 with venv and pip
  • Docker (used below to run Redis) or a local Redis server
  • API keys for Groq and Azure Speech Services (see Configure Environment)

Installation

# Clone the repository
git clone https://github.com/Speech-Synthesis/ConversaVoice.git
cd ConversaVoice

# Create virtual environment
python -m venv venv

# Activate (Windows)
.\venv\Scripts\activate

# Activate (Linux/Mac)
source venv/bin/activate

# Install dependencies
pip install -r backend/requirements.txt

# Copy environment file
cp .env.example .env
# Edit .env with your API keys

Configure Environment

# .env file
GROQ_API_KEY=your_groq_api_key
AZURE_SPEECH_KEY=your_azure_key
AZURE_SPEECH_REGION=eastus
REDIS_HOST=localhost
REDIS_PORT=6379
STT_BACKEND=groq
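
A sketch of how the backend might read these variables at startup (structure and defaults are assumptions based on the .env above, with local-development defaults for Redis; the actual loading code may differ):

```python
import os

# Read configuration with the values described in the README.
# Required keys raise early so misconfiguration fails fast.

def load_config() -> dict:
    required = ["GROQ_API_KEY", "AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"]
    missing = [key for key in required if not os.getenv(key)]
    if missing:
        raise RuntimeError(f"missing required env vars: {missing}")
    return {
        "groq_api_key": os.environ["GROQ_API_KEY"],
        "azure_speech_key": os.environ["AZURE_SPEECH_KEY"],
        "azure_speech_region": os.environ["AZURE_SPEECH_REGION"],
        # Defaults shown for local development.
        "redis_host": os.getenv("REDIS_HOST", "localhost"),
        "redis_port": int(os.getenv("REDIS_PORT", "6379")),
        "stt_backend": os.getenv("STT_BACKEND", "groq"),
    }
```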

Run Locally

1. Start Redis:

docker run -d -p 6379:6379 redis

2. Start Backend (Terminal 1):

cd backend
uvicorn main:app --reload --port 8000

3. Start Frontend (Terminal 2):

cd frontend
streamlit run app.py

4. Open Browser:

http://localhost:8501

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        ConversaVoice                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐  │
│  │ Frontend │───▶│ Backend  │───▶│   LLM    │───▶│   TTS    │  │
│  │Streamlit │    │ FastAPI  │    │  Groq    │    │  Azure   │  │
│  └──────────┘    └────┬─────┘    └──────────┘    └──────────┘  │
│                       │                                         │
│       ┌───────────────┼───────────────┐                        │
│       ▼               ▼               ▼                        │
│  ┌─────────┐    ┌──────────┐    ┌──────────┐                   │
│  │  Redis  │    │   STT    │    │   NLP    │                   │
│  │ Memory  │    │  Groq    │    │Sentiment │                   │
│  └─────────┘    └──────────┘    └──────────┘                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Data Flow

User speaks
    │
    ▼
┌─────────────────┐
│  Groq Whisper   │  ← Speech-to-Text
│    (STT)        │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Sentiment +    │  ← Analyze emotion
│  Context Check  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Redis Memory   │  ← Fetch history
│  + Preferences  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Groq LLM       │  ← Generate response
│  (Llama 3.3)    │     with emotion style
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Azure TTS      │  ← Convert to speech
│  (Neural Voice) │     with prosody
└────────┬────────┘
         │
         ▼
    User hears response
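
The data flow above can be sketched as a chain of pipeline stages. Each stub below stands in for a real client (Groq Whisper, the sentiment analyzer, Redis, the Groq LLM, Azure TTS); all names and return values here are illustrative:

```python
# Stub pipeline mirroring the data-flow diagram.

def transcribe(audio: bytes) -> str:              # Groq Whisper (STT)
    return "I'm frustrated with my order!"        # stubbed transcript

def detect_sentiment(text: str) -> str:           # NLP sentiment
    return "negative" if "frustrated" in text.lower() else "neutral"

def fetch_history(session_id: str) -> list:      # Redis memory
    return []

def generate_reply(text: str, history: list, style: str) -> str:  # Groq LLM
    return f"[{style}] I understand, let me help with that."

def synthesize(text: str, style: str) -> bytes:   # Azure TTS
    return text.encode()                          # stubbed audio

def run_pipeline(audio: bytes, session_id: str) -> bytes:
    text = transcribe(audio)
    style = "empathetic" if detect_sentiment(text) == "negative" else "neutral"
    history = fetch_history(session_id)
    reply = generate_reply(text, history, style)
    return synthesize(reply, style)
```

Note how the detected sentiment feeds into both response generation and voice synthesis, which is what lets the assistant answer a frustrated user in an empathetic voice.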

API Usage

Python SDK

from src.orchestrator import Orchestrator
import asyncio

async def main():
    # Initialize
    orch = Orchestrator(session_id="user-123")
    await orch.initialize()

    # Process voice/text
    result = await orch.process_text("I'm frustrated with my order!")

    print(f"Response: {result.assistant_response}")
    print(f"Emotion Style: {result.style}")  # "empathetic"
    print(f"Latency: {result.latency_ms}ms")

    await orch.shutdown()

asyncio.run(main())

REST API

Health Check:

curl http://localhost:8000/api/health

Create Session:

curl -X POST http://localhost:8000/api/session

Chat:

curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "session_id": "your-session-id"}'

Transcribe Audio:

curl -X POST http://localhost:8000/api/transcribe \
  -F "audio=@recording.wav" \
  -F "session_id=your-session-id"

Synthesize Speech:

curl -X POST http://localhost:8000/api/synthesize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello!", "style": "cheerful"}'
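
The same calls can be made from Python using only the standard library. This is a sketch equivalent of the curl examples above (endpoint paths are taken from them; helper names are illustrative, and sending requires the backend to be running):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a POST request matching the curl examples above."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(text: str, session_id: str) -> dict:
    req = build_request("/api/chat", {"text": text, "session_id": session_id})
    with urllib.request.urlopen(req) as resp:  # needs the backend running
        return json.loads(resp.read())
```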

Emotional Prosody

ConversaVoice adapts voice characteristics based on context:

| Style       | When Used                 | Voice Effect        |
|-------------|---------------------------|---------------------|
| neutral     | Normal conversation       | Standard tone       |
| cheerful    | Good news, greetings      | Higher pitch, faster|
| empathetic  | User frustrated or sad    | Softer, slower      |
| patient     | Explaining complex topics | Calm, measured      |
| de_escalate | User very angry           | Very soft, slow     |
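
The table above can be approximated by a simple rule-based style selector. This is a sketch: the real selection logic (in src/nlp/sentiment.py and the orchestrator) may use a proper sentiment model, and the intensity threshold here is an assumption:

```python
def pick_style(sentiment: str, intensity: float = 0.5) -> str:
    """Map detected sentiment to a TTS style from the table above."""
    if sentiment == "negative":
        # Escalate from empathetic to de_escalate as anger grows.
        return "de_escalate" if intensity > 0.8 else "empathetic"
    if sentiment == "positive":
        return "cheerful"
    if sentiment == "confused":
        return "patient"
    return "neutral"
```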

SSML Example

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <mstts:express-as style="empathetic" styledegree="1.3">
      I understand how frustrating this must be.
      <emphasis level="strong">We'll fix this right away.</emphasis>
    </mstts:express-as>
  </voice>
</speak>
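
An SSML document like the one above can be assembled programmatically. This sketch mirrors the spirit of src/tts/ssml_builder.py; the function name and signature are illustrative:

```python
from xml.sax.saxutils import escape

def build_ssml(text: str, style: str = "neutral", degree: float = 1.0,
               voice: str = "en-US-JennyNeural") -> str:
    """Wrap text in Azure-style SSML with an express-as emotion style."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">'
        f'<voice name="{voice}">'
        f'<mstts:express-as style="{style}" styledegree="{degree}">'
        f"{escape(text)}"  # escape user text so it can't break the XML
        "</mstts:express-as></voice></speak>"
    )
```

Escaping the text before interpolation matters: user utterances can contain characters like < or & that would otherwise produce invalid SSML.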

Project Structure

ConversaVoice/
├── backend/                 # FastAPI backend
│   ├── api/
│   │   ├── routes.py       # API endpoints
│   │   └── models.py       # Pydantic models
│   ├── services/
│   │   └── orchestrator_service.py
│   └── main.py             # App entry point
│
├── frontend/               # Streamlit UI
│   ├── app.py             # Main UI
│   └── api_client.py      # Backend client
│
├── src/                    # Core modules
│   ├── llm/               # LLM clients
│   │   ├── groq_client.py
│   │   └── ollama_client.py
│   ├── tts/               # Text-to-Speech
│   │   ├── azure_client.py
│   │   ├── piper_client.py
│   │   └── ssml_builder.py
│   ├── stt/               # Speech-to-Text
│   │   ├── groq_whisper_client.py
│   │   └── whisper_client.py
│   ├── memory/            # Conversation memory
│   │   ├── redis_client.py
│   │   └── vector_store.py
│   ├── nlp/               # NLP utilities
│   │   └── sentiment.py
│   ├── orchestrator.py    # Main pipeline
│   └── fallback.py        # Fallback manager
│
├── scripts/               # CLI tools
│   ├── main.py           # Interactive CLI
│   └── transcribe.py     # Transcription tool
│
└── .env                   # Configuration

Tech Stack

| Component    | Technology            | Purpose              |
|--------------|-----------------------|----------------------|
| Frontend     | Streamlit             | Web UI               |
| Backend      | FastAPI               | REST API             |
| LLM          | Groq (Llama 3.3 70B)  | Response generation  |
| STT          | Groq Whisper          | Speech recognition   |
| TTS          | Azure Neural TTS      | Voice synthesis      |
| Memory       | Redis                 | Conversation storage |
| Embeddings   | Sentence Transformers | Repetition detection |
| Fallback LLM | Ollama                | Offline backup       |
| Fallback TTS | Piper                 | Offline backup       |

Environment Variables

| Variable            | Description                  | Required           |
|---------------------|------------------------------|--------------------|
| GROQ_API_KEY        | Groq API key for LLM & STT   | Yes                |
| AZURE_SPEECH_KEY    | Azure Speech Services key    | Yes                |
| AZURE_SPEECH_REGION | Azure region (e.g., eastus)  | Yes                |
| REDIS_HOST          | Redis server host            | Yes                |
| REDIS_PORT          | Redis server port            | Yes                |
| STT_BACKEND         | groq or local                | No (default: groq) |
| BACKEND_API_URL     | Backend URL for the frontend | No                 |

Roadmap

  • [x] Voice input with Whisper STT
  • [x] Intelligent responses with Llama 3.3
  • [x] Emotional TTS with Azure
  • [x] Conversation memory with Redis
  • [x] Sentiment analysis
  • [x] Fallback to local models
  • [x] Web UI with Streamlit
  • [ ] Multi-language support
  • [ ] Voice cloning
  • [ ] Mobile app
  • [ ] WebSocket real-time streaming

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Made by the ConversaVoice Team
