An AI-powered voice assistant that detects emotional states from speech and responds with appropriate empathy using open-source tools.
- Real-time Speech Recognition - Powered by OpenAI Whisper
- Emotion Detection - Analyzes voice tone, pitch, and prosodic features
- Empathic Response Generation - Context-aware responses using local LLM
- Adaptive Text-to-Speech - Voice output that matches emotional context
- Conversation Memory - Tracks emotional context across interactions
- Privacy-First - All processing happens locally, no data sent to external services
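The conversation-memory feature can be illustrated with a minimal sketch. All names here (`Turn`, `ConversationMemory`, `dominant_emotion`) are hypothetical, not the project's actual API:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Turn:
    """One exchange: what the user said and the emotion detected in it."""
    text: str
    emotion: str

class ConversationMemory:
    """Keeps a sliding window of recent turns so responses can reference
    the user's emotional trajectory, not just the last utterance."""

    def __init__(self, max_turns: int = 10):
        self.turns = deque(maxlen=max_turns)

    def add(self, text: str, emotion: str) -> None:
        self.turns.append(Turn(text, emotion))

    def dominant_emotion(self) -> str:
        """Most frequent emotion across the remembered window."""
        if not self.turns:
            return "neutral"
        counts = {}
        for turn in self.turns:
            counts[turn.emotion] = counts.get(turn.emotion, 0) + 1
        return max(counts, key=counts.get)

memory = ConversationMemory(max_turns=5)
memory.add("I lost my keys again", "anxious")
memory.add("And now I'm late for work", "anxious")
memory.add("At least the weather is nice", "calm")
print(memory.dominant_emotion())  # anxious
```

A bounded `deque` keeps memory usage constant no matter how long the conversation runs.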
```
Audio Input → Speech Recognition → Emotion Detection → Response Generation → Text-to-Speech → Audio Output
     ↓               ↓                     ↓                      ↓                  ↓
 Microphone     Whisper STT       Librosa + ML Classifier  Local LLM (Ollama/HF)  Piper TTS
```
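The flow above can be sketched as a simple chain of stages. The stub functions below stand in for the real Whisper, Librosa, LLM, and Piper calls; every name and return value here is illustrative, not the project's actual interface:

```python
# Each stage is a function from the previous stage's output to the next input.
# Real implementations would wrap Whisper, Librosa, an Ollama/HF model, and Piper.

def recognize_speech(audio: bytes) -> str:
    return "I had a really rough day"                 # stub for Whisper STT

def detect_emotion(audio: bytes) -> str:
    return "sad"                                      # stub for Librosa + ML classifier

def generate_response(text: str, emotion: str) -> str:
    return f"That sounds hard. ({emotion} detected)"  # stub for local LLM

def synthesize_speech(text: str) -> bytes:
    return text.encode()                              # stub for Piper TTS

def run_pipeline(audio: bytes) -> bytes:
    text = recognize_speech(audio)
    emotion = detect_emotion(audio)   # emotion comes from the audio, not the transcript
    reply = generate_response(text, emotion)
    return synthesize_speech(reply)

print(run_pipeline(b"\x00\x01"))
```

Note that emotion detection runs on the raw audio in parallel with transcription, since prosodic cues are lost once speech is reduced to text.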
- Happy - Joyful, excited, positive
- Sad - Melancholic, disappointed, down
- Angry - Frustrated, irritated, upset
- Anxious - Worried, stressed, nervous
- Calm - Peaceful, relaxed, content
- Neutral - Balanced, matter-of-fact
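As a simplified illustration of how prosodic features might map to these categories, the sketch below computes RMS energy and zero-crossing rate with NumPy and applies hand-tuned thresholds. The thresholds are invented for demonstration; the project's actual ML classifier would be trained on labeled speech:

```python
import numpy as np

def prosodic_features(signal):
    """RMS energy (loudness proxy) and zero-crossing rate (pitch/noisiness proxy)."""
    rms = float(np.sqrt(np.mean(signal ** 2)))
    zcr = float(np.mean(np.abs(np.diff(np.sign(signal)))) / 2)
    return rms, zcr

def rough_emotion(signal) -> str:
    """Toy threshold rules -- stands in for a trained classifier."""
    rms, zcr = prosodic_features(signal)
    if rms > 0.5:
        return "angry" if zcr > 0.3 else "happy"
    if rms < 0.1:
        return "sad" if zcr < 0.1 else "anxious"
    return "calm" if zcr < 0.2 else "neutral"

rng = np.random.default_rng(0)
loud_noisy = rng.standard_normal(16000)
print(rough_emotion(loud_noisy))  # loud + noisy -> "angry" under these toy rules
```

In practice, pitch contours, speaking rate, and spectral features (e.g. MFCCs) carry far more signal than these two statistics.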
- Python 3.8+ (3.9-3.11 recommended)
- FFmpeg (for audio processing)
- At least 4GB RAM (for local LLM)
- Clone or download the project:

  ```bash
  cd empathic-voice-companion
  ```

- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Run the safe installer:

  ```bash
  python install.py
  ```

  This installer handles dependency compatibility issues automatically.
If you prefer manual installation:

```bash
# Install minimal dependencies
pip install -r requirements-minimal.txt

# Then run setup for models
python setup_models.py
```

To use the assistant, pick one of three entry points:

- Run the CLI application:

  ```bash
  python main.py
  ```

- Run the web interface:

  ```bash
  python app.py
  ```

  Then open http://localhost:8000 in your browser.

- Run the REST API server:

  ```bash
  python api_server.py
  ```

  The API is available at http://localhost:8001.

Edit config.yaml to customize:
- Emotion detection sensitivity
- Response personality styles
- Voice models and settings
- Audio input/output devices
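A sketch of what such a config.yaml might look like. The keys below are illustrative only; check the file shipped with the project for the actual schema:

```yaml
# Illustrative example -- actual keys may differ
emotion:
  sensitivity: 0.7           # 0.0 (coarse) .. 1.0 (fine-grained)
response:
  personality: "warm"        # e.g. warm, professional, playful
speech:
  stt_model: "whisper-base"
  tts_voice: "en_US-lessac-medium"
audio:
  input_device: default
  output_device: default
```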
```
empathic-voice-companion/
├── src/
│   ├── speech/
│   │   ├── recognition.py    # Whisper STT integration
│   │   └── synthesis.py      # Piper TTS integration
│   ├── emotion/
│   │   ├── detector.py       # Emotion detection engine
│   │   └── features.py       # Audio feature extraction
│   ├── response/
│   │   ├── generator.py      # LLM response generation
│   │   └── empathy.py        # Empathic response patterns
│   ├── memory/
│   │   └── conversation.py   # Conversation history
│   └── utils/
│       ├── audio.py          # Audio processing utilities
│       └── config.py         # Configuration management
├── models/                   # Downloaded AI models
├── data/                     # Training data and samples
├── tests/                    # Unit tests
├── web/                      # Web interface files
├── main.py                   # Main CLI application
├── app.py                    # Web application
├── api_server.py             # REST API server
├── requirements.txt          # Python dependencies
├── config.yaml               # Configuration file
└── setup_models.py           # Model download script
```
MIT License - See LICENSE file for details
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
- OpenAI Whisper for speech recognition
- Librosa for audio analysis
- Piper TTS for speech synthesis
- Hugging Face for ML models