This project is an Emotion-Aware Text-to-Speech (TTS) system that detects emotions from input text and generates emotionally aligned speech.
It combines sentiment analysis with speech synthesis parameters (rate, volume, voice) to make the output sound more human-like and empathetic.
- Emotion Detection → Uses `TextBlob` to analyze sentiment and map text into happy, sad, or neutral, with an intensity score.
- Dynamic Speech Synthesis → Adjusts speech rate, volume, and voice based on the detected emotion and intensity.
- FastAPI Backend → Exposes a `/speak` API endpoint to generate emotional speech.
- Streamlit Frontend → Simple UI for users to input text and listen to emotional speech.
- Female Voice Support → Configures `pyttsx3` to use a female voice (instead of the default male voice).
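The polarity-to-emotion mapping can be sketched as below. The `map_polarity` helper and its ±0.1 thresholds are illustrative assumptions, not the project's exact code; `emotion_detector.py` may use different cutoffs.

```python
# Illustrative mapping from TextBlob polarity to (emotion, intensity).
# The map_polarity helper and the 0.1 thresholds are assumptions.

def map_polarity(polarity: float) -> tuple[str, float]:
    """Map a sentiment polarity in [-1, 1] to (emotion, intensity in [0, 1])."""
    if polarity > 0.1:
        return "happy", min(polarity, 1.0)
    if polarity < -0.1:
        return "sad", min(-polarity, 1.0)
    return "neutral", 0.0

# The polarity score itself would come from TextBlob (pip install textblob):
#   from textblob import TextBlob
#   polarity = TextBlob("I am so excited!").sentiment.polarity

print(map_polarity(0.9))   # ('happy', 0.9)
```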
.
├── app.py # FastAPI backend (API to generate speech)
├── config.py # Emotion mapping (rate, volume adjustments per emotion)
├── emotion_detector.py # Emotion detection using TextBlob sentiment analysis
├── streamlit_app.py # Streamlit frontend UI
├── tts_engine.py # Text-to-Speech generation logic
- User Input → Text is entered via the Streamlit frontend.
- Emotion Detection → `emotion_detector.py` uses sentiment polarity:
  - Positive → Happy 😃
  - Negative → Sad 😢
  - Neutral → Neutral 😐

  Intensity is scaled between 0 and 1.
- Parameter Mapping → In `config.py`, each emotion has base values for:
  - Speech rate (words per minute)
  - Speech volume (loudness)
  - Variations depending on intensity
- Speech Synthesis → `tts_engine.py` uses `pyttsx3` to generate speech:
  - Adjusts rate & volume dynamically
  - Selects a female voice (`voices[1].id`)
- API Response → `app.py` streams a `.wav` audio file back to the frontend.
- Frontend Playback → Streamlit displays the detected emotion and plays the audio.
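The mapping and synthesis steps above can be sketched end-to-end as follows. The `EMOTION_MAP` values, the interpolation formula, and the helper names are assumptions for illustration, not the project's actual `config.py` or `tts_engine.py`:

```python
# Hypothetical EMOTION_MAP; the project's config.py defines its own values.
EMOTION_MAP = {
    "happy":   {"rate": 200, "volume": 1.0},
    "sad":     {"rate": 140, "volume": 0.7},
    "neutral": {"rate": 170, "volume": 0.9},
}

def prosody_for(emotion: str, intensity: float) -> dict:
    """Interpolate from neutral toward the emotion's base as intensity grows.

    The linear interpolation here is an illustrative choice.
    """
    base = EMOTION_MAP.get(emotion, EMOTION_MAP["neutral"])
    neutral = EMOTION_MAP["neutral"]
    rate = neutral["rate"] + intensity * (base["rate"] - neutral["rate"])
    volume = neutral["volume"] + intensity * (base["volume"] - neutral["volume"])
    return {"rate": round(rate), "volume": round(volume, 2)}

def speak(text: str, emotion: str, intensity: float, out_path: str = "out.wav"):
    """Synthesize speech with pyttsx3 using the interpolated prosody."""
    import pyttsx3  # pip install pyttsx3
    engine = pyttsx3.init()
    prosody = prosody_for(emotion, intensity)
    engine.setProperty("rate", prosody["rate"])
    engine.setProperty("volume", prosody["volume"])
    voices = engine.getProperty("voices")
    if len(voices) > 1:
        engine.setProperty("voice", voices[1].id)  # often a female voice on Windows
    engine.save_to_file(text, out_path)
    engine.runAndWait()
```

Note that `voices[1]` being female is platform-dependent; on some systems the voice list differs, so a real implementation should inspect voice names rather than hard-code an index.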
Follow these steps to set up and run the project:
```bash
git clone https://github.com/Anshul21107/Empathy-Engine
cd Empathy-Engine
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
python app.py
streamlit run streamlit_app.py
```

- Input: "I am so excited about this new project!"
- Detected: Emotion = `happy`, Intensity ≈ `0.9`
- Output: Faster, louder, female-voiced audio with a happy tone.
`POST /speak`

```json
{
  "text": "Your input sentence here"
}
```

Response: WAV audio stream + headers:

- `X-Emotion`: Detected emotion
- `X-Intensity`: Emotion intensity
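A minimal client for this endpoint can be written with only the standard library. The helper name and the default `http://127.0.0.1:8000` base URL are assumptions; adjust them to wherever the FastAPI backend actually runs.

```python
import json
import urllib.request

def build_speak_request(text: str,
                        base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build the POST /speak request (base URL and port are assumptions)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/speak",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the backend running (python app.py), send the request and save the WAV:
# with urllib.request.urlopen(build_speak_request("Hello!")) as resp:
#     print(resp.headers["X-Emotion"], resp.headers["X-Intensity"])
#     with open("speech.wav", "wb") as f:
#         f.write(resp.read())
```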
- Implemented sentiment-based prosody control (rate, volume, voice).
- Used FastAPI for a scalable backend and Streamlit for a quick prototyping UI.
- Showed ability to integrate NLP + TTS + Web APIs.
- Designed `EMOTION_MAP` for realistic emotion simulation.
- Extended TTS to female voice selection (instead of the default male voice).
Here’s an example of emotionally generated speech for the text: "I am deeply saddened by this unfortunate event. It's truly heartbreaking.":
Developed by Anshul Katiyar

