Add auto-play voice streaming to Gradio chat app #4
Open
- Add SQLite models for conversations, fragments, summaries, and embeddings
- Create cron job script for daily automated ingestion
- Implement dual storage to both SQLite and Neo4j
- Add idempotent processing with checksum-based deduplication
- Include system broadcast alerts for Neo4j failures
- Add Gradio UI tab for cron job monitoring
- Create integration tests for storage and ingestion flow
- Add sample conversation generator for testing

This provides a foundation for automated conversation processing with better scalability than Neo4j alone, while maintaining compatibility.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
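The idempotent, checksum-based deduplication listed in this commit could look roughly like the sketch below. The table name `conversations`, its columns, and the helper names `content_checksum` and `ingest_once` are assumptions made for illustration; they are not taken from the PR's actual schema.

```python
import hashlib
import sqlite3


def content_checksum(text: str) -> str:
    """Return a SHA-256 hex digest of the conversation text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ingest_once(conn: sqlite3.Connection, text: str) -> bool:
    """Store the text unless its checksum is already present.

    Returns True if a new row was inserted, False if it was a duplicate.
    """
    # Hypothetical schema: the checksum is the primary key, so a repeat
    # insert of the same content is silently ignored.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS conversations ("
        "checksum TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    cur = conn.execute(
        "INSERT OR IGNORE INTO conversations (checksum, body) VALUES (?, ?)",
        (content_checksum(text), text),
    )
    conn.commit()
    return cur.rowcount == 1
```

Running the same ingestion twice leaves the table unchanged, which is what makes a daily cron job safe to re-run after a partial failure.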
- Integrated ElevenLabs API for text-to-speech streaming
- Uses Gradio's native audio component with autoplay
- Streams audio chunks as sentences complete
- Added voice toggle checkbox to enable/disable feature
- Shows visual streaming indicator during audio generation
- Uses voice ID from eleven.py configuration
- Added comprehensive unit tests for voice functionality

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
marklubin commented Jun 15, 2025
kairix-engine/gradio_chat.py
Outdated
    try:
        # Perform the text-to-speech conversion
        response = elevenlabs_client.text_to_speech.convert(
Isn't it async? You should yield as soon as you get a chunk.
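The pattern requested here, forwarding each chunk as soon as it arrives, can be sketched with an async generator. `fake_tts_chunks` is a hypothetical stand-in for a streaming text-to-speech response and does not reflect the ElevenLabs client API; `stream_audio` and `collect` are likewise illustrative names.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_tts_chunks(text: str) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    for word in text.split():
        await asyncio.sleep(0)  # simulate waiting on the network
        yield word.encode("utf-8")


async def stream_audio(text: str) -> AsyncIterator[bytes]:
    """Forward each chunk to the caller the moment it arrives."""
    async for chunk in fake_tts_chunks(text):
        if chunk:  # skip empty chunks
            yield chunk


async def collect(text: str) -> List[bytes]:
    """Drain the stream into a list (used here only to demonstrate it)."""
    return [chunk async for chunk in stream_audio(text)]
```

Because `stream_audio` never accumulates chunks, a consumer can begin playback after the first chunk instead of waiting for the whole conversion to finish.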
kairix-engine/gradio_chat.py
Outdated
Comment on lines +104 to +130
    # If voice is enabled, buffer sentences for TTS
    if enable_voice and elevenlabs_client:
    sentence_buffer += chunk

    # Check for sentence boundaries
    if any(punct in sentence_buffer for punct in ['.', '!', '?']):
    # Get the last complete sentence
    for punct in ['.', '!', '?']:
    if punct in sentence_buffer:
    parts = sentence_buffer.split(punct)
    if len(parts) > 1:
    complete_sentence = parts[0] + punct
    sentence_buffer = punct.join(parts[1:])

    # Generate audio for complete sentence
    audio_data = text_to_speech(complete_sentence.strip())
    if audio_data:
    yield "", history, (44100, audio_data), True
    else:
    yield "", history, None, True
    break
    else:
    yield "", history, None, True
    else:
    yield "", history, None, True
    else:
    yield "", history, None, False
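For comparison, sentence buffering of this kind can be done with a single regular expression instead of looping over punctuation marks. This is a minimal sketch, not the code in the PR: `pop_sentences` is a hypothetical helper, and it assumes a sentence is complete only once its terminator is followed by whitespace, so a value such as `3.14` is not split.

```python
import re
from typing import List, Tuple

# A sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def pop_sentences(buffer: str) -> Tuple[List[str], str]:
    """Split the buffer into complete sentences and an unfinished remainder."""
    parts = _SENTENCE_END.split(buffer)
    # The last part is not yet followed by whitespace, so keep it buffered.
    *complete, remainder = parts
    return [s.strip() for s in complete if s.strip()], remainder
```

The remainder is carried into the next streamed chunk, and whatever is left when the stream ends still has to be flushed once.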
kairix-engine/gradio_chat.py
Outdated
    optimize_streaming_latency="0",
    output_format="mp3_22050_32",
    text=text,
    model_id="eleven_multilingual_v2",
Find and use the fastest model.
kairix-engine/gradio_chat.py
Outdated
Comment on lines +69 to +72
    stability=0.0,
    similarity_boost=1.0,
    style=0.0,
    use_speaker_boost=True,
Verify these settings match best practices.
kairix-engine/gradio_chat.py
Outdated
Comment on lines +84 to +87
    except Exception as e:
        logging.error(f"Error in text-to-speech: {e}")
        return None
Need to show a visual indicator of the error. Add a panel with a log stream in a second tab.
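One way to feed such a log panel is a `logging.Handler` that keeps the most recent records in memory so the UI can read them back. The sketch below uses only the standard library; `BufferedLogHandler`, `attach_log_buffer`, and the line limit are hypothetical names and values, and how the text is wired into the second tab is left open.

```python
import logging
from collections import deque
from typing import Deque


class BufferedLogHandler(logging.Handler):
    """Keep the most recent log lines in memory for display in a UI panel."""

    def __init__(self, max_lines: int = 200) -> None:
        super().__init__()
        # A bounded deque drops the oldest line once max_lines is reached.
        self.lines: Deque[str] = deque(maxlen=max_lines)

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))

    def text(self) -> str:
        """Return the buffered lines as one newline-separated string."""
        return "\n".join(self.lines)


def attach_log_buffer(logger: logging.Logger, max_lines: int = 200) -> BufferedLogHandler:
    """Attach a buffered handler to the logger and return it."""
    handler = BufferedLogHandler(max_lines)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return handler
```

The existing `logging.error(...)` call in the `except` block would then land in the buffer without further changes, and the panel only needs to display `handler.text()` on refresh.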
kairix-engine/gradio_chat.py
Outdated
Comment on lines +132 to +140
    # Process any remaining text for TTS
    if enable_voice and elevenlabs_client and sentence_buffer.strip():
        audio_data = text_to_speech(sentence_buffer.strip())
        if audio_data:
            yield "", history, (44100, audio_data), False
        else:
            yield "", history, None, False
    else:
        yield "", history, None, False
kairix-engine/gradio_chat.py
Outdated
Comment on lines 169 to 173
    audio_output = gr.Audio(
        visible=False,
        autoplay=True,
        elem_id="audio-player"
    )
- Made text_to_speech_stream fully async with immediate chunk yielding
- Removed all sentence buffering - stream directly to ElevenLabs
- Use eleven_flash_v2_5 model for fastest performance (~75ms latency)
- Updated voice settings to match best practices (stability=0.5, similarity_boost=0.75, style=0.0)
- Added visual error indicator and log stream in second tab
- Made audio output component visible
- Updated all tests to match new implementation

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Changed chat model from o3-mini to gpt-4o-mini in basic_chat.py
- Fixed YAML escaping in message_history.py to properly handle special characters
- Now using yaml.dump() to ensure proper escaping of quotes, newlines, etc.
- All tests passing

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Summary
Changes
- `text_to_speech()` function that converts text to audio using ElevenLabs API
- `gr.Audio` component with `autoplay=True` for seamless playback
- Voice ID from `eleven.py` (`0NkECxcbkydDMspBKvQp`)
Test Plan
🤖 Generated with Claude Code