-
Notifications
You must be signed in to change notification settings - Fork 52
WIP: Latency Improvements #457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release
Are you sure you want to change the base?
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughThis pull request introduces a hybrid text aggregation system for TTS backends, combining character-count buffering with optional sentence-boundary detection. Two new aggregator classes handle text chunking, while configuration and integration updates enable the feature across Sarvam and ElevenLabs TTS services through dynamic configuration settings. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| else: | ||
| # Character-count only mode (exactly like Bolna) | ||
| text_aggregator = CharacterCountOnlyAggregator( | ||
| buffer_size=config.min_chars | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep hybrid max_chars when sentence detection disabled
The else branch of if config.enable_sentence_detection switches to CharacterCountOnlyAggregator, so when BB_TEXT_AGGREGATION_TYPE=hybrid and BB_TEXT_AGGREGATION_ENABLE_SENTENCE_DETECTION=false (documented as a hybrid-mode toggle in app/core/config/dynamic.py), the max_chars safety net is skipped. With long unbroken tokens (URLs, IDs, code snippets), this aggregator won’t flush until the final LLM end/flush, reintroducing multi‑second latency and unbounded buffering. Consider using HybridTextAggregator(..., enable_sentence_detection=False) instead of swapping aggregators.
Useful? React with 👍 / 👎.
| else: | ||
| # Character-count only mode (exactly like Bolna) | ||
| text_aggregator = CharacterCountOnlyAggregator( | ||
| buffer_size=config.min_chars | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep hybrid max_chars when sentence detection disabled
The else branch of if config.enable_sentence_detection switches to CharacterCountOnlyAggregator, so when BB_TEXT_AGGREGATION_TYPE=hybrid and BB_TEXT_AGGREGATION_ENABLE_SENTENCE_DETECTION=false (documented as a hybrid-mode toggle in app/core/config/dynamic.py), the max_chars safety net is skipped. With long unbroken tokens (URLs, IDs, code snippets), this aggregator won’t flush until the final LLM end/flush, reintroducing multi‑second latency and unbounded buffering. Consider using HybridTextAggregator(..., enable_sentence_detection=False) instead of swapping aggregators.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (7)
docs/40_CHAR_TEXT_AGGREGATION_IMPLEMENTATION.md (1)
1-482: Well-structured implementation guide.This documentation comprehensively covers the hybrid text aggregation system, including architecture, configuration options, troubleshooting, and performance comparisons. The explanations are clear and actionable.
Minor markdown formatting suggestions based on static analysis:
- Add language specifiers to code blocks (e.g., lines 103, 116, 147, 253, 258) for better syntax highlighting
- Convert the bare URL on line 478 to a markdown link:
[Pipecat Documentation](https://docs.pipecat.ai/)app/ai/voice/agents/breeze_buddy/tts/__init__.py (1)
43-55: Consider extracting shared text aggregation config loading logic.The same logic for loading text aggregation configuration is duplicated in both
get_sarvam_tts_service(lines 43-54) andget_elevenlabs_tts_service(lines 78-89). Consider extracting this into a helper function to reduce duplication.🔎 Proposed refactor
async def _load_text_aggregation_config(): """Load text aggregation configuration from dynamic config.""" aggregation_type = await BB_TEXT_AGGREGATION_TYPE() min_chars = await BB_TEXT_AGGREGATION_MIN_CHARS() max_chars = await BB_TEXT_AGGREGATION_MAX_CHARS() enable_sentence_detection = await BB_TEXT_AGGREGATION_ENABLE_SENTENCE_DETECTION() use_hybrid_aggregator = aggregation_type != "none" if aggregation_type == "character_count": enable_sentence_detection = False return { "use_hybrid_aggregator": use_hybrid_aggregator, "min_chars": min_chars, "max_chars": max_chars, "enable_sentence_detection": enable_sentence_detection, }Then use it in both functions:
aggregation_config = await _load_text_aggregation_config() # ... pass **aggregation_config to the config dataclassapp/ai/voice/agents/breeze_buddy/utils/hybrid_text_aggregator.py (2)
96-108: Redundant check for buffer length.At line 98, the check
if buffer_len > 0is always true at this point since we've already added at least one character to the buffer (line 87). The check can be removed or simplified.🔎 Proposed simplification
# TRIGGER 2: Sentence boundary (if enabled) if self.enable_sentence_detection and char in SENTENCE_ENDING_PUNCTUATION: - if buffer_len > 0: # Only yield if we have content - aggregation = Aggregation( - text=self._buffer.strip(), - type=AggregationType.SENTENCE - ) - logger.debug( - f"Yielding sentence boundary: '{aggregation.text}' ({len(aggregation.text)} chars)" - ) - self._buffer = "" - yield aggregation + aggregation = Aggregation( + text=self._buffer.strip(), + type=AggregationType.SENTENCE + ) + logger.debug( + f"Yielding sentence boundary: '{aggregation.text}' ({len(aggregation.text)} chars)" + ) + self._buffer = "" + yield aggregation continue
204-265: Consider adding a max_chars safety net to CharacterCountOnlyAggregator.Unlike
HybridTextAggregator, this class has no maximum character limit. If text is generated without natural break characters, the buffer could grow very large untilflush()is called. While unlikely in practice, consider adding amax_charssafety similar to the hybrid aggregator for consistency.🔎 Proposed enhancement
class CharacterCountOnlyAggregator(BaseTextAggregator): - def __init__(self, buffer_size: int = 40): + def __init__(self, buffer_size: int = 40, max_chars: int = 500): super().__init__() self._buffer = "" self.buffer_size = buffer_size + self.max_chars = max_chars logger.info(f"Initialized CharacterCountOnlyAggregator with buffer_size={buffer_size}") async def aggregate(self, text: str) -> AsyncIterator[Aggregation]: for char in text: self._buffer += char + # Safety net: force yield if buffer gets too large + if len(self._buffer) >= self.max_chars: + aggregation = Aggregation( + text=self._buffer.strip(), + type=AggregationType.WORD + ) + logger.warning( + f"Force yielding at max_chars: '{aggregation.text}' ({len(aggregation.text)} chars)" + ) + self._buffer = "" + yield aggregation + continue + # Yield when we reach buffer_size AND current char is a natural break if len(self._buffer) >= self.buffer_size and char in NATURAL_BREAK_CHARS: # ... existing codeapp/core/config/dynamic.py (1)
151-163: Consider adding validation for BB_TEXT_AGGREGATION_TYPE.The getter allows any string value, but only "none", "hybrid", and "character_count" are valid. Invalid values would result in
use_hybrid_aggregator = True(since they're != "none") which might not be the intended behavior for typos.🔎 Proposed validation
async def BB_TEXT_AGGREGATION_TYPE() -> str: """ Returns BB_TEXT_AGGREGATION_TYPE from Redis. Options: - "none": Use default Pipecat SimpleTextAggregator (sentence-only) - "hybrid": Use HybridTextAggregator (40-char + sentence boundaries) - "character_count": Use CharacterCountOnlyAggregator (pure 40-char buffering) Default: "hybrid" (recommended for balanced latency and quality) """ value = await get_config("BB_TEXT_AGGREGATION_TYPE", "hybrid", str) valid_options = {"none", "hybrid", "character_count"} if value not in valid_options: logger.warning( f"Invalid BB_TEXT_AGGREGATION_TYPE '{value}', falling back to 'hybrid'. " f"Valid options: {valid_options}" ) return "hybrid" return valueapp/ai/voice/tts/elevenlabs.py (1)
50-75: Minor: Redundant parameter in HybridTextAggregator instantiation.At line 58,
enable_sentence_detection=Trueis explicitly set, but this is already the condition for entering this branch (line 53). Consider usingconfig.enable_sentence_detectionfor consistency, even though the value is always True here.🔎 Proposed change
text_aggregator = HybridTextAggregator( min_chars=config.min_chars, max_chars=config.max_chars, - enable_sentence_detection=True, + enable_sentence_detection=config.enable_sentence_detection, )docs/BREEZE_BUDDY_STT_LLM_TTS_COMPLETE_ANALYSIS.md (1)
1-2118: Comprehensive technical analysis.This is an excellent deep-dive document covering the entire voice pipeline architecture. The clarification section (lines 2019-2096) is particularly valuable for dispelling the 50-character buffer misconception.
Minor markdown formatting issues flagged by static analysis could be addressed:
- Add language specifiers to code blocks (e.g., lines 39, 76, 170, 458, 587, etc.)
- Use consistent strong marker style (asterisks instead of underscores)
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
app/ai/voice/agents/breeze_buddy/tts/__init__.pyapp/ai/voice/agents/breeze_buddy/utils/hybrid_text_aggregator.pyapp/ai/voice/tts/elevenlabs.pyapp/ai/voice/tts/sarvam.pyapp/core/config/dynamic.pydocs/40_CHAR_TEXT_AGGREGATION_IMPLEMENTATION.mddocs/BREEZE_BUDDY_STT_LLM_TTS_COMPLETE_ANALYSIS.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-12T10:56:55.990Z
Learnt from: Swetha-160303
Repo: juspay/clairvoyance PR: 424
File: app/ai/voice/agents/automatic/tts/__init__.py:132-152
Timestamp: 2025-12-12T10:56:55.990Z
Learning: In app/ai/voice/agents/automatic/tts/__init__.py, FishAudioTTSService does not support the text_filters parameter, unlike ElevenLabsTTSService and GoogleTTSService which do support it.
Applied to files:
app/ai/voice/tts/elevenlabs.pyapp/ai/voice/agents/breeze_buddy/tts/__init__.py
🧬 Code graph analysis (4)
app/ai/voice/tts/elevenlabs.py (1)
app/ai/voice/agents/breeze_buddy/utils/hybrid_text_aggregator.py (4)
text(67-69)text(225-227)CharacterCountOnlyAggregator(204-275)HybridTextAggregator(28-201)
app/ai/voice/tts/sarvam.py (1)
app/ai/voice/agents/breeze_buddy/utils/hybrid_text_aggregator.py (4)
text(67-69)text(225-227)HybridTextAggregator(28-201)CharacterCountOnlyAggregator(204-275)
app/ai/voice/agents/breeze_buddy/tts/__init__.py (2)
app/core/config/dynamic.py (4)
BB_TEXT_AGGREGATION_ENABLE_SENTENCE_DETECTION(184-190)BB_TEXT_AGGREGATION_MAX_CHARS(175-181)BB_TEXT_AGGREGATION_MIN_CHARS(166-172)BB_TEXT_AGGREGATION_TYPE(152-163)app/ai/voice/tts/elevenlabs.py (2)
build_elevenlabs_tts(38-86)ElevenLabsConfig(22-35)
app/core/config/dynamic.py (1)
app/services/live_config/store.py (1)
get_config(211-235)
🪛 LanguageTool
docs/BREEZE_BUDDY_STT_LLM_TTS_COMPLETE_ANALYSIS.md
[style] ~252-~252: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...o amplitude to consider - Filters out very quiet background noise ### Sarvam STT Config...
(EN_WEAK_ADJECTIVE)
[grammar] ~1512-~1512: Ensure spelling is correct
Context: ...PT-3.5-turbo vs GPT-4 - Savings: ~50-100ms first token - Risk: Lower quality re...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
docs/40_CHAR_TEXT_AGGREGATION_IMPLEMENTATION.md
103-103: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
116-116: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
147-147: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
253-253: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
258-258: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
289-289: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
340-340: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
357-357: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
374-374: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
478-478: Bare URL used
(MD034, no-bare-urls)
docs/BREEZE_BUDDY_STT_LLM_TTS_COMPLETE_ANALYSIS.md
39-39: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
76-76: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
112-112: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
112-112: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
138-138: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
138-138: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
170-170: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
458-458: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
509-509: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
587-587: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
669-669: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
669-669: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
687-687: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
687-687: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
857-857: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
1230-1230: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
1341-1341: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
1533-1533: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
1599-1599: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
1694-1694: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
1857-1857: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1863-1863: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
2059-2059: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (10)
app/ai/voice/agents/breeze_buddy/tts/__init__.py (2)
32-71: LGTM - Text aggregation integration for Sarvam TTS.The configuration loading and parameter passing to
SarvamTTSConfigis correctly implemented. The logic to disable sentence detection forcharacter_countmode aligns with the expected behavior documented in the implementation guide.
74-104: LGTM - Text aggregation integration for ElevenLabs TTS.The implementation mirrors the Sarvam TTS service correctly, ensuring consistent behavior across both TTS backends.
app/ai/voice/agents/breeze_buddy/utils/hybrid_text_aggregator.py (2)
122-163: LGTM - Break point handling with force fallback.The
_yield_at_break_pointmethod correctly handles splitting at natural break points with a force fallback. The empty Aggregation return provides a safe default for edge cases.
266-275: LGTM - Interruption and reset handlers.The interruption and reset handlers correctly clear the buffer and provide appropriate logging for debugging.
app/core/config/dynamic.py (1)
166-190: LGTM - Numeric and boolean config getters.The min_chars, max_chars, and enable_sentence_detection getters are correctly implemented with sensible defaults that match the documentation.
app/ai/voice/tts/elevenlabs.py (2)
31-35: LGTM - New text aggregation configuration fields.The new configuration fields are well-documented with sensible defaults matching the latency optimization goals.
76-86: Verify pipecat-ai version supportstext_aggregatorparameter.The
text_aggregatorparameter passed toElevenLabsTTSServicerequires pipecat-ai version >= 0.0.95. The requirements.txt does not pin a specific version, so verify that the deployed environment uses a compatible version (>= 0.0.95) that supports this parameter.app/ai/voice/tts/sarvam.py (3)
99-103: LGTM - New text aggregation configuration fields for Sarvam.The configuration fields mirror those in ElevenLabsConfig, ensuring consistent behavior across TTS backends.
142-167: Consistent implementation with ElevenLabs.The aggregator selection logic correctly mirrors the ElevenLabs implementation. The same minor note applies:
enable_sentence_detection=Trueat line 150 could useconfig.enable_sentence_detectionfor consistency.
168-179: text_aggregator parameter is correctly passed through inheritance.LanguageAwareSarvamTTSextendsSarvamTTSServicewithout overriding__init__, so the parent class constructor directly receives and handles thetext_aggregatorparameter. This follows the same pattern asElevenLabsTTSServicein the codebase, confirming the pipecat TTS services support this parameter.
| │ 🔵 BUFFER POINT 2: TTS Text Aggregation │ | ||
| │ │ | ||
| │ Collects: LLMTextFrame objects │ | ||
| │ Size: 50 chars OR sentence boundary │ | ||
| │ │ | ||
| │ Buffer: "Hello, how can I help you today" │ | ||
| │ ↓ (reaches '?' or 50+ chars) │ | ||
| │ Output: Send to TTS WebSocket │ | ||
| └────────────┬──────────────────────────────────────┘ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistency in buffering description.
Lines 1273-1275 state "Size: 50 chars OR sentence boundary" which contradicts the critical clarification section starting at line 2019 that correctly explains Pipecat only uses sentence boundaries (not 50-char threshold). Consider updating this section to match the clarification.
🔎 Proposed fix
│ 🔵 BUFFER POINT 2: TTS Text Aggregation │
│ │
│ Collects: LLMTextFrame objects │
-│ Size: 50 chars OR sentence boundary │
+│ Trigger: Sentence boundary ONLY (client-side) │
+│ Note: min_buffer_size=50 is server-side (Sarvam) │
│ │
│ Buffer: "Hello, how can I help you today" │
-│ ↓ (reaches '?' or 50+ chars) │
+│ ↓ (reaches '?' - sentence boundary) │
│ Output: Send to TTS WebSocket │🤖 Prompt for AI Agents
In docs/BREEZE_BUDDY_STT_LLM_TTS_COMPLETE_ANALYSIS.md around lines 1267 to 1275,
the TTS Text Aggregation block incorrectly states "Size: 50 chars OR sentence
boundary"; change this to match the clarification at ~line 2019 by removing the
50-char threshold and documenting that Pipecat aggregates by sentence boundary
only (e.g., "Size: sentence boundary only"), and update the example
text/annotations if needed so the buffer/output explanation aligns with
sentence-boundary behavior.
34fc755 to
86b7e9e
Compare
Summary by CodeRabbit
Release Notes
New Features
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.