feat: Latency improvements for buddy #455
Conversation
Walkthrough

This pull request introduces a comprehensive latency tracking and optimization system for the Breeze Buddy voice agent. It adds latency tracking infrastructure (LatencyTracker, data models), frame processors for STT/LLM/TTS, buffered LLM streaming for parallel synthesis, configuration flags, and documentation.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant AudioFrame as AudioRawFrame
    participant STTProc as STTLatencyProcessor
    participant LLMProc as LLMLatencyProcessor
    participant TTSProc as TTSLatencyProcessor
    participant Tracker as LatencyTracker
    Note over STTProc,Tracker: STT Phase
    AudioFrame->>STTProc: First frame arrives
    STTProc->>Tracker: Track STT start
    Note over STTProc,Tracker: Transcription arrives
    STTProc->>STTProc: Count frames, measure TTFB
    STTProc->>Tracker: track_component(STT, TTFB, duration, metadata)
    Note over LLMProc,Tracker: LLM Phase
    LLMProc->>Tracker: Track LLM start on LLMRunFrame
    LLMProc->>LLMProc: Capture first token time, count tokens
    LLMProc->>Tracker: track_component(LLM, TTFB, duration, metadata)
    Note over TTSProc,Tracker: TTS Phase
    TTSProc->>Tracker: Track TTS start on TTSStartedFrame
    TTSProc->>TTSProc: Measure audio chunks, bytes, TTFB
    TTSProc->>Tracker: track_component(TTS, TTFB, duration, audio_stats)
    TTSProc->>Tracker: end_turn() finalizes and computes total latency
    Tracker->>Tracker: Log summary with percentiles & export to Langfuse
```
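The percentile roll-up in the diagram's last step is not shown in this PR's diff; as a rough, self-contained sketch of what such a summary could look like (function and key names here are illustrative, not the actual `LatencyTracker` API):

```python
# Illustrative sketch of a latency summary with percentiles.
# Names (percentile, summarize, p50_ms, ...) are hypothetical,
# not the real LatencyTracker API.

def percentile(values_ms: list[float], p: float) -> float:
    """Nearest-rank percentile over a list of latencies in milliseconds."""
    ordered = sorted(values_ms)
    rank = round(p / 100 * (len(ordered) - 1))
    return ordered[int(rank)]

def summarize(values_ms: list[float]) -> dict[str, float]:
    """Roll per-turn latencies up into the stats a turn summary might log."""
    if not values_ms:
        return {}
    return {
        "mean_ms": sum(values_ms) / len(values_ms),
        "p50_ms": percentile(values_ms, 50),
        "p95_ms": percentile(values_ms, 95),
    }
```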
```mermaid
sequenceDiagram
    participant Client as LLM Client
    participant LLMWrapper as BreezeBuddyLLMWrapper
    participant BaseStream as Base LLM Stream
    participant Buffer as BufferedLLMStreamWrapper
    participant TTS as TTS Consumer
    Note over Client,Buffer: LLM Streaming Flow (Buffer Enabled)
    Client->>LLMWrapper: _stream_chat_completions(context)
    LLMWrapper->>BaseStream: Request stream
    BaseStream-->>Buffer: Raw token chunks
    Note over Buffer: Buffer accumulation
    Buffer->>Buffer: Accumulate text in buffer
    Buffer->>Buffer: Check thresholds (buffer_size/min_buffer_size)
    Buffer->>Buffer: Align to word boundaries (optional)
    par Parallel Synthesis
        Buffer-->>TTS: Emit buffered chunk (TTFB improved)
        TTS->>TTS: Start TTS synthesis in parallel
    end
    Buffer->>Buffer: Yield remaining on completion
    Buffer-->>LLMWrapper: Final content
    LLMWrapper-->>Client: Streamed response
```
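The buffer-accumulate-then-emit flow above can be sketched in a few lines (a standalone illustration under assumed thresholds; the real `BufferedLLMStreamWrapper` in this PR has a different API and more state):

```python
import asyncio
from typing import AsyncIterator

async def buffered_chunks(
    tokens: AsyncIterator[str],
    buffer_size: int = 40,
    min_buffer_size: int = 20,
) -> AsyncIterator[str]:
    """Accumulate streamed LLM tokens and emit word-aligned chunks early so a
    downstream TTS consumer can start synthesizing before the stream ends."""
    buffer = ""
    async for token in tokens:
        buffer += token
        if len(buffer) >= buffer_size:
            # Split at the last space so a word is never cut in half.
            head, sep, tail = buffer.rpartition(" ")
            if sep and len(head) >= min_buffer_size:
                yield head + sep
                buffer = tail
    if buffer:
        yield buffer  # flush whatever remains when the stream completes

async def demo() -> list[str]:
    async def fake_stream() -> AsyncIterator[str]:
        for word in "the quick brown fox jumps over the lazy dog again".split():
            yield word + " "
    return [chunk async for chunk in buffered_chunks(fake_stream())]
```

With the default thresholds, the ten-word demo stream comes out as one early word-aligned chunk plus a final flush, which is the TTFB win the diagram describes.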
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
if len(split_result) == 2:
    # Found a word boundary
    chunk_to_yield = split_result[0]
    self.buffer = split_result[1]

# Only yield if chunk is substantial enough
if len(chunk_to_yield) >= self.min_buffer_size:
    return chunk_to_yield
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preserve buffer when boundary chunk is too small
The word-boundary extraction mutates self.buffer before it knows whether it will emit a chunk. If the prefix is shorter than min_buffer_size, the method returns None after already assigning self.buffer = split_result[1], which permanently discards the prefix (and the separating space). Any stream where the last space falls early in the buffer will lose text and produce concatenated/incorrect speech. Consider only slicing the buffer once you’re sure the chunk will be emitted, or reattach the dropped prefix when skipping.
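A minimal sketch of the suggested repair, assuming a helper shaped roughly like the PR's boundary extraction (the name and exact split strategy here are illustrative): the buffer is only sliced once the word-aligned prefix is known to be long enough, so nothing is dropped.

```python
from typing import Optional

def extract_chunk_at_boundary(buffer: str, min_buffer_size: int) -> tuple[Optional[str], str]:
    """Return (chunk, remaining_buffer), splitting at the last space.

    The buffer is left untouched unless a chunk is actually emitted, so a
    too-short prefix (and its separating space) is never discarded."""
    head, sep, tail = buffer.rpartition(" ")
    if not sep:
        return None, buffer   # no word boundary yet; keep accumulating
    if len(head) < min_buffer_size:
        return None, buffer   # prefix too small; preserve it for later
    return head + sep, tail   # emit the chunk including its separator
```

Returning the untouched buffer on both early-exit paths is the point of the fix: slicing happens only on the emitting branch.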
```python
logger.info(
    f"[LLM Latency] Turn {self.current_turn_id}: "
    f"TTFB={first_byte_latency:.0f}ms, "
    f"total={total_duration:.0f}ms, "
```
Guard LLM latency logging when no first token arrives
The log message formats first_byte_latency with .0f unconditionally. If no TTSSpeakFrame arrives before LLMFullResponseEndFrame (e.g., empty LLM response, TTS disabled, or a downstream error), first_byte_latency remains None and this f-string raises TypeError, interrupting frame processing. A fallback string/0 or a conditional log avoids crashing the pipeline in these cases.
Actionable comments posted: 7
🧹 Nitpick comments (7)
app/ai/voice/agents/breeze_buddy/processors/__init__.py (1)
1-19: LGTM! The package initialization correctly re-exports the latency tracking processors and factory function. The module structure is clean and follows standard Python packaging conventions.
The static analysis hint about sorting `__all__` is a minor style preference that can be addressed optionally.

Optional: sort `__all__` alphabetically:

```diff
 __all__ = [
+    "create_latency_processors",
     "STTLatencyProcessor",
     "LLMLatencyProcessor",
     "TTSLatencyProcessor",
-    "create_latency_processors",
 ]
```

app/ai/voice/agents/breeze_buddy/utils/llm_buffer_streaming.py (1)
143-178: Consider using `ClassVar` for mutable class attributes. The static analysis correctly identifies that mutable class attributes (dicts) should be annotated with `ClassVar` to clarify they are class-level, not instance-level.

Suggested fix:

```diff
+from typing import ClassVar, Dict
+
 class LLMBufferConfig:
     """Configuration for LLM buffer-based streaming."""

     # Aggressive (lowest latency, may cut words)
-    AGGRESSIVE = {
+    AGGRESSIVE: ClassVar[Dict[str, int | bool]] = {
         "buffer_size": 30,
         "min_buffer_size": 15,
         "enable_word_boundary": True
     }

     # Balanced (good latency, preserves words)
-    BALANCED = {
+    BALANCED: ClassVar[Dict[str, int | bool]] = {
         "buffer_size": 40,
         "min_buffer_size": 20,
         "enable_word_boundary": True
     }

     # Conservative (higher quality, slightly more latency)
-    CONSERVATIVE = {
+    CONSERVATIVE: ClassVar[Dict[str, int | bool]] = {
         "buffer_size": 60,
         "min_buffer_size": 30,
         "enable_word_boundary": True
     }
```

app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py (3)
68-68: Remove extraneous `f` prefix from string without placeholders.

```diff
-    logger.trace(f"[STT Latency] Audio input started")
+    logger.trace("[STT Latency] Audio input started")
```
263-263: Remove extraneous `f` prefix from string without placeholders.

```diff
-    logger.trace(f"[TTS Latency] TTS started")
+    logger.trace("[TTS Latency] TTS started")
```
178-181: Potential confusion when `frame.text` is shorter than 50 characters. String slicing in Python is safe for short strings (it returns the available characters), but the log message appends `...` which may be misleading for short texts.

Suggested improvement:

```python
text_preview = frame.text[:50] + "..." if len(frame.text) > 50 else frame.text
logger.debug(
    f"[LLM Latency] First token received: {ttfb:.0f}ms, "
    f"text='{text_preview}'"
)
```

app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py (1)
224-226: Use descriptive variable names instead of `l`. The variable `l` is ambiguous and can be confused with `1` (one) in some fonts.

Suggested fix:

```diff
 # Extract TTFB and total duration
-ttfb_values = [l.first_byte_latency_ms for l in latencies if l.first_byte_latency_ms is not None]
-total_values = [l.total_duration_ms for l in latencies if l.total_duration_ms is not None]
+ttfb_values = [lat.first_byte_latency_ms for lat in latencies if lat.first_byte_latency_ms is not None]
+total_values = [lat.total_duration_ms for lat in latencies if lat.total_duration_ms is not None]
```

docs/LATENCY_OPTIMIZATION.md (1)
91-103: Add language specifiers to fenced code blocks for proper syntax highlighting. Several code blocks in the document are missing language specifiers, which affects readability when rendered.

Suggested fix for lines 91-103: change the bare opening fences to text-tagged fences, e.g.

```text
User speaks → VAD detects end → STT finalizes → LLM starts → Response
                    ↑ Waiting for complete transcript
```

With interim results:

```text
User speaks → Interim results → LLM starts early → Response
                    ↑ Processing begins while user still speaking
```

Similar fixes are needed for lines 298, 327, 332, and 490.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)

- app/ai/voice/agents/breeze_buddy/processors/__init__.py
- app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py
- app/ai/voice/agents/breeze_buddy/services/__init__.py
- app/ai/voice/agents/breeze_buddy/services/llm_wrapper.py
- app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py
- app/ai/voice/agents/breeze_buddy/utils/llm_buffer_streaming.py
- app/core/config/static.py
- docs/BOLNA_VS_BREEZE_BUDDY_OPTIMIZATION_GAP_ANALYSIS.md
- docs/LATENCY_OPTIMIZATION.md
🧰 Additional context used
🧬 Code graph analysis (4)
app/ai/voice/agents/breeze_buddy/services/__init__.py (1)
- app/ai/voice/agents/breeze_buddy/services/llm_wrapper.py (1)
  - BreezeBuddyLLMWrapper (22-48)

app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py (1)
- app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py (4)
  - LatencyTracker (49-362), track_component (131-186), start_turn (82-95), end_turn (97-129)

app/ai/voice/agents/breeze_buddy/services/llm_wrapper.py (2)
- app/ai/voice/agents/breeze_buddy/utils/llm_buffer_streaming.py (4)
  - BufferedLLMStreamWrapper (16-140), LLMBufferConfig (143-178), get_config (168-178), stream_with_buffer (45-106)
- app/ai/voice/agents/breeze_buddy/template/context.py (1)
  - context (72-74)

app/ai/voice/agents/breeze_buddy/processors/__init__.py (1)
- app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py (4)
  - STTLatencyProcessor (30-121), LLMLatencyProcessor (124-220), TTSLatencyProcessor (223-317), create_latency_processors (320-357)
🪛 LanguageTool
docs/LATENCY_OPTIMIZATION.md
[grammar] ~3-~3: Ensure spelling is correct
Context: ...ucing voice conversation latency by 600-700ms** --- ## 📋 Table of Contents 1. [Overview](#ove...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~70-~70: Ensure spelling is correct
Context: ... 3:** Test - Make a call and notice 200-400ms faster response times! ✅ **That's it!*...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~82-~82: Ensure spelling is correct
Context: ...rim Results (DETAILED) Impact: 200-400ms latency reduction Effort: 5-10 minu...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~318-~318: Ensure spelling is correct
Context: ...: LLM Buffer Streaming Impact: 200-300ms latency reduction Effort: 4-6 hours...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~473-~473: Ensure spelling is correct
Context: ... in logs - ✅ P95 latency reduced by 200-400ms - ✅ No STT accuracy degradation **Phase 2...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~573-~573: Ensure spelling is correct
Context: ...es to .env - Restart server - Get 200-400ms improvement! Phase 2 (2-3 hours): ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
docs/BOLNA_VS_BREEZE_BUDDY_OPTIMIZATION_GAP_ANALYSIS.md
[grammar] ~23-~23: Ensure spelling is correct
Context: ...ruption handling with sequence IDs (100-200ms reduction) 7. Implement smart caching f...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~24-~24: Ensure spelling is correct
Context: ...nt smart caching for common phrases (50-200ms reduction) **Larger Projects (1-2 week...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~27-~27: Ensure spelling is correct
Context: ...igrate to queue-based architecture (100-300ms reduction) 9. Add intelligent buffering...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~28-~28: Ensure spelling is correct
Context: ...gent buffering throughout pipeline (100-200ms reduction) --- ## 📊 DETAILED GAP ANA...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~65-~65: Ensure spelling is correct
Context: ...finalizes the transcript. This adds 200-400ms per turn. Recommendation: ⭐ **IMME...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~148-~148: Ensure spelling is correct
Context: ...ntations are well-tuned. Breeze Buddy's 300ms is actually slightly faster. **Recomme...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~655-~655: Ensure spelling is correct
Context: ...s) could be pre-synthesized, saving 100-200ms each time. Recommendation: ⭐ **MED...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~770-~770: Ensure spelling is correct
Context: ...Estimated Total Latency Reduction: 500-900ms* ### 1.1 Enable Soniox Interim Results ⭐⭐⭐ - ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~774-~774: Ensure spelling is correct
Context: ...Effort: 5 minutes - Impact: 200-400ms reduction - Files: .env - **Actio...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~798-~798: Ensure spelling is correct
Context: ... - Effort: 4 hours - Impact: 50-100ms reduction - Files: `services/llm_wr...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~812-~812: Ensure spelling is correct
Context: ...Estimated Total Latency Reduction: 200-500ms* ### 2.1 Integrate Latency Tracking ⭐⭐ - **Ef...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~816-~816: Ensure spelling is correct
Context: ...ng ⭐⭐ - Effort: 1 day - Impact: 0ms (visibility only, enables future optimi...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~828-~828: Ensure spelling is correct
Context: ...- Effort: 3-5 days - Impact: 50-200ms per cached phrase - Files: Create `...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~842-~842: Ensure spelling is correct
Context: ...Estimated Total Latency Reduction: 200-500ms* ### 3.1 Hybrid Queue Architecture ⭐⭐⭐ - **Ef...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~846-~846: Ensure spelling is correct
Context: ...Effort: 2-3 weeks - Impact: 100-300ms reduction - Files: Significant refa...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~854-~854: Ensure spelling is correct
Context: ....3-0.5s audio chunks (4096-8192 bytes @ 8kHz) ### 3.3 Add Filler Sounds & Backchann...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~886-~886: Ensure spelling is correct
Context: ...4 hours) Expected Improvement: 250-500ms reduction + full visibility into remain...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
docs/LATENCY_OPTIMIZATION.md
3-3: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
24-24: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
30-30: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
91-91: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
98-98: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
189-189: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
189-189: Strong style
Expected: asterisk; Actual: underscore
(MD050, strong-style)
298-298: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
327-327: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
332-332: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
490-490: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/BOLNA_VS_BREEZE_BUDDY_OPTIMIZATION_GAP_ANALYSIS.md
561-561: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
770-770: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
812-812: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
842-842: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.10)
app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py
68-68: f-string without any placeholders
Remove extraneous f prefix
(F541)
263-263: f-string without any placeholders
Remove extraneous f prefix
(F541)
app/ai/voice/agents/breeze_buddy/utils/llm_buffer_streaming.py
147-151: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
154-158: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
161-165: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
app/ai/voice/agents/breeze_buddy/processors/__init__.py
14-19: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py
225-225: Ambiguous variable name: l
(E741)
226-226: Ambiguous variable name: l
(E741)
324-324: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (12)
app/core/config/static.py (1)
134-145: LGTM! The new configuration constants follow the established patterns in this file. Good defaults:
- Latency tracking enabled by default (visibility without side effects)
- Buffer streaming disabled by default (opt-in for the more experimental feature)
- Sensible 40-char buffer size matching the Bolna reference implementation
docs/BOLNA_VS_BREEZE_BUDDY_OPTIMIZATION_GAP_ANALYSIS.md (1)
1-904: Comprehensive and well-structured analysis document. The gap analysis provides valuable context for latency optimization priorities with clear effort/impact assessments and actionable recommendations. The phased roadmap aligns well with the implementation code in this PR.
app/ai/voice/agents/breeze_buddy/utils/llm_buffer_streaming.py (2)
121-140: Edge case: buffer with no spaces returns None indefinitely. If the buffer contains text without any spaces (e.g., a very long word or URL) and is between `buffer_size` and `buffer_size * 2`, `_extract_chunk_at_boundary` returns `None`, causing chunks to accumulate until the 2x threshold. This is acceptable behavior but worth noting. The force-yield at 2x buffer_size (lines 133-138) is a good safeguard that prevents unbounded growth.
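The safeguard described above can be illustrated standalone (a sketch of the pattern, not the PR's `_extract_chunk_at_boundary`; the function name is hypothetical):

```python
from typing import Optional

def maybe_force_yield(buffer: str, buffer_size: int) -> tuple[Optional[str], str]:
    """If a space-free buffer has grown past 2x buffer_size, force-emit it
    whole so unbroken text (long words, URLs) cannot accumulate unboundedly."""
    if " " not in buffer and len(buffer) >= 2 * buffer_size:
        return buffer, ""
    return None, buffer
```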
45-106: Well-implemented async streaming with proper error handling. Good practices observed:

- Buffer reset at start of each stream (lines 60-61)
- Proper `CancelledError` handling that discards partial state
- Exception handler yields remaining buffer before propagating
- Informative logging at key points
app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py (1)
320-357: Well-designed factory function with clear documentation. The `create_latency_processors` function provides a clean API with a helpful docstring example showing pipeline integration.
290-325: Langfuse export handles errors gracefully. The broad `Exception` catch on line 324 is appropriate here since export failures should not crash the application, and it logs the error for debugging.
49-80: Well-structured LatencyTracker initialization. The tracker properly initializes session state, turn tracking structures, and connection latency storage. The use of `defaultdict` for component latencies is appropriate.

app/ai/voice/agents/breeze_buddy/services/__init__.py (1)

1-13: LGTM! Clean package initialization that correctly re-exports `BreezeBuddyLLMWrapper` for convenient access.

docs/LATENCY_OPTIMIZATION.md (1)
1-597: Excellent implementation guide with a clear phased approach. The documentation provides:
- Clear baseline state assessment
- Actionable steps with specific file references
- Expected outcomes with measurable targets
- Troubleshooting guidance
This aligns well with the actual implementation code in the PR.
app/ai/voice/agents/breeze_buddy/services/llm_wrapper.py (3)
1-6: LGTM! Clear module docstring explaining the latency optimization purpose.

8-19: LGTM! Imports are well-organized and all appear to be used.

38-48: This code is not currently used in the agent implementation. `BreezeBuddyLLMWrapper` is defined and exported but never instantiated. The actual agents (`agent.py` and `websocket_bot.py`) directly use `AzureLLMService` instead. The buffer streaming feature is not integrated into the active codebase.

If this class is reactivated in the future, the original concurrency concern about the shared `buffer_wrapper` instance would be valid: multiple concurrent streams would corrupt buffer state since `stream_with_buffer` resets mutable state at the start. However, this is not a current issue as the code is not deployed.

Likely an incorrect or invalid review comment.
```python
logger.info(
    f"[STT Latency] Turn {turn_id}: "
    f"TTFB={first_byte_latency:.0f}ms, "
    f"total={total_duration:.0f}ms, "
    f"transcript='{frame.text[:50]}...'"
)
```
Potential TypeError when first_byte_latency is None.
If no interim results are received before the final transcription, first_byte_latency will be None, and formatting it with :.0f will raise a TypeError.
Suggested fix:

```python
ttfb_str = f"{first_byte_latency:.0f}ms" if first_byte_latency is not None else "N/A"
logger.info(
    f"[STT Latency] Turn {turn_id}: "
    f"TTFB={ttfb_str}, "
    f"total={total_duration:.0f}ms, "
    f"transcript='{frame.text[:50]}...'"
)
```
🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py around lines
109-114, the logger formats first_byte_latency with "{:.0f}" which raises a
TypeError when first_byte_latency is None; change the logging to format TTFB
conditionally (e.g., ttfb_str = f"{first_byte_latency:.0f}ms" if
first_byte_latency is not None else "N/A") and then use ttfb_str in the
f-string, leaving total_duration formatted as before and preserving the
transcript slice.
```python
logger.info(
    f"[LLM Latency] Turn {self.current_turn_id}: "
    f"TTFB={first_byte_latency:.0f}ms, "
    f"total={total_duration:.0f}ms, "
    f"tokens={self.tokens_count}"
)
```
Same None formatting issue for LLM first_byte_latency.
Apply the same fix as for STT to handle cases where no TTSSpeakFrame is received before LLMFullResponseEndFrame.
Suggested fix:

```python
ttfb_str = f"{first_byte_latency:.0f}ms" if first_byte_latency else "N/A"
logger.info(
    f"[LLM Latency] Turn {self.current_turn_id}: "
    f"TTFB={ttfb_str}, "
    f"total={total_duration:.0f}ms, "
    f"tokens={self.tokens_count}"
)
```

🤖 Prompt for AI Agents
)🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py around lines
207 to 212, the logger formats first_byte_latency which can be None causing
"None" in output; change the formatting to display a fallback like "N/A" when
first_byte_latency is falsy (e.g., compute ttfb_str =
f"{first_byte_latency:.0f}ms" if first_byte_latency else "N/A") and use ttfb_str
in the f-string instead of formatting first_byte_latency directly.
```python
logger.info(
    f"[TTS Latency] Turn {turn_id}: "
    f"TTFB={first_byte_latency:.0f}ms, "
    f"total={total_duration:.0f}ms, "
    f"chunks={self.audio_chunks_count}"
)
```
Same None formatting issue for TTS first_byte_latency.
Apply the same fix as for STT/LLM to handle cases where no TTSAudioRawFrame is received before TTSStoppedFrame.
Suggested fix:

```python
ttfb_str = f"{first_byte_latency:.0f}ms" if first_byte_latency else "N/A"
logger.info(
    f"[TTS Latency] Turn {turn_id}: "
    f"TTFB={ttfb_str}, "
    f"total={total_duration:.0f}ms, "
    f"chunks={self.audio_chunks_count}"
)
```
)🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py around lines
301 to 306, the logger formats first_byte_latency directly which can produce
"None" when no TTSAudioRawFrame was received; change the logging to compute a
ttfb_str that is "N/A" when first_byte_latency is falsy/None (e.g., ttfb_str =
f"{first_byte_latency:.0f}ms" if first_byte_latency else "N/A") and then use
that ttfb_str in the f-string for TTFB while keeping the rest of the fields the
same.
```python
if self.enable_buffer_streaming:
    config = LLMBufferConfig.get_config("balanced")
    config["buffer_size"] = BREEZE_BUDDY_LLM_BUFFER_SIZE
    self.buffer_wrapper = BufferedLLMStreamWrapper(**config)
```
Mutating shared class-level config dictionary.
LLMBufferConfig.get_config("balanced") returns the class attribute dictionary directly, not a copy. Assigning to config["buffer_size"] mutates LLMBufferConfig.BALANCED permanently, affecting all subsequent callers.
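The aliasing hazard is easy to reproduce with a plain class-level dict (an illustrative snippet; `Config` here is hypothetical, not the PR's `LLMBufferConfig`):

```python
class Config:
    BALANCED = {"buffer_size": 40, "min_buffer_size": 20}

    @classmethod
    def get_config(cls) -> dict:
        # Returns the class attribute itself, not a copy (the bug pattern).
        return cls.BALANCED

aliased = Config.get_config()
aliased["buffer_size"] = 80
print(Config.BALANCED["buffer_size"])  # the shared default was mutated

Config.BALANCED["buffer_size"] = 40    # reset for the demonstration
safe = dict(Config.get_config())       # a shallow copy isolates the caller
safe["buffer_size"] = 80
print(Config.BALANCED["buffer_size"])  # shared default untouched
```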
🔎 Proposed fix

```diff
 if self.enable_buffer_streaming:
-    config = LLMBufferConfig.get_config("balanced")
+    config = LLMBufferConfig.get_config("balanced").copy()
     config["buffer_size"] = BREEZE_BUDDY_LLM_BUFFER_SIZE
     self.buffer_wrapper = BufferedLLMStreamWrapper(**config)
```

🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/services/llm_wrapper.py around lines 30 to
33, the code calls LLMBufferConfig.get_config("balanced") which returns the
class-level dictionary directly and then mutates config["buffer_size"],
unintentionally changing the shared BALANCED config for all callers; fix this by
creating a shallow (or deep, if nested) copy of the returned config before
modifying it (e.g., config = dict(LLMBufferConfig.get_config("balanced")) or use
copy.deepcopy(...)) and then set config["buffer_size"] and pass that copy into
BufferedLLMStreamWrapper so the class-level config remains unchanged.
```python
def track_component(
    self,
    component: str,
    first_byte_latency_ms: Optional[float] = None,
    total_duration_ms: Optional[float] = None,
    turn_id: Optional[str] = None,
    sequence_id: Optional[int] = None,
    metadata: Optional[Dict] = None
) -> ComponentLatency:
    """
    Track latency for a specific component.

    Args:
        component: Component name ("stt", "llm", "tts")
        first_byte_latency_ms: Time to first byte/token
        total_duration_ms: Total processing duration
        turn_id: Turn identifier (uses current if None)
        sequence_id: Sequence ID for this component execution
        metadata: Additional metadata (provider, model, etc.)

    Returns:
        ComponentLatency object
    """
    turn_id = turn_id or self.current_turn_id
    if not turn_id:
        logger.warning(f"[Latency] Cannot track {component}: no active turn")
        return None
```
Return type annotation is inconsistent with actual behavior.
The method signature declares -> ComponentLatency but can return None on line 157 when no active turn exists.
Suggested fix:

```diff
 def track_component(
     self,
     component: str,
     first_byte_latency_ms: Optional[float] = None,
     total_duration_ms: Optional[float] = None,
     turn_id: Optional[str] = None,
     sequence_id: Optional[int] = None,
     metadata: Optional[Dict] = None
-) -> ComponentLatency:
+) -> Optional[ComponentLatency]:
```

🤖 Prompt for AI Agents
+ ) -> Optional[ComponentLatency]:🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py around lines 131 to
157, the track_component method is annotated to return ComponentLatency but
actually returns None when there is no active turn; change the return type
annotation to Optional[ComponentLatency] (and update any imports if needed),
update the docstring to document that None may be returned when there is no
active turn, and run a quick grep to ensure callsites handle the Optional return
(or adjust callers to handle None).
```python
logger.debug(
    f"[Latency] {component.upper()} tracked for turn {turn_id}: "
    f"TTFB={first_byte_latency_ms:.0f}ms, total={total_duration_ms:.0f}ms"
)
```
Potential TypeError when formatting None values.
If first_byte_latency_ms or total_duration_ms is None, the :.0f formatting will fail.
Suggested fix:

```python
ttfb_str = f"{first_byte_latency_ms:.0f}ms" if first_byte_latency_ms is not None else "N/A"
total_str = f"{total_duration_ms:.0f}ms" if total_duration_ms is not None else "N/A"
logger.debug(
    f"[Latency] {component.upper()} tracked for turn {turn_id}: "
    f"TTFB={ttfb_str}, total={total_str}"
)
```
🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py around lines 181 to
184, the f-string uses "{:.0f}" which will raise a TypeError if
first_byte_latency_ms or total_duration_ms is None; change it to build ttfb_str
and total_str using an explicit None check (e.g., ttfb_str = "N/A" if
first_byte_latency_ms is None else f"{first_byte_latency_ms:.0f}ms" and
similarly for total_duration_ms) and then log those strings in the logger.debug
call so None values are represented as "N/A" instead of causing a formatting
error.
## 📝 **NOTES**

- All file paths are relative to `/Users/pinnamaraju.swaroop/Repos/BreezeAutomatic/clairvoyance/`
Remove hardcoded local file path.
The document contains a developer-specific local path that should not be in committed documentation.
Suggested fix:

```diff
-- All file paths are relative to `/Users/pinnamaraju.swaroop/Repos/BreezeAutomatic/clairvoyance/`
+- All file paths are relative to the repository root
```
🤖 Prompt for AI Agents
In docs/BOLNA_VS_BREEZE_BUDDY_OPTIMIZATION_GAP_ANALYSIS.md around line 894,
there is a hardcoded developer-specific local file path
(/Users/pinnamaraju.swaroop/Repos/BreezeAutomatic/clairvoyance/) that must be
removed; replace it with a generic or relative path such as "./" or
"{project_root}/clairvoyance" or an example placeholder like
"/path/to/repo/clairvoyance", and scan the document for other occurrences to
update them similarly so no personal absolute paths remain in the committed
docs.
Summary by CodeRabbit
Release Notes
New Features
Documentation