
Conversation

@swaroopvarma1
Collaborator

@swaroopvarma1 swaroopvarma1 commented Dec 29, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Added comprehensive latency tracking for voice conversations, measuring speech-to-text, language model, and text-to-speech performance metrics with statistical analysis and session summaries.
    • Implemented LLM buffer streaming to enable parallel language model generation and speech synthesis, delivering faster response times.
    • Added new configuration options to enable/disable latency tracking and buffer streaming features.
  • Documentation

    • Added latency optimization guides and implementation roadmaps for improving voice agent responsiveness.


@coderabbitai

coderabbitai bot commented Dec 29, 2025

Walkthrough

This pull request introduces a comprehensive latency tracking and optimization system for the Breeze Buddy voice agent. It adds latency tracking infrastructure (LatencyTracker, data models), frame processors for STT/LLM/TTS, buffered LLM streaming for parallel synthesis, configuration flags, and documentation.

Changes

  • Latency Tracking Core (app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py): New LatencyTracker class with TurnLatency and ComponentLatency data models. Tracks per-turn and per-component metrics (TTFB, total duration), computes percentiles (P50/P95/P99), exports to Langfuse, and logs summaries. ~250 lines.
  • Frame Processors (app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py, app/ai/voice/agents/breeze_buddy/processors/__init__.py): Three FrameProcessor subclasses (STTLatencyProcessor, LLMLatencyProcessor, TTSLatencyProcessor) that instrument the voice pipeline stages. A factory function, create_latency_processors, returns all three. The module exports them via __all__.
  • LLM Buffer Streaming (app/ai/voice/agents/breeze_buddy/utils/llm_buffer_streaming.py): BufferedLLMStreamWrapper enables buffered streaming with configurable thresholds and word-boundary alignment. LLMBufferConfig provides AGGRESSIVE/BALANCED/CONSERVATIVE presets for latency/quality tradeoffs.
  • LLM Services Wrapper (app/ai/voice/agents/breeze_buddy/services/llm_wrapper.py, app/ai/voice/agents/breeze_buddy/services/__init__.py): BreezeBuddyLLMWrapper extends AzureLLMService and conditionally routes streaming through BufferedLLMStreamWrapper based on the ENABLE_BREEZE_BUDDY_LLM_BUFFER_STREAMING flag. Re-exported via the services __init__.
  • Configuration (app/core/config/static.py): Three new env-driven flags: ENABLE_BREEZE_BUDDY_LATENCY_TRACKING (default true), ENABLE_BREEZE_BUDDY_LLM_BUFFER_STREAMING (default false), BREEZE_BUDDY_LLM_BUFFER_SIZE (default 40).
  • Documentation (docs/LATENCY_OPTIMIZATION.md, docs/BOLNA_VS_BREEZE_BUDDY_OPTIMIZATION_GAP_ANALYSIS.md): Two markdown files: a phased implementation guide for the latency optimization features, and a detailed gap analysis comparing Bolna vs. Breeze Buddy with a prioritized roadmap and code recommendations.
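The walkthrough notes that LatencyTracker computes P50/P95/P99 summaries. As a rough illustration only (the actual LatencyTracker API is not shown in this excerpt, and `summarize` plus the nearest-rank method are assumptions), such a summary can be sketched as:

```python
# Illustrative sketch, not the real LatencyTracker API: compute P50/P95/P99
# over per-turn latency samples using a simple nearest-rank percentile.
def summarize(values_ms: list[float]) -> dict[str, float]:
    """Summarize a list of latency samples in milliseconds."""
    if not values_ms:
        return {}
    ordered = sorted(values_ms)

    def pct(p: float) -> float:
        # nearest-rank: index floor(n * p / 100), clamped to the last element
        idx = min(len(ordered) - 1, int(len(ordered) * p / 100))
        return ordered[idx]

    return {
        "p50": pct(50),
        "p95": pct(95),
        "p99": pct(99),
        "avg": sum(ordered) / len(ordered),
    }
```

The real tracker also attaches metadata and exports to Langfuse; this sketch only shows the statistical core.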

Sequence Diagram(s)

sequenceDiagram
    participant AudioFrame as AudioRawFrame
    participant STTProc as STTLatencyProcessor
    participant LLMProc as LLMLatencyProcessor
    participant TTSProc as TTSLatencyProcessor
    participant Tracker as LatencyTracker
    
    Note over STTProc,Tracker: STT Phase
    AudioFrame->>STTProc: First frame arrives
    STTProc->>Tracker: Track STT start
    
    Note over STTProc,Tracker: Transcription arrives
    STTProc->>STTProc: Count frames, measure TTFB
    STTProc->>Tracker: track_component(STT, TTFB, duration, metadata)
    
    Note over LLMProc,Tracker: LLM Phase
    LLMProc->>Tracker: Track LLM start on LLMRunFrame
    LLMProc->>LLMProc: Capture first token time, count tokens
    LLMProc->>Tracker: track_component(LLM, TTFB, duration, metadata)
    
    Note over TTSProc,Tracker: TTS Phase
    TTSProc->>Tracker: Track TTS start on TTSStartedFrame
    TTSProc->>TTSProc: Measure audio chunks, bytes, TTFB
    TTSProc->>Tracker: track_component(TTS, TTFB, duration, audio_stats)
    TTSProc->>Tracker: end_turn() finalizes and computes total latency
    Tracker->>Tracker: Log summary with percentiles & export to Langfuse
sequenceDiagram
    participant Client as LLM Client
    participant LLMWrapper as BreezeBuddyLLMWrapper
    participant BaseStream as Base LLM Stream
    participant Buffer as BufferedLLMStreamWrapper
    participant TTS as TTS Consumer
    
    Note over Client,Buffer: LLM Streaming Flow (Buffer Enabled)
    Client->>LLMWrapper: _stream_chat_completions(context)
    LLMWrapper->>BaseStream: Request stream
    BaseStream-->>Buffer: Raw token chunks
    
    Note over Buffer: Buffer accumulation
    Buffer->>Buffer: Accumulate text in buffer
    Buffer->>Buffer: Check thresholds (buffer_size/min_buffer_size)
    Buffer->>Buffer: Align to word boundaries (optional)
    
    par Parallel Synthesis
        Buffer-->>TTS: Emit buffered chunk (TTFB improved)
        TTS->>TTS: Start TTS synthesis in parallel
    end
    
    Buffer->>Buffer: Yield remaining on completion
    Buffer-->>LLMWrapper: Final content
    LLMWrapper-->>Client: Streamed response
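The buffer-accumulation steps in the diagram above can be sketched as a generator; the function name and thresholds are assumptions based on the walkthrough, not BufferedLLMStreamWrapper's actual interface:

```python
# Hedged sketch of buffered streaming with word-boundary alignment.
# buffer_chunks and its defaults are illustrative; the real wrapper is async
# and wraps an LLM token stream rather than a plain iterable.
def buffer_chunks(tokens, buffer_size=40, min_buffer_size=20):
    """Accumulate streamed tokens and emit word-aligned chunks for TTS."""
    buf = ""
    for tok in tokens:
        buf += tok
        if len(buf) >= buffer_size:
            cut = buf.rfind(" ")            # align to the last word boundary
            if cut >= min_buffer_size:
                yield buf[:cut]             # emit early so TTS can start
                buf = buf[cut + 1:]
            elif len(buf) >= buffer_size * 2:
                yield buf                   # force-yield to bound buffer growth
                buf = ""
    if buf:
        yield buf                           # flush remainder on completion
```

Emitting the first chunk before the LLM finishes is what lets TTS synthesis run in parallel with generation, which is the TTFB improvement the diagram annotates.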

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • badri-singhal
  • murdore

Poem

🐰 Hops excitedly

Latency tracked from start to end,
Frame processors, buffers blend,
STT, LLM, TTS unite,
Optimization shines so bright! ✨
Parallel streams and metrics flow,
Breeze Buddy steals the show! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
  • Title check: ❓ Inconclusive. The title contains a typo ("imporvements" instead of "improvements") and is too vague to convey the substantial changes: latency tracking, LLM buffer streaming, configuration updates, and documentation. Resolution: fix the typo and make the title more specific, e.g. "feat: Add latency tracking and LLM buffer streaming for Breeze Buddy".
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 91.30%, above the required 80.00% threshold.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +124 to +131
if len(split_result) == 2:
    # Found a word boundary
    chunk_to_yield = split_result[0]
    self.buffer = split_result[1]

    # Only yield if chunk is substantial enough
    if len(chunk_to_yield) >= self.min_buffer_size:
        return chunk_to_yield


P2: Preserve buffer when boundary chunk is too small

The word-boundary extraction mutates self.buffer before it knows whether it will emit a chunk. If the prefix is shorter than min_buffer_size, the method returns None after already assigning self.buffer = split_result[1], which permanently discards the prefix (and the separating space). Any stream where the last space falls early in the buffer will lose text and produce concatenated/incorrect speech. Consider only slicing the buffer once you’re sure the chunk will be emitted, or reattach the dropped prefix when skipping.


Comment on lines +207 to +210
logger.info(
    f"[LLM Latency] Turn {self.current_turn_id}: "
    f"TTFB={first_byte_latency:.0f}ms, "
    f"total={total_duration:.0f}ms, "


P2: Guard LLM latency logging when no first token arrives

The log message formats first_byte_latency with .0f unconditionally. If no TTSSpeakFrame arrives before LLMFullResponseEndFrame (e.g., empty LLM response, TTS disabled, or a downstream error), first_byte_latency remains None and this f-string raises TypeError, interrupting frame processing. A fallback string/0 or a conditional log avoids crashing the pipeline in these cases.
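A minimal way to express that guard (the surrounding processor code is not shown here, so this is a standalone sketch):

```python
# Format TTFB only when a first token actually arrived; otherwise fall back
# to "N/A" instead of raising TypeError on None.
def format_ttfb(first_byte_latency):
    # Explicit None check so a legitimate 0.0 still formats as "0ms".
    return f"{first_byte_latency:.0f}ms" if first_byte_latency is not None else "N/A"
```

The helper name is hypothetical; an inline conditional at the log call works just as well.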



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (7)
app/ai/voice/agents/breeze_buddy/processors/__init__.py (1)

1-19: LGTM!

The package initialization correctly re-exports the latency tracking processors and factory function. The module structure is clean and follows standard Python packaging conventions.

The static analysis hint about sorting __all__ is a minor style preference that can be addressed optionally.

Optional: Sort __all__ alphabetically
 __all__ = [
+    "create_latency_processors",
     "STTLatencyProcessor",
     "LLMLatencyProcessor",
     "TTSLatencyProcessor",
-    "create_latency_processors",
 ]
app/ai/voice/agents/breeze_buddy/utils/llm_buffer_streaming.py (1)

143-178: Consider using ClassVar for mutable class attributes.

The static analysis correctly identifies that mutable class attributes (dicts) should be annotated with ClassVar to clarify they are class-level, not instance-level.

Suggested fix
+from typing import ClassVar, Dict
+
 class LLMBufferConfig:
     """Configuration for LLM buffer-based streaming."""

     # Aggressive (lowest latency, may cut words)
-    AGGRESSIVE = {
+    AGGRESSIVE: ClassVar[Dict[str, int | bool]] = {
         "buffer_size": 30,
         "min_buffer_size": 15,
         "enable_word_boundary": True
     }

     # Balanced (good latency, preserves words)
-    BALANCED = {
+    BALANCED: ClassVar[Dict[str, int | bool]] = {
         "buffer_size": 40,
         "min_buffer_size": 20,
         "enable_word_boundary": True
     }

     # Conservative (higher quality, slightly more latency)
-    CONSERVATIVE = {
+    CONSERVATIVE: ClassVar[Dict[str, int | bool]] = {
         "buffer_size": 60,
         "min_buffer_size": 30,
         "enable_word_boundary": True
     }
app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py (3)

68-68: Remove extraneous f prefix from string without placeholders.

-                logger.trace(f"[STT Latency] Audio input started")
+                logger.trace("[STT Latency] Audio input started")

263-263: Remove extraneous f prefix from string without placeholders.

-            logger.trace(f"[TTS Latency] TTS started")
+            logger.trace("[TTS Latency] TTS started")

178-181: Misleading "..." suffix when frame.text is shorter than 50 characters.

String slicing in Python is safe for short strings (it simply returns the available characters), but the unconditional "..." appended in the log message can be misleading for short texts.

Suggested improvement
text_preview = frame.text[:50] + "..." if len(frame.text) > 50 else frame.text
logger.debug(
    f"[LLM Latency] First token received: {ttfb:.0f}ms, "
    f"text='{text_preview}'"
)
app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py (1)

224-226: Use descriptive variable names instead of l.

The variable l is ambiguous and can be confused with 1 (one) in some fonts.

Suggested fix
         # Extract TTFB and total duration
-        ttfb_values = [l.first_byte_latency_ms for l in latencies if l.first_byte_latency_ms is not None]
-        total_values = [l.total_duration_ms for l in latencies if l.total_duration_ms is not None]
+        ttfb_values = [lat.first_byte_latency_ms for lat in latencies if lat.first_byte_latency_ms is not None]
+        total_values = [lat.total_duration_ms for lat in latencies if lat.total_duration_ms is not None]
docs/LATENCY_OPTIMIZATION.md (1)

91-103: Add language specifiers to fenced code blocks for proper syntax highlighting.

Several code blocks in the document are missing language specifiers, which affects readability when rendered.

Suggested fix for lines 91-103

-```
+```text
 User speaks → VAD detects end → STT finalizes → LLM starts → Response
                                   ↑
                             Waiting for complete transcript

With interim results:

-```
+```text
 User speaks → Interim results → LLM starts early → Response

 Processing begins while user still speaking

Similar fixes are needed for lines 298, 327, 332, and 490.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef71043 and f0dcdc3.

📒 Files selected for processing (9)
  • app/ai/voice/agents/breeze_buddy/processors/__init__.py
  • app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py
  • app/ai/voice/agents/breeze_buddy/services/__init__.py
  • app/ai/voice/agents/breeze_buddy/services/llm_wrapper.py
  • app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py
  • app/ai/voice/agents/breeze_buddy/utils/llm_buffer_streaming.py
  • app/core/config/static.py
  • docs/BOLNA_VS_BREEZE_BUDDY_OPTIMIZATION_GAP_ANALYSIS.md
  • docs/LATENCY_OPTIMIZATION.md
🧰 Additional context used
🧬 Code graph analysis (4)
app/ai/voice/agents/breeze_buddy/services/__init__.py (1)
app/ai/voice/agents/breeze_buddy/services/llm_wrapper.py (1)
  • BreezeBuddyLLMWrapper (22-48)
app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py (1)
app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py (4)
  • LatencyTracker (49-362)
  • track_component (131-186)
  • start_turn (82-95)
  • end_turn (97-129)
app/ai/voice/agents/breeze_buddy/services/llm_wrapper.py (2)
app/ai/voice/agents/breeze_buddy/utils/llm_buffer_streaming.py (4)
  • BufferedLLMStreamWrapper (16-140)
  • LLMBufferConfig (143-178)
  • get_config (168-178)
  • stream_with_buffer (45-106)
app/ai/voice/agents/breeze_buddy/template/context.py (1)
  • context (72-74)
app/ai/voice/agents/breeze_buddy/processors/__init__.py (1)
app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py (4)
  • STTLatencyProcessor (30-121)
  • LLMLatencyProcessor (124-220)
  • TTSLatencyProcessor (223-317)
  • create_latency_processors (320-357)
🪛 LanguageTool
docs/LATENCY_OPTIMIZATION.md

[grammar] ~3-~3: Ensure spelling is correct
Context: ...ucing voice conversation latency by 600-700ms** --- ## 📋 Table of Contents 1. [Overview](#ove...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~70-~70: Ensure spelling is correct
Context: ... 3:** Test - Make a call and notice 200-400ms faster response times! ✅ **That's it!*...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~82-~82: Ensure spelling is correct
Context: ...rim Results (DETAILED) Impact: 200-400ms latency reduction Effort: 5-10 minu...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~318-~318: Ensure spelling is correct
Context: ...: LLM Buffer Streaming Impact: 200-300ms latency reduction Effort: 4-6 hours...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~473-~473: Ensure spelling is correct
Context: ... in logs - ✅ P95 latency reduced by 200-400ms - ✅ No STT accuracy degradation **Phase 2...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~573-~573: Ensure spelling is correct
Context: ...es to .env - Restart server - Get 200-400ms improvement! Phase 2 (2-3 hours): ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

docs/BOLNA_VS_BREEZE_BUDDY_OPTIMIZATION_GAP_ANALYSIS.md

[grammar] ~23-~23: Ensure spelling is correct
Context: ...ruption handling with sequence IDs (100-200ms reduction) 7. Implement smart caching f...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~24-~24: Ensure spelling is correct
Context: ...nt smart caching for common phrases (50-200ms reduction) **Larger Projects (1-2 week...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~27-~27: Ensure spelling is correct
Context: ...igrate to queue-based architecture (100-300ms reduction) 9. Add intelligent buffering...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~28-~28: Ensure spelling is correct
Context: ...gent buffering throughout pipeline (100-200ms reduction) --- ## 📊 DETAILED GAP ANA...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~65-~65: Ensure spelling is correct
Context: ...finalizes the transcript. This adds 200-400ms per turn. Recommendation: ⭐ **IMME...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~148-~148: Ensure spelling is correct
Context: ...ntations are well-tuned. Breeze Buddy's 300ms is actually slightly faster. **Recomme...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~655-~655: Ensure spelling is correct
Context: ...s) could be pre-synthesized, saving 100-200ms each time. Recommendation: ⭐ **MED...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~770-~770: Ensure spelling is correct
Context: ...Estimated Total Latency Reduction: 500-900ms* ### 1.1 Enable Soniox Interim Results ⭐⭐⭐ - ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~774-~774: Ensure spelling is correct
Context: ...Effort: 5 minutes - Impact: 200-400ms reduction - Files: .env - **Actio...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~798-~798: Ensure spelling is correct
Context: ... - Effort: 4 hours - Impact: 50-100ms reduction - Files: `services/llm_wr...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~812-~812: Ensure spelling is correct
Context: ...Estimated Total Latency Reduction: 200-500ms* ### 2.1 Integrate Latency Tracking ⭐⭐ - **Ef...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~816-~816: Ensure spelling is correct
Context: ...ng ⭐⭐ - Effort: 1 day - Impact: 0ms (visibility only, enables future optimi...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~828-~828: Ensure spelling is correct
Context: ...- Effort: 3-5 days - Impact: 50-200ms per cached phrase - Files: Create `...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~842-~842: Ensure spelling is correct
Context: ...Estimated Total Latency Reduction: 200-500ms* ### 3.1 Hybrid Queue Architecture ⭐⭐⭐ - **Ef...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~846-~846: Ensure spelling is correct
Context: ...Effort: 2-3 weeks - Impact: 100-300ms reduction - Files: Significant refa...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~854-~854: Ensure spelling is correct
Context: ....3-0.5s audio chunks (4096-8192 bytes @ 8kHz) ### 3.3 Add Filler Sounds & Backchann...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~886-~886: Ensure spelling is correct
Context: ...4 hours) Expected Improvement: 250-500ms reduction + full visibility into remain...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
docs/LATENCY_OPTIMIZATION.md

3-3: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


24-24: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


30-30: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


91-91: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


98-98: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


189-189: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


189-189: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


298-298: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


327-327: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


332-332: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


490-490: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/BOLNA_VS_BREEZE_BUDDY_OPTIMIZATION_GAP_ANALYSIS.md

561-561: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


770-770: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


812-812: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


842-842: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🪛 Ruff (0.14.10)
app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py

68-68: f-string without any placeholders

Remove extraneous f prefix

(F541)


263-263: f-string without any placeholders

Remove extraneous f prefix

(F541)

app/ai/voice/agents/breeze_buddy/utils/llm_buffer_streaming.py

147-151: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


154-158: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


161-165: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

app/ai/voice/agents/breeze_buddy/processors/__init__.py

14-19: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py

225-225: Ambiguous variable name: l

(E741)


226-226: Ambiguous variable name: l

(E741)


324-324: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (12)
app/core/config/static.py (1)

134-145: LGTM!

The new configuration constants follow the established patterns in this file. Good defaults:

  • Latency tracking enabled by default (visibility without side effects)
  • Buffer streaming disabled by default (opt-in for the more experimental feature)
  • Sensible 40-char buffer size matching the Bolna reference implementation
docs/BOLNA_VS_BREEZE_BUDDY_OPTIMIZATION_GAP_ANALYSIS.md (1)

1-904: Comprehensive and well-structured analysis document.

The gap analysis provides valuable context for latency optimization priorities with clear effort/impact assessments and actionable recommendations. The phased roadmap aligns well with the implementation code in this PR.

app/ai/voice/agents/breeze_buddy/utils/llm_buffer_streaming.py (2)

121-140: Edge case: buffer with no spaces returns None indefinitely.

If the buffer contains text without any spaces (e.g., a very long word or URL) and is between buffer_size and buffer_size * 2, _extract_chunk_at_boundary returns None, causing chunks to accumulate until the 2x threshold. This is acceptable behavior but worth noting.

The force-yield at 2x buffer_size (lines 133-138) is a good safeguard that prevents unbounded growth.


45-106: Well-implemented async streaming with proper error handling.

Good practices observed:

  • Buffer reset at start of each stream (line 60-61)
  • Proper CancelledError handling that discards partial state
  • Exception handler yields remaining buffer before propagating
  • Informative logging at key points
app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py (1)

320-357: Well-designed factory function with clear documentation.

The create_latency_processors function provides a clean API with a helpful docstring example showing pipeline integration.

app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py (2)

290-325: Langfuse export handles errors gracefully.

The broad Exception catch on line 324 is appropriate here since export failures should not crash the application, and it logs the error for debugging.


49-80: Well-structured LatencyTracker initialization.

The tracker properly initializes session state, turn tracking structures, and connection latency storage. The use of defaultdict for component latencies is appropriate.

app/ai/voice/agents/breeze_buddy/services/__init__.py (1)

1-13: LGTM!

Clean package initialization that correctly re-exports BreezeBuddyLLMWrapper for convenient access.

docs/LATENCY_OPTIMIZATION.md (1)

1-597: Excellent implementation guide with clear phased approach.

The documentation provides:

  • Clear baseline state assessment
  • Actionable steps with specific file references
  • Expected outcomes with measurable targets
  • Troubleshooting guidance

This aligns well with the actual implementation code in the PR.

app/ai/voice/agents/breeze_buddy/services/llm_wrapper.py (3)

1-6: LGTM!

Clear module docstring explaining the latency optimization purpose.


8-19: LGTM!

Imports are well-organized and all appear to be used.


38-48: This code is not currently used in the agent implementation.

BreezeBuddyLLMWrapper is defined and exported but never instantiated. The actual agents (agent.py and websocket_bot.py) directly use AzureLLMService instead. The buffer streaming feature is not integrated into the active codebase.

If this class is reactivated in the future, the original concurrency concern about the shared buffer_wrapper instance would be valid—multiple concurrent streams would corrupt buffer state since stream_with_buffer resets mutable state at the start. However, this is not a current issue as the code is not deployed.

Likely an incorrect or invalid review comment.

Comment on lines +109 to +114
logger.info(
    f"[STT Latency] Turn {turn_id}: "
    f"TTFB={first_byte_latency:.0f}ms, "
    f"total={total_duration:.0f}ms, "
    f"transcript='{frame.text[:50]}...'"
)

⚠️ Potential issue | 🟡 Minor

Potential TypeError when first_byte_latency is None.

If no interim results are received before the final transcription, first_byte_latency will be None, and formatting it with :.0f will raise a TypeError.

Suggested fix
                     logger.info(
                         f"[STT Latency] Turn {turn_id}: "
-                        f"TTFB={first_byte_latency:.0f}ms, "
+                        f"TTFB={first_byte_latency:.0f}ms, " if first_byte_latency else "TTFB=N/A, "
                         f"total={total_duration:.0f}ms, "
                         f"transcript='{frame.text[:50]}...'"
                     )

Or more cleanly:

ttfb_str = f"{first_byte_latency:.0f}ms" if first_byte_latency else "N/A"
logger.info(
    f"[STT Latency] Turn {turn_id}: "
    f"TTFB={ttfb_str}, "
    f"total={total_duration:.0f}ms, "
    f"transcript='{frame.text[:50]}...'"
)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py around lines
109-114, the logger formats first_byte_latency with "{:.0f}" which raises a
TypeError when first_byte_latency is None; change the logging to format TTFB
conditionally (e.g., ttfb_str = f"{first_byte_latency:.0f}ms" if
first_byte_latency is not None else "N/A") and then use ttfb_str in the
f-string, leaving total_duration formatted as before and preserving the
transcript slice.

Comment on lines +207 to +212
logger.info(
    f"[LLM Latency] Turn {self.current_turn_id}: "
    f"TTFB={first_byte_latency:.0f}ms, "
    f"total={total_duration:.0f}ms, "
    f"tokens={self.tokens_count}"
)

⚠️ Potential issue | 🟡 Minor

Same None formatting issue for LLM first_byte_latency.

Apply the same fix as for STT to handle cases where no TTSSpeakFrame is received before LLMFullResponseEndFrame.

Suggested fix
ttfb_str = f"{first_byte_latency:.0f}ms" if first_byte_latency else "N/A"
logger.info(
    f"[LLM Latency] Turn {self.current_turn_id}: "
    f"TTFB={ttfb_str}, "
    f"total={total_duration:.0f}ms, "
    f"tokens={self.tokens_count}"
)
🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py around lines
207 to 212, the logger formats first_byte_latency which can be None causing
"None" in output; change the formatting to display a fallback like "N/A" when
first_byte_latency is falsy (e.g., compute ttfb_str =
f"{first_byte_latency:.0f}ms" if first_byte_latency else "N/A") and use ttfb_str
in the f-string instead of formatting first_byte_latency directly.

Comment on lines +301 to +306
logger.info(
    f"[TTS Latency] Turn {turn_id}: "
    f"TTFB={first_byte_latency:.0f}ms, "
    f"total={total_duration:.0f}ms, "
    f"chunks={self.audio_chunks_count}"
)

⚠️ Potential issue | 🟡 Minor

Same None formatting issue for TTS first_byte_latency.

Apply the same fix as for STT/LLM to handle cases where no TTSAudioRawFrame is received before TTSStoppedFrame.

Suggested fix
ttfb_str = f"{first_byte_latency:.0f}ms" if first_byte_latency else "N/A"
logger.info(
    f"[TTS Latency] Turn {turn_id}: "
    f"TTFB={ttfb_str}, "
    f"total={total_duration:.0f}ms, "
    f"chunks={self.audio_chunks_count}"
)
🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/processors/latency_tracking.py around lines
301 to 306, the logger formats first_byte_latency directly which can produce
"None" when no TTSAudioRawFrame was received; change the logging to compute a
ttfb_str that is "N/A" when first_byte_latency is falsy/None (e.g., ttfb_str =
f"{first_byte_latency:.0f}ms" if first_byte_latency else "N/A") and then use
that ttfb_str in the f-string for TTFB while keeping the rest of the fields the
same.

Comment on lines +30 to +33
if self.enable_buffer_streaming:
    config = LLMBufferConfig.get_config("balanced")
    config["buffer_size"] = BREEZE_BUDDY_LLM_BUFFER_SIZE
    self.buffer_wrapper = BufferedLLMStreamWrapper(**config)

⚠️ Potential issue | 🟠 Major

Mutating shared class-level config dictionary.

LLMBufferConfig.get_config("balanced") returns the class attribute dictionary directly, not a copy. Assigning to config["buffer_size"] mutates LLMBufferConfig.BALANCED permanently, affecting all subsequent callers.

🔎 Proposed fix
         if self.enable_buffer_streaming:
-            config = LLMBufferConfig.get_config("balanced")
-            config["buffer_size"] = BREEZE_BUDDY_LLM_BUFFER_SIZE
+            config = LLMBufferConfig.get_config("balanced").copy()
+            config["buffer_size"] = BREEZE_BUDDY_LLM_BUFFER_SIZE
             self.buffer_wrapper = BufferedLLMStreamWrapper(**config)
🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/services/llm_wrapper.py around lines 30 to
33, the code calls LLMBufferConfig.get_config("balanced") which returns the
class-level dictionary directly and then mutates config["buffer_size"],
unintentionally changing the shared BALANCED config for all callers; fix this by
creating a shallow (or deep, if nested) copy of the returned config before
modifying it (e.g., config = dict(LLMBufferConfig.get_config("balanced")) or use
copy.deepcopy(...)) and then set config["buffer_size"] and pass that copy into
BufferedLLMStreamWrapper so the class-level config remains unchanged.

Comment on lines +131 to +157
def track_component(
    self,
    component: str,
    first_byte_latency_ms: Optional[float] = None,
    total_duration_ms: Optional[float] = None,
    turn_id: Optional[str] = None,
    sequence_id: Optional[int] = None,
    metadata: Optional[Dict] = None
) -> ComponentLatency:
    """
    Track latency for a specific component.

    Args:
        component: Component name ("stt", "llm", "tts")
        first_byte_latency_ms: Time to first byte/token
        total_duration_ms: Total processing duration
        turn_id: Turn identifier (uses current if None)
        sequence_id: Sequence ID for this component execution
        metadata: Additional metadata (provider, model, etc.)

    Returns:
        ComponentLatency object
    """
    turn_id = turn_id or self.current_turn_id
    if not turn_id:
        logger.warning(f"[Latency] Cannot track {component}: no active turn")
        return None
⚠️ Potential issue | 🟡 Minor

Return type annotation is inconsistent with actual behavior.

The method signature declares `-> ComponentLatency` but can return `None` on line 157 when no active turn exists.

Suggested fix
     def track_component(
         self,
         component: str,
         first_byte_latency_ms: Optional[float] = None,
         total_duration_ms: Optional[float] = None,
         turn_id: Optional[str] = None,
         sequence_id: Optional[int] = None,
         metadata: Optional[Dict] = None
-    ) -> ComponentLatency:
+    ) -> Optional[ComponentLatency]:
🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py around lines 131 to
157, the track_component method is annotated to return ComponentLatency but
actually returns None when there is no active turn; change the return type
annotation to Optional[ComponentLatency] (and update any imports if needed),
update the docstring to document that None may be returned when there is no
active turn, and run a quick grep to ensure callsites handle the Optional return
(or adjust callers to handle None).

Comment on lines +181 to +184
logger.debug(
f"[Latency] {component.upper()} tracked for turn {turn_id}: "
f"TTFB={first_byte_latency_ms:.0f}ms, total={total_duration_ms:.0f}ms"
)
⚠️ Potential issue | 🟡 Minor

Potential TypeError when formatting None values.

If `first_byte_latency_ms` or `total_duration_ms` is `None`, the `:.0f` formatting will fail.

Suggested fix
+        ttfb_str = f"{first_byte_latency_ms:.0f}ms" if first_byte_latency_ms is not None else "N/A"
+        total_str = f"{total_duration_ms:.0f}ms" if total_duration_ms is not None else "N/A"
         logger.debug(
             f"[Latency] {component.upper()} tracked for turn {turn_id}: "
-            f"TTFB={first_byte_latency_ms:.0f}ms, total={total_duration_ms:.0f}ms"
+            f"TTFB={ttfb_str}, total={total_str}"
         )

Using an explicit `is None` check (rather than truthiness) also keeps a legitimate `0.0` reading from being reported as "N/A".

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/utils/latency_tracker.py around lines 181 to
184, the f-string uses "{:.0f}" which will raise a TypeError if
first_byte_latency_ms or total_duration_ms is None; change it to build ttfb_str
and total_str using an explicit None check (e.g., ttfb_str = "N/A" if
first_byte_latency_ms is None else f"{first_byte_latency_ms:.0f}ms" and
similarly for total_duration_ms) and then log those strings in the logger.debug
call so None values are represented as "N/A" instead of causing a formatting
error.
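A small helper makes the None-safe formatting reusable across the STT/LLM/TTS log lines; the helper name `fmt_ms` is hypothetical, not from the PR. Note the explicit `is None` check so a legitimate `0.0` reading formats as `0ms` rather than `N/A`.

```python
from typing import Optional

def fmt_ms(value: Optional[float]) -> str:
    """Format a millisecond reading, falling back to 'N/A' for None."""
    # `is None` (not truthiness) so that 0.0 still formats as "0ms".
    return "N/A" if value is None else f"{value:.0f}ms"

print(fmt_ms(123.6))  # 124ms
print(fmt_ms(None))   # N/A
print(fmt_ms(0.0))    # 0ms
```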


## 📝 **NOTES**

- All file paths are relative to `/Users/pinnamaraju.swaroop/Repos/BreezeAutomatic/clairvoyance/`
⚠️ Potential issue | 🟡 Minor

Remove hardcoded local file path.

The document contains a developer-specific local path that should not be in committed documentation.

Suggested fix
-- All file paths are relative to `/Users/pinnamaraju.swaroop/Repos/BreezeAutomatic/clairvoyance/`
+- All file paths are relative to the repository root
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-- All file paths are relative to `/Users/pinnamaraju.swaroop/Repos/BreezeAutomatic/clairvoyance/`
+- All file paths are relative to the repository root
🤖 Prompt for AI Agents
In docs/BOLNA_VS_BREEZE_BUDDY_OPTIMIZATION_GAP_ANALYSIS.md around line 894,
there is a hardcoded developer-specific local file path
(/Users/pinnamaraju.swaroop/Repos/BreezeAutomatic/clairvoyance/) that must be
removed; replace it with a generic or relative path such as "./" or
"{project_root}/clairvoyance" or an example placeholder like
"/path/to/repo/clairvoyance", and scan the document for other occurrences to
update them similarly so no personal absolute paths remain in the committed
docs.
