Devansh-1218 (Contributor) commented Oct 17, 2025

Integrated STT fallback across providers.

Summary by CodeRabbit

  • New Features

    • Added automatic fallback mechanism for voice transcription services. When the primary speech-to-text provider fails, the system now automatically switches to alternative providers for improved reliability and uptime.
  • Chores

    • Enhanced voice agent error detection and recovery infrastructure.

coderabbitai bot commented Oct 17, 2025

Walkthrough

The pull request introduces STT (speech-to-text) provider fallback capability to the voice agent system. It adds error detection in the pipeline, modularizes STT service creation with provider-specific factories, introduces new fallback-aware entrypoints, and extends configuration to enable and prioritize STT fallback providers.

Changes

Cohort / File(s) Summary
Core Fallback Orchestration
app/agents/voice/automatic/__init__.py
Introduces STTRestartException, adds run_with_stt_fallback() and run_with_single_stt() entrypoints, refactors run_normal_mode() to route through fallback flow when enabled, introduces run_pipeline_with_stt() and build_and_run_pipeline() to centralize pipeline construction with optional STT error detection, and adds helper setup functions (setup_vad_analyzer(), setup_smart_turn_analyzer(), setup_audio_filter(), setup_transport_proxy(), create_llm_service(), setup_tools()) to modularize component assembly.
STT Error Detection
app/agents/voice/automatic/processors/stt_error_detector.py
Introduces STTErrorDetector processor class that intercepts pipeline frames, detects STT-related errors via keyword matching, triggers provided restart_callback to fallback to another provider, and halts error propagation downstream.
Provider Factory & Fallback Logic
app/agents/voice/automatic/stt/__init__.py
Adds provider-agnostic get_provider_for_attempt() helper to map fallback attempts to providers, introduces dedicated factory functions (create_assemblyai_service(), create_openai_service(), create_deepgram_service(), create_soniox_service(), create_google_service()), adds create_provider_service() centralizer, and extends get_stt_service() with fallback_attempt parameter to support both fallback and legacy single-provider modes.
Fallback Configuration
app/core/config.py
Adds ENABLE_STT_FALLBACK (boolean) and STT_FALLBACK_PROVIDERS (comma-separated list defaulting to deepgram, openai, assemblyai, google) to enable and prioritize STT fallback behavior.

Sequence Diagram

sequenceDiagram
    participant User
    participant Pipeline
    participant STTService
    participant ErrorDetector
    participant Agent
    participant FallbackHandler

    User->>Pipeline: Audio frame
    Pipeline->>STTService: Process STT request
    STTService-->>Pipeline: ErrorFrame (STT error)
    Pipeline->>ErrorDetector: Route ErrorFrame
    
    rect rgb(220, 53, 69, 0.1)
    Note over ErrorDetector: STT error detected
    ErrorDetector->>FallbackHandler: trigger restart_callback()
    end
    
    FallbackHandler->>Agent: Switch to next STT provider
    Agent->>STTService: Reinitialize with fallback provider
    Pipeline->>STTService: Retry audio processing
    STTService-->>Pipeline: STT result (success)
    Pipeline->>User: Response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes introduce significant architectural modifications across multiple files with heterogeneous purposes: orchestration logic refactoring, new processor component, provider factory pattern implementation, and configuration extensions. While individual components are moderately complex, the interconnected nature of fallback flow wiring, error detection mechanisms, and provider factory abstractions demands careful review of the integration points and control flow across the pipeline.

Possibly related PRs

  • feat: soniox stt integration #234: Directly overlaps with STT provider factory modifications and configuration additions, including Soniox provider support in both PRs.
  • feat: Add deepgram stt support #225: Shares STT provider infrastructure refactoring and configuration changes, with prior Deepgram support that aligns with fallback provider setup.

Poem

🐰 From one provider, now we leap and bound,
When whispers fail, we try the next sound,
Error detection, fallback dance so swift,
Our voice pipeline gains a graceful gift! 🎤✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "stt fallback" directly corresponds to the primary objective of the changeset, which is implementing an STT (speech-to-text) fallback mechanism across providers. The title accurately identifies the core feature being added across multiple files, including new exception handling, error detection components, provider-agnostic selection logic, and configuration options. While the title is terse, it is specific and clear enough that a developer scanning the commit history would immediately understand this PR introduces STT fallback functionality. The title avoids vague language and provides a concise summary of the main change.
Docstring Coverage ✅ Passed Docstring coverage is 95.83% which is sufficient. The required threshold is 80.00%.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
app/agents/voice/automatic/__init__.py (1)

641-656: ptt_vad_filter may be undefined in message handlers; guard its usage.

If DISABLE_VAD_FOR_PTT is False, handlers will NameError on PTT events. Initialize to None and check before use.

-    stt_mute_filter = None
-    tool_call_processor = None
+    stt_mute_filter = None
+    tool_call_processor = None
+    ptt_vad_filter = None
@@
-                elif message_type == "ptt-start":
+                elif message_type == "ptt-start" and ptt_vad_filter:
@@
-                elif message_type == "ptt-end":
+                elif message_type == "ptt-end" and ptt_vad_filter:
@@
-                elif message_type == "ptt-sync":
+                elif message_type == "ptt-sync" and ptt_vad_filter:

Optional: else log “PTT message ignored; VAD filter disabled”.

Also applies to: 776-804

app/agents/voice/automatic/stt/__init__.py (1)

180-237: Verification confirms the review comment is accurate—callers need to handle init errors as fallback triggers.

Analysis of direct callers:

  1. websocket_bot.py:238: No error handling; exceptions fatal (separate module context)
  2. automatic/__init__.py:283 (fallback path): Catches only STTRestartException; ValueError from get_stt_service() falls through to generic handler and re-raises as fatal
  3. automatic/__init__.py:312 (legacy path): No error handling; exceptions fatal

The fallback path at line 283 should catch init errors (like ValueError: "All STT providers exhausted") and treat them as "try next provider" rather than fatal. Currently, such exceptions hit the generic handler and propagate up, stopping the fallback loop prematurely.

The review comment is correct: keep get_stt_service() raising, but fix callers to catch and continue on init errors.

🧹 Nitpick comments (5)
app/agents/voice/automatic/stt/__init__.py (1)

16-34: Provider selection should skip duplicates and unknowns; compute ordered unique list once.

Prevents retrying the same provider and tolerates misconfigured lists.

-def get_provider_for_attempt(fallback_attempt: int) -> Optional[str]:
+def get_provider_for_attempt(fallback_attempt: int) -> Optional[str]:
     """
     Get the STT provider to use for the given fallback attempt.
@@
-    if fallback_attempt == 0:
-        return config.STT_PROVIDER
-
-    fallback_index = fallback_attempt - 1
-    if fallback_index < len(config.STT_FALLBACK_PROVIDERS):
-        return config.STT_FALLBACK_PROVIDERS[fallback_index]
-    return None
+    # Build ordered unique provider list: primary + fallbacks
+    allowed = {"google", "assemblyai", "openai", "deepgram", "soniox"}
+    ordered = []
+    primary = (config.STT_PROVIDER or "google").strip().lower()
+    if primary in allowed:
+        ordered.append(primary)
+    for p in config.STT_FALLBACK_PROVIDERS:
+        q = p.strip().lower()
+        if q in allowed and q not in ordered:
+            ordered.append(q)
+    return ordered[fallback_attempt] if 0 <= fallback_attempt < len(ordered) else None
app/agents/voice/automatic/processors/stt_error_detector.py (3)

25-37: Fix restart_callback type: it’s awaited; annotate as Awaitable.

Prevents type-checker drift and accidental sync callbacks.

-from typing import Callable
+from typing import Callable, Awaitable
@@
-    def __init__(
-        self, restart_callback: Callable[[], None], name: str = "STTErrorDetector"
-    ):
+    def __init__(
+        self, restart_callback: Callable[[], Awaitable[None]], name: str = "STTErrorDetector"
+    ):
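
A minimal sketch of why the annotation matters, assuming a simplified stand-in for the detector (the `Detector` class and `restart` coroutine below are hypothetical, not the PR's code). Because the callback is awaited, it has to return an awaitable; `Callable[[], Awaitable[None]]` states that contract to the type checker.

```python
import asyncio
from typing import Awaitable, Callable, List

# Alias for an async, zero-argument callback that returns nothing useful.
RestartCallback = Callable[[], Awaitable[None]]


class Detector:
    """Hypothetical stand-in that awaits its restart callback on error."""

    def __init__(self, restart_callback: RestartCallback) -> None:
        self._restart_callback = restart_callback

    async def on_error(self) -> None:
        # Awaiting only works if the callback returns an awaitable,
        # for example the coroutine produced by an `async def` function.
        await self._restart_callback()


calls: List[str] = []


async def restart() -> None:
    # Illustrative callback: record that a restart was requested.
    calls.append("restart")


asyncio.run(Detector(restart).on_error())
```

A plain synchronous function returning `None` would fail at the `await` with a `TypeError`, which is the mistake the stricter annotation is meant to surface early.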

39-57: Avoid leaking secrets in logs; redact sensitive substrings before logging.

Errors often include API keys/tokens. Redact before logger.error.

-            logger.error(f"STT error detected: {frame.error}")
+            msg = str(frame.error)
+            # Mask the value that follows the key marker, not just the marker (needs `import re`)
+            sanitized = re.sub(r"(?i)(api_?key=)[^\s&]+", r"\1***", msg)
+            logger.error(f"STT error detected: {sanitized}")

72-107: Keyword detection is broad; expect false positives. Prefer source metadata check when available.

If ErrorFrame exposes origin/service name, check that first; keep keywords as fallback.

-        # Check if error message contains any STT-related keywords
+        # Prefer structured source metadata if available
+        source = getattr(error_frame, "source", None)
+        if isinstance(source, str) and any(s in source.lower() for s in stt_services):
+            return True
+        # Fallback: keyword scan
         for keyword in stt_services + stt_error_patterns:
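
The check order suggested above can be sketched in isolation. This is an illustration under assumptions: `FakeErrorFrame` is a hypothetical stand-in (the real `ErrorFrame` may not expose a `source` attribute), and the two keyword tuples are placeholders rather than the lists used in the PR.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder keyword lists; the PR's actual lists are not shown in this review.
STT_SERVICES = ("deepgram", "assemblyai", "soniox", "openai", "google")
STT_ERROR_PATTERNS = ("transcription", "speech-to-text")


@dataclass
class FakeErrorFrame:
    """Hypothetical frame carrying an error message and an optional source name."""

    error: str
    source: Optional[str] = None


def is_stt_error(frame: FakeErrorFrame) -> bool:
    # 1) Prefer structured metadata when the frame provides it.
    source = getattr(frame, "source", None)
    if isinstance(source, str) and any(s in source.lower() for s in STT_SERVICES):
        return True
    # 2) Otherwise fall back to scanning the message text for keywords.
    message = str(frame.error).lower()
    return any(k in message for k in STT_SERVICES + STT_ERROR_PATTERNS)
```

Matching on bare provider names can still misfire when the same vendor backs a non-STT service, so a dedicated origin field would be more precise if the frame type offers one.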
app/agents/voice/automatic/__init__.py (1)

865-879: Clean up pipeline/transport on STTRestartException to avoid leaks before retry.

Cancel the task (and close transport if supported) before bubbling up.

     async def run_pipeline():
         try:
             await runner.run(task)
         except asyncio.CancelledError:
             logger.info("Main task cancelled. Exiting gracefully.")
         except STTRestartException:
             # Re-raise STT restart exception to trigger fallback
             logger.info(
                 "STT restart exception detected - propagating to fallback handler"
             )
-            raise
+            try:
+                await task.cancel()
+            finally:
+                raise
         except Exception as e:
             logger.error(f"Pipeline runner error: {e}")
-            raise
+            try:
+                await task.cancel()
+            finally:
+                raise
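
One way to express "clean up, then re-raise" so that a failing cleanup cannot mask the original error is sketched below. It is a generic illustration, not the PR's code: `run_with_cleanup`, `run`, and `cleanup` are hypothetical names, and whether the real task's `cancel()` is awaitable is an assumption to verify against the pipeline library.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def run_with_cleanup(run, cleanup):
    """Await `run()`; on failure, attempt `cleanup()` and re-raise the original error."""
    try:
        await run()
    except asyncio.CancelledError:
        logger.info("Main task cancelled. Exiting gracefully.")
    except Exception:
        try:
            await cleanup()
        except Exception as cleanup_error:
            # Log the cleanup failure but keep the original exception as the one raised.
            logger.warning("Cleanup failed: %s", cleanup_error)
        # Re-raises the exception caught by the outer `except Exception`.
        raise
```

Compared with the `try: ... finally: raise` form in the diff above, the inner `except` makes it explicit which exception reaches the fallback handler when cleanup itself fails.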
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1dfeee1 and 980dad8.

📒 Files selected for processing (4)
  • app/agents/voice/automatic/__init__.py (11 hunks)
  • app/agents/voice/automatic/processors/stt_error_detector.py (1 hunks)
  • app/agents/voice/automatic/stt/__init__.py (2 hunks)
  • app/core/config.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
app/agents/voice/automatic/stt/__init__.py (1)
app/agents/voice/breeze_buddy/workflows/order_confirmation/utils.py (1)
  • get_stt_service (13-65)
app/agents/voice/automatic/__init__.py (8)
app/agents/voice/automatic/processors/stt_error_detector.py (1)
  • STTErrorDetector (17-107)
app/agents/voice/automatic/prompts/__init__.py (1)
  • get_system_prompt (26-59)
app/agents/voice/automatic/stt/__init__.py (1)
  • get_stt_service (180-237)
app/agents/voice/automatic/tools/__init__.py (1)
  • initialize_tools (19-135)
app/agents/voice/automatic/tts/__init__.py (1)
  • get_tts_service (15-88)
app/agents/voice/automatic/types/decoders.py (3)
  • decode_voice_name (14-21)
  • decode_tts_provider (4-11)
  • decode_mode (24-31)
app/agents/voice/automatic/features/llm_wrapper.py (1)
  • LLMServiceWrapper (14-142)
app/agents/voice/automatic/utils/session_context.py (2)
  • set_current_session_id (31-35)
  • create_session_context (22-24)
🪛 Ruff (0.14.0)
app/agents/voice/automatic/stt/__init__.py

39-39: Avoid specifying long messages outside the exception class

(TRY003)


51-51: Avoid specifying long messages outside the exception class

(TRY003)


68-68: Avoid specifying long messages outside the exception class

(TRY003)


105-105: Avoid specifying long messages outside the exception class

(TRY003)


177-177: Avoid specifying long messages outside the exception class

(TRY003)


217-217: Avoid specifying long messages outside the exception class

(TRY003)


222-222: Consider moving this statement to an else block

(TRY300)


234-234: Consider moving this statement to an else block

(TRY300)

app/agents/voice/automatic/__init__.py

293-293: Consider moving this statement to an else block

(TRY300)


306-306: Create your own exception

(TRY002)


306-306: Avoid specifying long messages outside the exception class

(TRY003)


386-386: Consider moving this statement to an else block

(TRY300)


387-387: Do not catch blind exception: Exception

(BLE001)


424-424: Unused function argument: tts_provider

(ARG001)


515-515: Avoid specifying long messages outside the exception class

(TRY003)


534-534: Unused function argument: session_context

(ARG001)

Comment on lines +283 to +286
stt = get_stt_service(
    voice_name=decode_voice_name(args.voice_name).value,
    fallback_attempt=fallback_attempt,
)

⚠️ Potential issue | 🔴 Critical

Fallback doesn’t advance on provider init failures; continue to next provider.

If get_stt_service fails (missing key, bad config), we should try the next provider instead of aborting the whole flow.

-            # Get STT service for current fallback attempt
-            stt = get_stt_service(
-                voice_name=decode_voice_name(args.voice_name).value,
-                fallback_attempt=fallback_attempt,
-            )
+            # Get STT service for current fallback attempt
+            try:
+                stt = get_stt_service(
+                    voice_name=decode_voice_name(args.voice_name).value,
+                    fallback_attempt=fallback_attempt,
+                )
+            except Exception as e:
+                logger.error(
+                    f"STT fallback: Failed to initialize provider for attempt #{fallback_attempt}: {e}"
+                )
+                continue  # Try next provider
@@
-        except Exception as e:
-            logger.error(f"STT fallback: Non-STT error occurred: {e}")
-            raise
+        except Exception as e:
+            logger.error(f"STT fallback: Non-STT error occurred: {e}")
+            raise

Also applies to: 295-303

🤖 Prompt for AI Agents
In app/agents/voice/automatic/__init__.py around lines 283-286 (and similarly
295-303) the code calls get_stt_service directly and will abort the fallback
loop if provider initialization throws (missing key/bad config); wrap each
get_stt_service call in a try/except that catches initialization errors, logs
the provider name and error, and then continues the provider loop to try the
next provider (do not raise or exit); ensure any per-attempt counters
(fallback_attempt) are updated as before and that the exception path cleans
up/returns to the loop rather than terminating the whole flow.
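
The intended control flow can be sketched as a self-contained loop. Everything here is illustrative: `run_with_fallback`, `create_service`, and `run_pipeline` are hypothetical names, and the real entrypoint in the PR takes different arguments. Iterating with `for` advances to the next provider on `continue`; in a `while` loop driven by a manual `fallback_attempt` counter, the counter would need to be incremented before `continue` to avoid retrying the same provider forever.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, Sequence

logger = logging.getLogger(__name__)


class STTRestartException(Exception):
    """Raised by a running pipeline to request a switch to another STT provider."""


async def run_with_fallback(
    providers: Sequence[str],
    create_service: Callable[[str], object],
    run_pipeline: Callable[[object], Awaitable[None]],
) -> Optional[str]:
    """Try providers in order; return the one that completed, or None if all failed."""
    for attempt, provider in enumerate(providers):
        try:
            service = create_service(provider)
        except Exception as exc:
            # Initialization failure (missing key, bad config): try the next provider.
            logger.error("attempt #%d: init failed for %s: %s", attempt, provider, exc)
            continue
        try:
            await run_pipeline(service)
        except STTRestartException:
            # Runtime STT failure reported by the pipeline: try the next provider.
            logger.warning("attempt #%d: %s requested a restart", attempt, provider)
            continue
        return provider
    return None
```

Errors other than `STTRestartException` raised by `run_pipeline` still propagate, which matches the review's intent of treating non-STT failures as fatal.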

Comment on lines +500 to +503
# Create session context for tools and services
session_context = create_session_context(args)

# Decode TTS configuration

⚠️ Potential issue | 🟠 Major

Wrong argument passed to create_session_context.

It expects a session_id string, not the entire args object.

-    session_context = create_session_context(args)
+    session_context = create_session_context(args.session_id)
🤖 Prompt for AI Agents
In app/agents/voice/automatic/__init__.py around lines 500 to 503,
create_session_context is being called with the entire args object but it
expects a session_id string; change the call to pass the session identifier
(e.g. args.session_id or the appropriate session id variable) instead of args so
the function receives the expected string; ensure the variable you pass is a
string and exists on args (add a fallback or validation if needed).

Comment on lines +646 to +651
# Add STT Error Detector for runtime fallback support
if enable_stt_fallback and restart_callback:
    stt_error_detector = STTErrorDetector(restart_callback, "STTErrorDetector")
    pipeline_components.append(stt_error_detector)
    logger.info("STT Error Detector enabled for runtime fallback")

⚠️ Potential issue | 🔴 Critical

Detector placement bug: STTErrorDetector must be after STT to see STT-originated ErrorFrames.

Placed before STT, it will miss STT errors; fallback never triggers. Move it right after STT.

-    # Add STT Error Detector for runtime fallback support
-    if enable_stt_fallback and restart_callback:
-        stt_error_detector = STTErrorDetector(restart_callback, "STTErrorDetector")
-        pipeline_components.append(stt_error_detector)
-        logger.info("STT Error Detector enabled for runtime fallback")
-
-    # Add PTT VAD filter only if it's enabled
-    if config.DISABLE_VAD_FOR_PTT:
-        ptt_vad_filter = PTTVADFilter("PTTVADFilter")
-        pipeline_components.append(ptt_vad_filter)  # Filter VAD frames after STT
-
-    pipeline_components.append(stt)
+    # Add PTT VAD filter only if it's enabled
+    if config.DISABLE_VAD_FOR_PTT:
+        ptt_vad_filter = PTTVADFilter("PTTVADFilter")
+        pipeline_components.append(ptt_vad_filter)  # Filter VAD frames before STT
+
+    # STT service
+    pipeline_components.append(stt)
+
+    # Add STT Error Detector for runtime fallback support (must be after STT)
+    if enable_stt_fallback and restart_callback:
+        stt_error_detector = STTErrorDetector(restart_callback, "STTErrorDetector")
+        pipeline_components.append(stt_error_detector)
+        logger.info("STT Error Detector enabled for runtime fallback")

Also applies to: 657-657

🤖 Prompt for AI Agents
In app/agents/voice/automatic/__init__.py around lines 646-651 (and also apply
the same change at line 657), the STTErrorDetector is being appended before the
STT component so it cannot observe STT-originated ErrorFrames; move the
STTErrorDetector instantiation and pipeline_components.append call so they occur
immediately after the STT component is appended to pipeline_components (ensure
restart_callback check remains), and remove the earlier placement to guarantee
the detector sees STT output and triggers runtime fallback.

Comment on lines +147 to +159

# --- STT Fallback Configuration ---
ENABLE_STT_FALLBACK = (
    os.environ.get("ENABLE_STT_FALLBACK", "false").lower() == "true"
)  # Enable automatic fallback to alternative STT providers on failure
STT_FALLBACK_PROVIDERS = [
    provider.strip().lower()
    for provider in os.environ.get(
        "STT_FALLBACK_PROVIDERS", "deepgram,openai,assemblyai,google"
    ).split(",")
    if provider.strip()
]  # Comma-separated list of fallback providers in priority order

🛠️ Refactor suggestion | 🟠 Major

Harden fallback config: normalize, dedupe, validate providers (and include soniox by default).

Prevents duplicate/unknown providers and whitespace bugs; ensures primary isn’t retried in fallback. Also strips STT_PROVIDER.

Apply:

-# --- STT Fallback Configuration ---
-ENABLE_STT_FALLBACK = (
-    os.environ.get("ENABLE_STT_FALLBACK", "false").lower() == "true"
-)  # Enable automatic fallback to alternative STT providers on failure
-STT_FALLBACK_PROVIDERS = [
-    provider.strip().lower()
-    for provider in os.environ.get(
-        "STT_FALLBACK_PROVIDERS", "deepgram,openai,assemblyai,google"
-    ).split(",")
-    if provider.strip()
-]  # Comma-separated list of fallback providers in priority order
+# --- STT Fallback Configuration ---
+ALLOWED_STT_PROVIDERS = {"google", "assemblyai", "openai", "deepgram", "soniox"}
+ENABLE_STT_FALLBACK = os.environ.get("ENABLE_STT_FALLBACK", "false").lower() == "true"
+
+# Normalize primary
-STT_PROVIDER = os.environ.get(
-    "STT_PROVIDER", "google"
-).lower()  # "google", "assemblyai", "openai", "deepgram", or "soniox"
+STT_PROVIDER = os.environ.get("STT_PROVIDER", "google").strip().lower()
+
+_fallback_raw = os.environ.get(
+    "STT_FALLBACK_PROVIDERS", "deepgram,openai,assemblyai,google,soniox"
+)
+_fallback = [p.strip().lower() for p in _fallback_raw.split(",") if p.strip()]
+# keep order, drop dups/unknowns/primary
+seen = set()
+STT_FALLBACK_PROVIDERS = [
+    p for p in _fallback
+    if p in ALLOWED_STT_PROVIDERS and p != STT_PROVIDER and (p not in seen and not seen.add(p))
+]

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In app/core/config.py around lines 147 to 159, the STT fallback list needs to be
hardened: normalize (strip and lower), dedupe while preserving priority order,
validate entries against an allowed set (e.g.,
{"deepgram","openai","assemblyai","google","soniox"}) and ensure "soniox" is
included by default if not present; also remove the current primary STT_PROVIDER
(strip+lower) from the fallback list so we never retry the primary provider.
Implement this by reading STT_FALLBACK_PROVIDERS, splitting on commas, mapping
strip().lower(), filtering out empty strings and the normalized primary
provider, building a deduplicated ordered list (preserve first occurrence),
validating each against the allowed set and ignoring unknowns (optionally log or
warn), and finally ensuring "soniox" appears at the end or in default position
if absent.
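
The normalization steps described above can also be written as a small pure function, which is easier to unit-test than module-level config code. This is a sketch under assumptions: `parse_fallback_providers` is a hypothetical helper, and it does not force `soniox` into the list; it only normalizes, validates, deduplicates, and drops the primary provider.

```python
from typing import Iterable, List

ALLOWED_STT_PROVIDERS = frozenset(
    {"google", "assemblyai", "openai", "deepgram", "soniox"}
)


def parse_fallback_providers(
    raw: str,
    primary: str,
    allowed: Iterable[str] = ALLOWED_STT_PROVIDERS,
) -> List[str]:
    """Return fallback providers in priority order, normalized and deduplicated."""
    primary = primary.strip().lower()
    allowed_set = set(allowed)
    result: List[str] = []
    for item in raw.split(","):
        name = item.strip().lower()
        # Skip empty entries, the primary provider, unknown names, and repeats.
        if not name or name == primary or name not in allowed_set or name in result:
            continue
        result.append(name)
    return result
```

For example, with primary `"Google "` and raw value `" Deepgram, openai,deepgram,,whisper,google "`, the function returns `["deepgram", "openai"]`.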
