Support for template-specific ambience background with latency improvements #459
Walkthrough
The PR adds background audio support to the Breeze Buddy agent by introducing template configuration options, a background sound mixer utility, and integration of the mixer into the WebSocket transport pipeline. It also extends the Azure LLM service configuration, adds environment-aware observers for observability, and introduces three new dynamic configuration accessors for LLM parameters.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Template as Template Config
    participant Agent as Breeze Buddy Agent
    participant Mixer as Background Mixer
    participant Pipeline as Pipeline
    participant Transport as WebSocket Transport
    participant Observers as Observers
    Template->>Agent: Load template config
    activate Agent
    Agent->>Mixer: create_background_sound_mixer()
    activate Mixer
    Mixer->>Mixer: Validate enable_background_sound
    Mixer->>Mixer: Resolve sound file path
    Mixer->>Mixer: Create SoundfileMixer<br/>with volume
    Mixer->>Agent: Return mixer instance
    deactivate Mixer
    Agent->>Transport: Configure with audio_out_mixer
    activate Transport
    Transport->>Transport: Wire mixer to output
    deactivate Transport
    Agent->>Agent: Check ENVIRONMENT (DEV mode)
    rect rgb(200, 220, 255)
    Note over Agent,Observers: Environment-aware observer setup
    Agent->>Observers: Instantiate observers<br/>(Metrics, LLM, Transcription, Latency, etc.)
    Observers->>Agent: Ready
    end
    Agent->>Pipeline: Create PipelineParams<br/>with observers, audio_out_mixer
    activate Pipeline
    Pipeline->>Pipeline: Initialize with config
    deactivate Pipeline
    deactivate Agent
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
app/ai/voice/agents/breeze_buddy/template/types.py (1)
Lines 26-28: LGTM! Consider adding validation for the volume range. The new background sound configuration fields are properly typed. However, `background_sound_volume` could benefit from validation that keeps it within a reasonable range (e.g., 0.0 to 5.0) to prevent clipped or muted output.
🔎 Optional: Add a Pydantic validator for the volume range
```diff
+from pydantic import field_validator
+
 class ConfigurationModel(BaseModel):
     tts_voice_name: Optional[TTSVoiceName] = None
     stt_language: Optional[str] = None
     payload_based_language_selection: bool = False
     enable_background_sound: bool = False
     background_sound_file: Optional[str] = None
     background_sound_volume: float = 2.0
+
+    @field_validator("background_sound_volume")
+    @classmethod
+    def validate_volume(cls, v: float) -> float:
+        if v < 0.0 or v > 5.0:
+            raise ValueError("background_sound_volume must be between 0.0 and 5.0")
+        return v
```
requirements.txt (1)
Line 52: Consider consistent version pinning across requirements.
`soundfile` is unpinned on line 52, but this is part of a broader inconsistency in the file. Core dependencies such as `numpy`, `loguru`, `asyncpg`, `boto3`, and `pydub` are also unpinned, while others carry version specifiers (`fastapi==0.115.12` is pinned exactly, `google-cloud-storage>=2.10.0` has a lower bound). Applying version constraints systematically to critical dependencies would improve reproducibility.
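As a quick way to see which lines are affected, here is a hypothetical helper (not part of the PR) that lists requirement lines with no version specifier at all. It is a rough line-based sketch, not a full PEP 508 parser, so environment markers or URL requirements may need extra handling.

```python
import re

# Matches any PEP 440-style version operator (===, ==, >=, <=, ~=, !=, <, >).
_SPECIFIER = re.compile(r"(===|==|>=|<=|~=|!=|<|>)")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that carry no version specifier."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r
            continue
        if not _SPECIFIER.search(line):
            unpinned.append(line)
    return unpinned


sample = """
fastapi==0.115.12
google-cloud-storage>=2.10.0
numpy
soundfile  # background audio mixing
"""
print(unpinned_requirements(sample))  # ['numpy', 'soundfile']
```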
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
app/ai/voice/agents/breeze_buddy/static/audio/office-ambience.mp3 is excluded by !**/*.mp3
📒 Files selected for processing (6)
- app/ai/voice/agents/breeze_buddy/agent.py
- app/ai/voice/agents/breeze_buddy/examples/templates/order-confirmation.json
- app/ai/voice/agents/breeze_buddy/template/types.py
- app/core/config/dynamic.py
- app/core/config/static.py
- requirements.txt
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-15T04:10:50.071Z
Learnt from: badri-singhal
Repo: juspay/clairvoyance PR: 313
File: app/api/routers/breeze_buddy.py:487-489
Timestamp: 2025-10-15T04:10:50.071Z
Learning: In the Breeze Buddy dashboard (`app/agents/voice/breeze_buddy/workflows/order_confirmation/dashboard.html`), analytics metrics are dynamically rendered using a for loop in the HTML script. When adding new metrics to the JSON response in the `get_analytics()` endpoint, no manual HTML template updates are required as the loop handles all metrics automatically.
Applied to files:
app/ai/voice/agents/breeze_buddy/examples/templates/order-confirmation.json
🧬 Code graph analysis (2)
app/core/config/dynamic.py (1)
app/services/live_config/store.py (1)
get_config(211-235)
app/ai/voice/agents/breeze_buddy/agent.py (1)
app/ai/voice/agents/breeze_buddy/template/context.py (2)
vad_analyzer (42-44), transport (62-64)
🔇 Additional comments (7)
app/core/config/dynamic.py (1)
Lines 135-137: Verify the impact of disabling TTS preprocessing by default. The default value for `BB_SARVAM_TTS_ENABLE_PREPROCESSING` changed from `True` to `False`. This disables TTS preprocessing for every deployment that does not explicitly set this config in Redis or in environment variables. Ensure this is intended and that existing deployments relying on the previous default have been updated accordingly.
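To make the effect concrete, here is a minimal sketch of an environment-variable fallback using a hypothetical `env_flag` helper; the real accessor in `app/core/config/dynamic.py` also consults Redis, which is not modeled here. It shows that changing only the default silently changes behavior for every deployment that never set the variable.

```python
import os


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean flag from the environment, falling back to `default` when unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


os.environ.pop("BB_SARVAM_TTS_ENABLE_PREPROCESSING", None)  # deployment never set it
print(env_flag("BB_SARVAM_TTS_ENABLE_PREPROCESSING", default=True))   # True (old default)
print(env_flag("BB_SARVAM_TTS_ENABLE_PREPROCESSING", default=False))  # False (new default)

os.environ["BB_SARVAM_TTS_ENABLE_PREPROCESSING"] = "true"  # explicit opt-in
print(env_flag("BB_SARVAM_TTS_ENABLE_PREPROCESSING", default=False))  # True
```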
app/core/config/static.py (2)
Lines 299-304: LGTM! New Azure LLM configuration parameters added. The new `AZURE_BREEZE_BUDDY_MAX_COMPLETION_TOKENS` (default 50) and `AZURE_BREEZE_BUDDY_TEMPERATURE` (default 0.7) parameters are converted with `int` and `float` respectively and have sensible defaults for voice agent responses.
Verify that `max_completion_tokens=50` is sufficient for the expected response lengths in the Breeze Buddy voice agent; a short token limit can truncate responses in edge cases.
Lines 438-441: Verify the impact of enabling non-final tokens by default. The default for `BREEZE_BUDDY_SONIOX_ENABLE_NON_FINAL_TOKENS` changed from `"false"` to `"true"`. This enables interim (streaming) transcription tokens, which give real-time feedback but may also change the latency and behavior of downstream components that process transcription results. Ensure existing deployments are prepared for this change.
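The behavioral difference for downstream consumers can be sketched as follows, assuming a provider that (like Soniox) marks each token as final or non-final and may revise non-final tokens in later responses. The `Token` shape and the `TranscriptAssembler` class are hypothetical, not pipecat or Soniox APIs:

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical token shape; real STT payloads differ by provider and pipecat version.
    text: str
    is_final: bool


class TranscriptAssembler:
    """Keeps committed (final) text separate from the provisional (non-final) tail."""

    def __init__(self) -> None:
        self.final_text = ""

    def update(self, tokens: list[Token]) -> tuple[str, str]:
        # Final tokens are appended once and never revised.
        self.final_text += "".join(t.text for t in tokens if t.is_final)
        # Non-final tokens may be replaced by the next update, so they are
        # returned for display only and never stored.
        provisional = "".join(t.text for t in tokens if not t.is_final)
        return self.final_text, provisional


asm = TranscriptAssembler()
print(asm.update([Token("Yes", True), Token(" conf", False)]))      # ('Yes', ' conf')
print(asm.update([Token(" confirm", True), Token(" the", False)]))  # ('Yes confirm', ' the')
```

A component that treats every token as committed text would instead accumulate " conf" and " confirm" together, which is the kind of downstream breakage worth checking for.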
app/ai/voice/agents/breeze_buddy/agent.py (3)
Lines 369-371: Setting `aggregation_timeout=0.0` for immediate transcription delivery. This change sends transcription data to the LLM immediately, without buffering. It reduces latency but may send more fragmented transcription chunks to the LLM, which can hurt response quality for longer utterances.
Confirm that immediate delivery (no aggregation timeout) works well with the Soniox STT provider and that partial or incomplete transcriptions are not processed as complete turns.
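A toy model of the trade-off, assuming fragments are grouped whenever the gap to the previous fragment is within the timeout. This is a simplified batch illustration with a hypothetical `aggregate` function, not pipecat's aggregator:

```python
def aggregate(fragments: list[tuple[float, str]], timeout: float) -> list[str]:
    """Group (arrival_time_seconds, text) fragments into LLM turns.

    A fragment joins the current group if it arrives within `timeout`
    seconds of the previous fragment; otherwise it starts a new group.
    """
    groups: list[list[str]] = []
    last_time = None
    for arrival, text in fragments:
        if last_time is not None and arrival - last_time <= timeout:
            groups[-1].append(text)
        else:
            groups.append([text])
        last_time = arrival
    return [" ".join(group) for group in groups]


frags = [(0.00, "I want to"), (0.40, "change my"), (0.75, "delivery address")]
print(aggregate(frags, timeout=0.5))  # ['I want to change my delivery address']
print(aggregate(frags, timeout=0.0))  # ['I want to', 'change my', 'delivery address']
```

With `timeout=0.0`, each fragment reaches the LLM on its own, which is the fragmentation risk described above.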
Lines 399-405: Multiple observers added for comprehensive observability. Good addition of observability hooks for metrics, LLM activity, transcriptions, latency, and conversation-flow tracking; these will be valuable for debugging and monitoring.
Lines 353-354: Azure LLM configured with new token and temperature parameters. The LLM service now reads `max_completion_tokens` and `temperature` from environment variables, so response behavior can be tuned without code changes.
app/ai/voice/agents/breeze_buddy/examples/templates/order-confirmation.json (1)
Lines 40-43: No action needed. The background audio file `office-ambience.mp3` exists at the expected path (`app/ai/voice/agents/breeze_buddy/static/audio/office-ambience.mp3`), and the agent logs a warning when the file is missing instead of failing silently. The volume default of `2.0` is set intentionally in `ConfigurationModel`.
Likely an incorrect or invalid review comment.
```python
# Create audio mixer for background sound from template configuration
audio_out_mixer = None
if (
    template
    and template.configurations
    and template.configurations.enable_background_sound
):
    background_sound_file = template.configurations.background_sound_file
    background_sound_volume = template.configurations.background_sound_volume

    if background_sound_file:
        # Construct full path to audio file
        full_audio_path = os.path.join(
            os.path.dirname(__file__), "static", "audio", background_sound_file
        )

        if os.path.exists(full_audio_path):
            audio_out_mixer = SoundfileMixer(
                sound_files={"background": full_audio_path},
                default_sound="background",
                volume=background_sound_volume,
            )
            logger.info(
                f"Background sound mixer enabled: file={background_sound_file}, "
                f"volume={background_sound_volume}"
            )
        else:
            logger.warning(
                f"Background sound enabled but file not found: {full_audio_path}"
            )
    else:
        logger.warning(
            "Background sound enabled but no file specified in template"
        )
else:
    logger.info("Background sound mixer disabled (not configured in template)")
```
🧩 Analysis chain
🏁 Script executed:
```sh
# First, let's look at the full context of the agent.py file, especially around the audio_out_mixer usage
cd app/ai/voice/agents/breeze_buddy/ && wc -l agent.py
```
Repository: juspay/clairvoyance
Length of output: 75
🏁 Script executed:
```sh
# Read the relevant section and look at how audio_out_mixer is used after line 308
sed -n '273,330p' agent.py | cat -n
```
Repository: juspay/clairvoyance
Length of output: 114
🏁 Script executed:
```sh
# Find the import statement for SoundfileMixer
grep -n "SoundfileMixer" agent.py | head -5
```
Repository: juspay/clairvoyance
Length of output: 104
🏁 Script executed:
```sh
# Search for other usages of SoundfileMixer in the entire codebase
rg "SoundfileMixer" --type py -A 3 -B 1
```
Repository: juspay/clairvoyance
Length of output: 1042
🏁 Script executed:
```sh
# Read the section around where audio_out_mixer is created and used
sed -n '300,330p' app/ai/voice/agents/breeze_buddy/agent.py | cat -n
```
Repository: juspay/clairvoyance
Length of output: 1427
🏁 Script executed:
```sh
# Search for where audio_out_mixer is referenced/used in the file
rg "audio_out_mixer" app/ai/voice/agents/breeze_buddy/agent.py -n -B 2 -A 2
```
Repository: juspay/clairvoyance
Length of output: 727
🏁 Script executed:
```sh
# Search for other SoundfileMixer instantiation patterns in the codebase to see how it's typically used
rg "SoundfileMixer\(" --type py -A 5 -B 1
```
Repository: juspay/clairvoyance
Length of output: 694
🏁 Script executed:
```sh
# Check if there are any pipecat examples or documentation files about SoundfileMixer
find . -type f \( -name "*.md" -o -name "*.rst" -o -name "*.txt" \) -exec grep -l "SoundfileMixer\|audio_out_mixer" {} \;
```
Repository: juspay/clairvoyance
Length of output: 45
🌐 Web query:
pipecat SoundfileMixer audio_out_mixer automatic mixing
💡 Result:
Brief summary — pipecat SoundfileMixer & audio_out_mixer (automatic mixing):
- SoundfileMixer mixes transport output with audio files (loads via python-soundfile), auto-resamples to the transport sample rate, requires mono files, supports multiple named files, looping, and runtime volume/sound switching. [1][2]
- It implements the BaseAudioMixer interface (start/stop/process_frame/mix) and is passed to a transport via the transport param audio_out_mixer (can be a single BaseAudioMixer or a mapping per destination). The transport will call mix() for each outgoing audio chunk so mixing is automatic when configured. [3][2]
- Runtime control: use MixerUpdateSettingsFrame (sound, volume, loop) and MixerEnableFrame to change sound/volume or enable/disable mixing at runtime. [1][2]
- Notes & cautions: files must be mono; volume is realtime-adjustable; there have been reported issues (e.g. memory/blocking with websocket transport in some versions) — test on your target transport and pipecat version. [1][2][4]
If you want, I can show a minimal code example wiring SoundfileMixer into a transport and sending MixerUpdateSettingsFrame/MixerEnableFrame.
Background sound mixer integration is correct, but add audio file format validation.
The implementation properly integrates SoundfileMixer with the transport. When passed via the audio_out_mixer parameter to FastAPIWebsocketTransport, the mixer automatically mixes audio into the output stream—no explicit activation required. The configuration logic with file existence checks and fallback logging is well-structured.
However, SoundfileMixer requires mono audio files. The code should validate that background_sound_file is mono before passing it to the mixer, or document this requirement in the template configuration schema to prevent runtime failures with incompatible audio formats.
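A minimal sketch of such a check using only the standard library `wave` module. Note the assumption: `wave` reads WAV files only, whereas the shipped asset is `office-ambience.mp3`, so in the agent itself the equivalent check would go through python-soundfile (already a dependency), e.g. `soundfile.info(path).channels == 1`, provided the installed libsndfile can decode MP3. The helper names `is_mono_wav` and `write_silent_wav` are hypothetical:

```python
import os
import tempfile
import wave


def is_mono_wav(path: str) -> bool:
    """Return True only for a readable WAV file with exactly one channel."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getnchannels() == 1
    except (OSError, EOFError, wave.Error):
        # Missing, truncated, or non-WAV files are treated as unusable.
        return False


def write_silent_wav(path: str, channels: int) -> None:
    """Write 10 ms of 16-bit, 16 kHz silence with the given channel count."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit PCM
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * channels * 160)  # 160 frames = 10 ms


tmp_dir = tempfile.mkdtemp()
mono_path = os.path.join(tmp_dir, "mono.wav")
stereo_path = os.path.join(tmp_dir, "stereo.wav")
write_silent_wav(mono_path, channels=1)
write_silent_wav(stereo_path, channels=2)

print(is_mono_wav(mono_path))    # True
print(is_mono_wav(stereo_path))  # False: skip the mixer and log a warning
```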
🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/agent.py around lines 273 to 308, add a
validation step to ensure the background_sound_file is mono before instantiating
SoundfileMixer: after constructing full_audio_path and confirming it exists,
open the file (e.g., via soundfile.info or wave module) to read its channel
count, and if channels != 1 log a warning that the file is not mono and skip
creating the mixer (leave audio_out_mixer as None); only create SoundfileMixer
when the channel count equals 1 so incompatible stereo/multi-channel files do
not cause runtime failures.
@badri-singhal add this check
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
done
Branch updated from 29f73c7 to 865de09.
Branch updated from 865de09 to 74e4145.
```diff
-"stt_language": "en"
+"stt_language": "en",
+"enable_background_sound": false,
+"background_sound_file": "office-ambience.mp3",
```
.
```python
)

# Create audio mixer for background sound from template configuration
audio_out_mixer = None
```
.
Branch updated from 74e4145 to ea79db3.
app/core/config/static.py (outdated)
```python
AZURE_BREEZE_BUDDY_OPENAI_MODEL = os.environ.get(
    "AZURE_BREEZE_BUDDY_OPENAI_MODEL", "gpt-4o-automatic"
)
AZURE_BREEZE_BUDDY_MAX_COMPLETION_TOKENS = int(
```
.
```python
def create_background_sound_mixer(
    template, base_audio_path: str
) -> Optional[SoundfileMixer]:
```
.
app/core/config/static.py (outdated)
```diff
 )
 BREEZE_BUDDY_SONIOX_ENABLE_NON_FINAL_TOKENS = (
-    os.environ.get("BREEZE_BUDDY_SONIOX_ENABLE_NON_FINAL_TOKENS", "false").lower()
+    os.environ.get("BREEZE_BUDDY_SONIOX_ENABLE_NON_FINAL_TOKENS", "true").lower()
```
.
app/core/config/static.py (outdated)
```diff
 EXOTEL_SUBDOMAIN = os.getenv("EXOTEL_SUBDOMAIN", "api.exotel.com")
 EXOTEL_APPLET_APP_ID = os.getenv("EXOTEL_APPLET_APP_ID", "1044183")
-EXOTEL_TEMPLATE_APPLET_APP_ID = os.getenv("EXOTEL_TEMPLATE_APPLET_APP_ID", "")
+EXOTEL_TEMPLATE_APPLET_APP_ID = os.getenv("EXOTEL_TEMPLATE_APPLET_APP_ID", "1147907")
```
.
Branch updated from ea79db3 to 9a38523.
@coderabbitai please review
@badri-singhal: I'll review the changes in this PR for you.
✅ Actions performed: Review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (1)
app/ai/voice/agents/breeze_buddy/utils/common.py (1)
Line 362: Consider extracting the hard-coded path as a module-level constant. The constant `BACKGROUND_SOUND_AUDIO_PATH` is defined inside the function. For consistency and maintainability, consider moving it to module level alongside the other constants.
🔎 Proposed refactor
At the top of the file, after the imports:
```diff
+# Constants
+BACKGROUND_SOUND_AUDIO_PATH = "app/ai/voice/agents/breeze_buddy/static/audio"
+
 def indian_number_to_speech(number: int) -> str:
```
Then in the function:
```diff
 def create_background_sound_mixer(template) -> Optional[SoundfileMixer]:
     """
     Create a background sound mixer from template configuration.

     Args:
         template: Template object with configurations for background sound

     Returns:
         SoundfileMixer instance if successfully configured, None otherwise
     """
-    # Constant path for background sound audio files
-    BACKGROUND_SOUND_AUDIO_PATH = "app/ai/voice/agents/breeze_buddy/static/audio"
-
     # Check if background sound is enabled in template
```
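One detail worth checking with this refactor: the earlier version of the agent code built the path from `os.path.dirname(__file__)`, while the proposed constant is a string relative to the current working directory, so it only resolves when the process starts from the repository root. A hedged alternative is to derive the directory from the module's own location. The helper below is hypothetical and assumes `common.py` sits directly inside `breeze_buddy/utils/`:

```python
from pathlib import Path


def background_audio_dir(module_file: str) -> Path:
    """Return <breeze_buddy>/static/audio for a module located in breeze_buddy/utils/."""
    # parents[0] is utils/, parents[1] is the breeze_buddy package directory.
    return Path(module_file).resolve().parents[1] / "static" / "audio"


# In common.py this would be: BACKGROUND_SOUND_AUDIO_PATH = background_audio_dir(__file__)
example = background_audio_dir("/srv/app/ai/voice/agents/breeze_buddy/utils/common.py")
print(example)
```

Resolving from `__file__` keeps the lookup independent of where the service is launched, at the cost of tying the constant to the package layout.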
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
app/ai/voice/agents/breeze_buddy/static/audio/office-ambience.mp3 is excluded by !**/*.mp3
📒 Files selected for processing (6)
- app/ai/voice/agents/breeze_buddy/agent.py
- app/ai/voice/agents/breeze_buddy/examples/templates/order-confirmation.json
- app/ai/voice/agents/breeze_buddy/template/types.py
- app/ai/voice/agents/breeze_buddy/utils/common.py
- app/core/config/dynamic.py
- requirements.txt
🚧 Files skipped from review as they are similar to previous changes (3)
- app/core/config/dynamic.py
- app/ai/voice/agents/breeze_buddy/examples/templates/order-confirmation.json
- requirements.txt
🧰 Additional context used
🧬 Code graph analysis (1)
app/ai/voice/agents/breeze_buddy/agent.py (3)
app/ai/voice/agents/breeze_buddy/utils/common.py (1)
create_background_sound_mixer (351-432)
app/ai/voice/agents/breeze_buddy/utils/language_utils/prompt_injections.py (1)
inject_language_rules (8-61)
app/core/config/dynamic.py (3)
BREEZE_BUDDY_AZURE_MAX_COMPLETION_TOKENS (152-154), BREEZE_BUDDY_AZURE_TEMPERATURE (157-159), BREEZE_BUDDY_LLM_AGGREGATION_TIMEOUT (162-164)
🪛 Ruff (0.14.10)
app/ai/voice/agents/breeze_buddy/utils/common.py
426-426: Consider moving this statement to an else block
(TRY300)
428-428: Do not catch blind exception: Exception
(BLE001)
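Both Ruff findings can be addressed with the same pattern: guard only the risky call, catch the specific exceptions you expect, and return the success value from an `else` block. A sketch with a hypothetical `build_or_none` wrapper; which exceptions `SoundfileMixer` actually raises is not stated in the PR, so the exception tuple is an assumption to adjust:

```python
import logging
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def build_or_none(factory: Callable[[], T], description: str) -> Optional[T]:
    """Run `factory`; on an expected failure, log a warning and return None."""
    try:
        result = factory()
    except (OSError, ValueError, RuntimeError) as exc:  # specific types, not Exception (BLE001)
        logger.warning("Could not create %s: %s", description, exc)
        return None
    else:
        # Returning from `else` keeps the try body limited to the risky call (TRY300).
        return result


def missing_file() -> str:
    raise FileNotFoundError("office-ambience.mp3")


print(build_or_none(lambda: "mixer", "background mixer"))  # mixer
print(build_or_none(missing_file, "background mixer"))     # None
```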
🔇 Additional comments (5)
app/ai/voice/agents/breeze_buddy/utils/common.py (1)
Lines 351-432: LGTM! Mono channel validation properly implemented. The background sound mixer creation function is well structured, with comprehensive validation:
- Configuration checks before file operations
- File existence validation
- Mono channel requirement enforcement (lines 407-414) as requested in past review
- Graceful fallback with appropriate logging at each failure point
The mono channel validation successfully prevents runtime failures when incompatible audio files are configured.
app/ai/voice/agents/breeze_buddy/template/types.py (1)
Lines 22-25: LGTM! Clear enum definition. The `BackgroundSoundFile` enum is well defined with a descriptive docstring. Starting with a single value (`OFFICE_AMBIENCE`) is appropriate for the initial implementation.
app/ai/voice/agents/breeze_buddy/agent.py (3)
Lines 276-277: LGTM! Background mixer integration is correct. The background sound mixer is created from the template configuration, and the transport mixes it in automatically because it is passed via the `audio_out_mixer` parameter (line 286). As requested in the earlier review, mono channel validation is now implemented in the `create_background_sound_mixer` function.
Lines 322-323: LGTM! Dynamic configuration properly integrated. The Azure LLM service and aggregation parameters now use dynamic configuration accessors, allowing runtime adjustments without code changes. All async calls are properly awaited.
Also applies to: line 339
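The practical difference from a static constant is that the value is looked up on each call. A minimal sketch, assuming an async getter that returns the raw string or `None`; `config_float`, `store`, and `fake_get` are hypothetical stand-ins for the accessors in `dynamic.py` and the `get_config` lookup, whose real signatures are not shown here:

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# Hypothetical getter type: an async lookup returning the raw string value or None.
ConfigGetter = Callable[[str], Awaitable[Optional[str]]]


async def config_float(get: ConfigGetter, key: str, default: float) -> float:
    """Read `key` at call time and parse it as a float, falling back to `default`."""
    raw = await get(key)
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        return default


# In-memory stand-in for the live config store (e.g. Redis), for illustration only.
store: Dict[str, str] = {"BREEZE_BUDDY_AZURE_TEMPERATURE": "0.3"}


async def fake_get(key: str) -> Optional[str]:
    return store.get(key)


async def main() -> None:
    print(await config_float(fake_get, "BREEZE_BUDDY_AZURE_TEMPERATURE", 0.7))  # 0.3
    store["BREEZE_BUDDY_AZURE_TEMPERATURE"] = "0.9"  # updated while the service runs
    print(await config_float(fake_get, "BREEZE_BUDDY_AZURE_TEMPERATURE", 0.7))  # 0.9
    print(await config_float(fake_get, "BREEZE_BUDDY_LLM_AGGREGATION_TIMEOUT", 0.0))  # 0.0


asyncio.run(main())
```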
Lines 362-376: LGTM! Environment-aware observer setup is appropriate. The conditional observer setup ensures that detailed logging and metrics are collected only in development environments, which is the correct approach to:
- Reduce overhead in production
- Avoid excessive logging in production environments
- Maintain observability during development and testing
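The environment gate can be sketched as below, assuming observers are created from zero-argument factories; `is_dev_environment` and `select_observers` are hypothetical names, and `object` stands in for the real observer classes. An unset `ENVIRONMENT` is treated as production, so the default fails closed:

```python
import logging
from typing import Callable, Dict, List, Optional

logger = logging.getLogger(__name__)

DEV_ENVIRONMENTS = {"dev", "development"}


def is_dev_environment(environment: Optional[str]) -> bool:
    """Return True only for explicit development values (case-insensitive)."""
    return (environment or "").strip().lower() in DEV_ENVIRONMENTS


def select_observers(
    environment: Optional[str], factories: Dict[str, Callable[[], object]]
) -> List[object]:
    """Instantiate debug observers in development; return none elsewhere."""
    if not is_dev_environment(environment):
        logger.info("Observers disabled for environment=%r", environment)
        return []
    logger.info("Development environment: enabling %d observers", len(factories))
    return [factory() for factory in factories.values()]


factories = {"metrics": object, "latency": object}  # stand-ins for real observer classes
print(len(select_observers("DEV", factories)))    # 2
print(select_observers("production", factories))  # []
print(select_observers(None, factories))          # []
```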
The environment check for `"dev"` or `"development"`, with clear logging, is well implemented.
```python
enable_background_sound: bool = False
background_sound_file: Optional[BackgroundSoundFile] = None
background_sound_volume: float = 2.0
```
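Typing `background_sound_file` with the `BackgroundSoundFile` enum restricts templates to a closed set of known assets. Because the value is later joined into a filesystem path, this also keeps arbitrary paths out. A standard-library sketch, assuming the enum is a `str` enum whose value is the file name (the value `office-ambience.mp3` is taken from the example template; the rest is an assumption):

```python
from enum import Enum


class BackgroundSoundFile(str, Enum):
    """Allowed background ambience assets (values assumed to be file names)."""

    OFFICE_AMBIENCE = "office-ambience.mp3"


# A known template value maps onto the enum member.
print(BackgroundSoundFile("office-ambience.mp3") is BackgroundSoundFile.OFFICE_AMBIENCE)  # True

# Anything outside the allowed set is rejected before it reaches os.path.join.
try:
    BackgroundSoundFile("../../etc/passwd")
except ValueError as exc:
    print("rejected:", exc)
```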
🧩 Analysis chain
🏁 Script executed:
```sh
#!/bin/bash
# Search for SoundfileMixer documentation and usage in the codebase
rg -i "soundfile.*mixer|mixer.*volume" -A 5 -B 2 --type=py | head -100
```
Repository: juspay/clairvoyance
Length of output: 2687
🏁 Script executed:
```sh
#!/bin/bash
# Look for volume configuration or documentation files
fd -type f -name "*.md" -o -name "*.py" | xargs rg -l "soundfile\|mixer" | head -20
```
Repository: juspay/clairvoyance
Length of output: 233
🌐 Web query:
pipecat SoundfileMixer volume parameter recommended values best practices
💡 Result:
Summary — recommended values and best practices for SoundfileMixer volume
- Recommended default: 0.4 (library default). [1][2]
- Allowed range: typically 0.0–1.0 (normalized scale), but the implementation accepts values >1.0 (example uses 2.0). Use >1.0 only if you intentionally need gain and you handle clipping. [1][2]
- Loudness / measurement: use pipecat.audio.utils.calculate_audio_volume (EBU‑R128 based, returns 0..1) to measure and normalize file/stream loudness before mixing. [3]
- Practical best practices:
- Normalize sound files to a target loudness (use EBU R128 or calculate_audio_volume) so same perceived level across files. [3]
- Keep music/ambience volume well below 1.0 when mixing with voice (start ~0.2–0.6) and adjust by ear or LUFS measurements to avoid masking speech. [1][3]
- Avoid abrupt jumps: apply smoothing when changing volume (use exponential smoothing or gradually update via MixerUpdateSettingsFrame). [1][3]
- Ensure sound files are mono and sample-rate matched; looping and runtime control available via MixerUpdateSettingsFrame / MixerEnableFrame. [1][2]
- Monitor for clipping when using volume >1.0 and prefer pre‑processing gain rather than extreme runtime gain. [1][2][3]
Sources
[1] Pipecat SoundfileMixer docs (constructor, default volume, runtime frames, example using 2.0).
[2] Pipecat API reference for soundfile_mixer (volume documented, default 0.4, recommended range noted).
[3] pipecat.audio.utils.calculate_audio_volume (EBU R128 loudness normalization and 0..1 normalized output).
Reduce default volume of 2.0 to align with audio mixing best practices.
The volume default of 2.0 is too high for background sound mixing with voice. Pipecat's SoundfileMixer library default is 0.4, and recommended practice for background sound/music mixed with voice speech is 0.2–0.6 to avoid masking the primary audio. The current value would significantly overpower voice output. Change to a value in the 0.2–0.6 range (recommend 0.4 to match library default).
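To see why a gain above 1.0 is risky, here is a toy additive mixer for 16-bit mono PCM. It is not pipecat's `SoundfileMixer` implementation, only an illustration of looping a background track, scaling it by `volume`, and hard-clipping to the int16 range; `mix_chunk` is a hypothetical name:

```python
from array import array
from typing import Tuple

INT16_MIN, INT16_MAX = -32768, 32767


def mix_chunk(voice: array, ambience: array, volume: float, offset: int) -> Tuple[array, int]:
    """Mix one chunk of int16 voice samples with looping ambience scaled by `volume`.

    Returns the mixed chunk and the ambience read offset for the next chunk.
    Sums outside the int16 range are hard-clipped.
    """
    out = array("h")
    for i, v in enumerate(voice):
        bg = ambience[(offset + i) % len(ambience)]  # loop the ambience track
        mixed = round(v + bg * volume)
        out.append(max(INT16_MIN, min(INT16_MAX, mixed)))
    return out, (offset + len(voice)) % len(ambience)


voice = array("h", [20000, -20000, 0, 10000])
ambience = array("h", [8000, -8000])
loud, _ = mix_chunk(voice, ambience, volume=2.0, offset=0)
soft, _ = mix_chunk(voice, ambience, volume=0.4, offset=0)
print(list(loud))  # [32767, -32768, 16000, -6000]
print(list(soft))  # [23200, -23200, 3200, 6800]
```

With `volume=2.0` the loud samples saturate and the ambience swing is as large as quieter speech, while `0.4` stays in range. How loud 2.0 actually sounds also depends on how quietly the ambience file was mastered, so a listening check remains worthwhile.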
🤖 Prompt for AI Agents
In app/ai/voice/agents/breeze_buddy/template/types.py around lines 32 to 34, the
default background_sound_volume is set to 2.0 which is far above recommended
mixing levels; change the default to a value between 0.2 and 0.6 (recommended
0.4 to match Pipecat SoundfileMixer) so background audio doesn't overpower
voice—update the field default accordingly and run tests/quick manual audio
check to confirm acceptable mixing.
Summary by CodeRabbit
Release Notes
New Features
Configuration