Integration of Fish AI TTS #424
base: release
Walkthrough
This change introduces support for a new Fish AI text-to-speech provider. It adds the FISH provider enum member, creates dynamic configuration accessors for Fish-specific parameters, introduces the FISH_API_KEY static configuration, and implements Fish TTS service instantiation within the existing provider selection logic.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
app/ai/voice/agents/automatic/tts/__init__.py (4 hunks)
app/ai/voice/agents/automatic/types/models.py (1 hunk)
app/core/config/dynamic.py (1 hunk)
app/core/config/static.py (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (2)
app/core/config/dynamic.py (1)
app/services/live_config/store.py (1)
get_config(211-235)
app/ai/voice/agents/automatic/tts/__init__.py (2)
app/core/config/dynamic.py (8)
FISH_TTS_LANGUAGE_CODE(95-97)
FISH_TTS_LATENCY(100-102)
FISH_TTS_MODEL_ID(90-92)
FISH_TTS_NORMALIZE(115-117)
FISH_TTS_OUTPUT_FORMAT(120-122)
FISH_TTS_PROSODY_SPEED(105-107)
FISH_TTS_PROSODY_VOLUME(110-112)
FISH_TTS_REFERENCE_ID(85-87)
app/ai/voice/agents/automatic/types/models.py (1)
TTSProvider(8-12)
🪛 Ruff (0.14.8)
app/ai/voice/agents/automatic/tts/__init__.py
107-107: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (8)
app/ai/voice/agents/automatic/types/models.py (1)
12-12: LGTM! The new FISH enum member follows the established pattern of the existing TTS providers.
app/core/config/static.py (1)
84-86: LGTM! The FISH_API_KEY configuration follows the same optional pattern as ELEVENLABS_API_KEY, with runtime validation in get_tts_service when the Fish provider is selected.
app/core/config/dynamic.py (2)
90-122: LGTM! The remaining Fish TTS config accessors follow the established pattern from the Sarvam TTS configuration, with sensible defaults.
84-88: No action needed. The empty FISH_TTS_REFERENCE_ID default is safe and intentional: the Fish Audio TTS API treats reference_id as optional, defaulting to a built-in voice when it is not provided or is left empty.
Likely an incorrect or invalid review comment.
app/ai/voice/agents/automatic/tts/__init__.py (4)
4-4: LGTM! The import of FishAudioTTSService from pipecat follows the established pattern for other TTS services.
104-108: API key validation looks good. The validation pattern matches the Sarvam implementation. The static analysis hint about the long exception message (TRY003) is acceptable here for consistency with the existing code style at line 72.
119-130: LGTM! The language enum conversion with proper error handling and fallback to Language.EN_IN is well implemented.
146-151: Good observability with comprehensive logging. The log statement captures all relevant Fish TTS parameters for debugging and monitoring.
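The API key check (lines 104-108) and the language conversion (lines 119-130) are described above but not shown. The sketch below illustrates both under stated assumptions: Language is a two-member stand-in for pipecat's enum, and require_api_key and resolve_language are hypothetical helper names rather than functions from the PR.

```python
import logging
from enum import Enum
from typing import Optional

logger = logging.getLogger(__name__)


class Language(str, Enum):
    """Stand-in for pipecat's Language enum; only two members are modelled."""

    EN = "en"
    EN_IN = "en-IN"


def require_api_key(api_key: Optional[str]) -> str:
    """Fail fast when the Fish provider is selected without credentials."""
    if not api_key:
        raise ValueError("FISH_API_KEY is required when the Fish TTS provider is selected")
    return api_key


def resolve_language(code: str, fallback: Language = Language.EN_IN) -> Language:
    """Convert a configured language code to the enum, falling back on bad input."""
    try:
        return Language(code)
    except ValueError:
        logger.warning("Unknown Fish TTS language code %r; using %s", code, fallback.value)
        return fallback
```

Raising on a missing key while only logging on an unknown language matches the distinction the review draws: the first is a hard misconfiguration, the second has a safe default.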
    service = FishAudioTTSService(
        api_key=FISH_API_KEY,
        reference_id=fish_tts_reference_id,
        model_id=fish_tts_model_id,
        output_format=fish_tts_output_format,
        sample_rate=SAMPLE_RATE,
        params=FishAudioTTSService.InputParams(
            language=fish_language,
            latency=fish_tts_latency,
            normalize=fish_tts_normalize,
            prosody_speed=fish_tts_prosody_speed,
            prosody_volume=fish_tts_prosody_volume,
        ),
    )
    logger.info(
        f"Using Fish AI TTS service with reference_id={fish_tts_reference_id}, "
        f"model_id={fish_tts_model_id}, output_format={fish_tts_output_format}, "
        f"sample_rate={SAMPLE_RATE}, language={fish_language}, "
        f"speed={fish_tts_prosody_speed}, volume={fish_tts_prosody_volume}"
    )
    return service
🧩 Analysis chain
🏁 Script executed:
# First, locate the FishAudioTTSService class definition
fd -e py -type f | xargs rg -l "class FishAudioTTSService"
Repository: juspay/clairvoyance
Length of output: 235

🏁 Script executed:
# Search for text_filters usage across TTS services
rg -n "text_filters" --type py
Repository: juspay/clairvoyance
Length of output: 402

🏁 Script executed:
# Check the actual instantiation of ElevenLabsTTSService and GoogleTTSService in the file
cat -n app/ai/voice/agents/automatic/tts/__init__.py | sed -n '150,220p'
Repository: juspay/clairvoyance
Length of output: 2537

🏁 Script executed:
# Search for FishAudioTTSService class definition
rg -n "class FishAudioTTSService" -A 20
Repository: juspay/clairvoyance
Length of output: 45

🏁 Script executed:
# Check where FishAudioTTSService is imported from
rg -n "from.*FishAudioTTSService|import.*FishAudioTTSService" --type py
Repository: juspay/clairvoyance
Length of output: 168

🏁 Script executed:
# Look at the beginning of the __init__.py file to see imports
head -30 app/ai/voice/agents/automatic/tts/__init__.py
Repository: juspay/clairvoyance
Length of output: 1038

🏁 Script executed:
# Check project dependencies to understand pipecat version
find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "Pipfile" | head -5
Repository: juspay/clairvoyance
Length of output: 104

🏁 Script executed:
# Look for requirements or setup files
rg "pipecat" --type py -g '*requirements*|setup*|pyproject*|Pipfile*' -B 2 -A 2
Repository: juspay/clairvoyance
Length of output: 211

🏁 Script executed:
# Search for any pipecat version specifications
rg "pipecat.*==" -m 5
Repository: juspay/clairvoyance
Length of output: 45

🏁 Script executed:
# Check requirements.txt for pipecat version
cat requirements.txt | grep -i pipecat
Repository: juspay/clairvoyance
Length of output: 224

🏁 Script executed:
# Also check requirements-dev.txt
cat requirements-dev.txt | grep -i pipecat
Repository: juspay/clairvoyance
Length of output: 45

🌐 Web query:
pipecat FishAudioTTSService text_filters parameter documentation
💡 Result:
The FishAudioTTSService accepts a text_filters parameter (type: Sequence[BaseTextFilter]) which is a list of text-filter objects applied after text aggregation; it transforms/cleans streaming text before TTS. A single text_filter argument also exists but is deprecated in favor of text_filters. See docs for the TTSService constructor and the FishAudio TTS implementation for details and examples of filters/aggregators. [1][2][3]
Sources:
- TTSService API (text_filters / text_filter description). [1]
- FishAudioTTSService module (Fish Audio TTS implementation). [2]
- Pipecat text aggregators & filters overview (MarkdownTextFilter, BaseTextFilter usage). [3]
🌐 Web query:
pipecat fish tts service constructor parameters
💡 Result:
Constructor signature (keyword-only):
FishAudioTTSService(*, api_key: str, reference_id: str | None = None, model: str | None = None, model_id: str = 'speech-1.5', output_format: Literal['opus','mp3','pcm','wav'] = 'pcm', sample_rate: int | None = None, params: InputParams | None = None, **kwargs).
InputParams fields (keyword-only Pydantic model):
- language: Language | None (default Language.EN).
- latency: str | None ('normal' or 'balanced', default 'normal').
- normalize: bool | None (default True).
- prosody_speed: float | None (speech speed multiplier, default 1.0).
- prosody_volume: int | None (dB adjustment, default 0).
Notes: "model" is deprecated in favor of reference_id; model_id selects Fish model (e.g., "speech-1.5"); output_format and sample_rate control audio encoding. [1][2]
Sources:
[1] Pipecat Fish TTS API docs (FishAudioTTSService & InputParams).
[2] Pipecat reference/source implementation for pipecat.services.fish.tts.
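To make the quoted InputParams fields easier to scan, here is a plain-dataclass stand-in that mirrors the names and defaults listed in the web result above. It is not pipecat's class (which the result describes as a Pydantic model), the language field is simplified to a string, and the latency check is an added assumption for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Latency values named in the quoted web result.
ALLOWED_LATENCY = {"normal", "balanced"}


@dataclass(frozen=True)
class FishInputParams:
    """Hypothetical stand-in mirroring the InputParams fields quoted above."""

    language: Optional[str] = "en"          # pipecat uses a Language enum here
    latency: Optional[str] = "normal"
    normalize: Optional[bool] = True
    prosody_speed: Optional[float] = 1.0    # speech speed multiplier
    prosody_volume: Optional[int] = 0       # dB adjustment

    def __post_init__(self) -> None:
        # Illustrative validation only; not taken from pipecat.
        if self.latency is not None and self.latency not in ALLOWED_LATENCY:
            raise ValueError(f"latency must be one of {sorted(ALLOWED_LATENCY)}")
```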
Add text_filters parameter to FishAudioTTSService instantiation for consistency.
FishAudioTTSService supports the text_filters parameter (via the base TTSService class), but it is not currently passed at lines 132-152. Both ElevenLabsTTSService (line 166) and GoogleTTSService (line 204) receive text_filters for text filtering; FishAudioTTSService should also receive it for feature parity.
🤖 Prompt for AI Agents
In app/ai/voice/agents/automatic/tts/__init__.py around lines 132 to 152, the
FishAudioTTSService instantiation omits the text_filters parameter causing
inconsistent behavior with other TTS services; update the constructor call to
pass text_filters (the same variable used for ElevenLabs and Google TTS) into
FishAudioTTSService so the service receives text filtering settings, and adjust
the InputParams or surrounding args if needed to match the parameter name
expected by the base TTSService.
It doesn't support the text filters
@Swetha-160303, thank you for clarifying! I understand that FishAudioTTSService doesn't support the text_filters parameter in practice. I've noted this for future reference.
✏️ Learnings added
Learnt from: Swetha-160303
Repo: juspay/clairvoyance PR: 424
File: app/ai/voice/agents/automatic/tts/__init__.py:132-152
Timestamp: 2025-12-12T10:56:50.984Z
Learning: In app/ai/voice/agents/automatic/tts/__init__.py, FishAudioTTSService does not support the text_filters parameter, unlike ElevenLabsTTSService and GoogleTTSService which do support it.
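One way to encode this learning in code is to pass text_filters only to the providers recorded as supporting it. The sketch below is an illustration under assumptions: build_tts_kwargs and the provider strings are hypothetical, and the PR itself simply omits the argument for Fish.

```python
from typing import Any, Optional, Sequence

# Providers recorded in the thread as receiving text_filters.
# The strings are illustrative, not the real TTSProvider enum values.
PROVIDERS_WITH_TEXT_FILTERS = frozenset({"elevenlabs", "google"})


def build_tts_kwargs(
    provider: str,
    base_kwargs: dict[str, Any],
    text_filters: Optional[Sequence[Any]] = None,
) -> dict[str, Any]:
    """Copy base_kwargs and add text_filters only where it is supported."""
    kwargs = dict(base_kwargs)
    if text_filters and provider in PROVIDERS_WITH_TEXT_FILTERS:
        kwargs["text_filters"] = list(text_filters)
    return kwargs
```

Keeping the supported set in one place means a later provider can opt in without touching each constructor call.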