Enhance ASR services configuration and provider management#289
AnkushMalaker wants to merge 1 commit into `dev`.
Conversation
- Updated `config.yml.template` to include capabilities for ASR providers, detailing features like word timestamps and speaker segments.
- Added a new `vibevoice` provider configuration for Microsoft VibeVoice ASR, supporting speaker diarization.
- Enhanced `.env.template` with clearer provider selection and model configuration options, including CUDA settings and voice activity detection.
- Improved `docker-compose.yml` to support multiple ASR providers with detailed service configurations.
- Introduced common utilities for audio processing and ASR service management in the `common` module, enhancing code reusability and maintainability.
- Updated `README.md` to reflect the new provider-based architecture and usage instructions for starting different ASR services.
📝 Walkthrough

This pull request introduces a provider-based ASR architecture replacing the monolithic offline setup. It adds support for multiple speech-to-text providers (faster-whisper, NeMo, Transformers, VibeVoice), shared infrastructure components, configuration-driven provider selection, Docker services for each provider, and comprehensive test coverage with protocol validation and GPU-based integration tests.
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 3 passed
| Metric | Count |
|---|---|
| ✅ Passed | 87 |
| ❌ Failed | 33 |
| 📊 Total | 120 |
📊 View Reports
GitHub Pages (Live Reports):
Download Artifacts:
- robot-test-reports-html-no-api - HTML reports
- robot-test-results-xml-no-api - XML output
@coderabbitai review

✅ Actions performed: review triggered.
Actionable comments posted: 13
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/tags.md (1)
**200-200**: ⚠️ Potential issue | 🟡 Minor

**Stale tag count: "14" should be "15".**

Line 200 still says "any tags other than the 14 approved tags above" but the document now defines 15 approved tags (updated on Line 272). Similarly, Line 236 references "existing 11 tags" which is also outdated.

📝 Proposed fix

At line 200:

```diff
-**DO NOT create or use any tags other than the 14 approved tags above.**
+**DO NOT create or use any tags other than the 15 approved tags above.**
```

At line 236:

```diff
-1. Can I use one of the existing 11 tags?
+1. Can I use one of the existing 15 tags?
```
🤖 Fix all issues with AI agents
In `@extras/asr-services/init.py`:
- Around line 155-167: The current comparison uses string `cuda_ver` (e.g.,
"12.10") and lexicographic comparisons which are incorrect for multi-digit
minors; instead convert `major` and `minor` to integers (e.g., major_i, minor_i)
and compare numeric tuples like (major_i, minor_i) against (12,8), (12,6),
(12,1) in the same branching where `match = re.search(...)` and `cuda_ver` is
built; replace the string comparisons (`if cuda_ver >= "12.8": ...`) with tuple
comparisons (e.g., `if (major_i, minor_i) >= (12, 8): return "cu128"`, etc.) and
keep the same fallback behavior and exception handling.
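The numeric-tuple comparison the comment asks for can be sketched as follows. This is a hypothetical stand-in for the real `init.py` logic: the regex, wheel-tag thresholds, and fallback tag mirror the review comment, not the actual file.

```python
import re

def cuda_wheel_tag(smi_output: str, fallback: str = "cu121") -> str:
    """Map an `nvidia-smi` CUDA version to a PyTorch wheel tag.

    Comparing (major, minor) as an integer tuple avoids the string-comparison
    bug where "12.10" < "12.8" lexicographically.
    """
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if not match:
        return fallback
    major_i, minor_i = int(match.group(1)), int(match.group(2))
    if (major_i, minor_i) >= (12, 8):
        return "cu128"
    if (major_i, minor_i) >= (12, 6):
        return "cu126"
    if (major_i, minor_i) >= (12, 1):
        return "cu121"
    return fallback
```

With strings, `"12.10" >= "12.8"` is False; with tuples, `(12, 10) >= (12, 8)` is True, which is the behavior the fix wants.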
- Around line 290-329: The provider-specific configs in setup_provider_config
hardcode DEVICE="cuda" for "faster-whisper", "vibevoice", and "transformers"
which ignores the detected device (macOS/CPU) set by setup_cuda_version; change
these branches to use the detected device instead of "cuda" (e.g., set
self.config["DEVICE"] = self.config.get("DEVICE") or call/ensure
setup_cuda_version() was run and then assign self.config["DEVICE"] accordingly)
and remove the unconditional "cuda" literal so providers fall back to CPU/macOS
when no GPU is present.
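A minimal sketch of provider-agnostic device detection. Checking for `nvidia-smi` on `PATH` is a stand-in for whatever GPU probe `setup_cuda_version` actually performs, and `"mps"` support on Apple Silicon is an assumption about the providers, not something the review confirms.

```python
import platform
import shutil

def detect_device() -> str:
    """Pick an inference device instead of hardcoding "cuda"."""
    if shutil.which("nvidia-smi"):  # stand-in for the script's GPU probe
        return "cuda"
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"  # assumption: providers can run on Apple Silicon
    return "cpu"

# Provider branches would then respect an existing override, e.g.:
# self.config["DEVICE"] = self.config.get("DEVICE") or detect_device()
```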
In `@extras/asr-services/providers/faster_whisper/service.py`:
- Around line 48-55: Replace deprecated asyncio.get_event_loop() calls with
asyncio.get_running_loop() in async contexts: update the warmup() method in
faster_whisper.service (where loop = asyncio.get_event_loop() is used before
await loop.run_in_executor(...) and FasterWhisperTranscriber.load_model is
invoked), any transcribe() async methods in service classes, and async timing
functions like run_mic_transcription() in client.py; ensure you call
asyncio.get_running_loop() and keep using loop.run_in_executor(...) for
offloading blocking work so the code runs without DeprecationWarnings or future
errors.
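The pattern being requested looks like this; `blocking_load` is an illustrative stand-in for `FasterWhisperTranscriber.load_model`, not the real call.

```python
import asyncio
import time

def blocking_load():
    """Stand-in for a blocking model-load call."""
    time.sleep(0.01)
    return "model"

async def warmup():
    # get_running_loop() must be called from inside a coroutine; unlike the
    # deprecated get_event_loop(), it never implicitly creates a new loop.
    loop = asyncio.get_running_loop()
    # Blocking work still goes through run_in_executor, off the event loop.
    return await loop.run_in_executor(None, blocking_load)

result = asyncio.run(warmup())
```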
In `@extras/asr-services/providers/nemo/transcriber.py`:
- Around line 92-96: The current async transcribe method in transcriber.py is
calling the blocking NeMo call self.model.transcribe(...) inside async with
self._lock, which blocks the event loop; change it to offload the blocking call
to a thread using asyncio.get_event_loop().run_in_executor (or
loop.run_in_executor) so the event loop stays responsive. Keep the async lock
for concurrency control but use await loop.run_in_executor(None, lambda:
self.model.transcribe([audio_file_path], batch_size=1, timestamps=True)) (or
make transcribe synchronous and let NemoService call run_in_executor); ensure
the torch.no_grad() context and any GPU device setup are executed inside the
executor task so GPU/CPU work does not run on the event loop.
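A sketch of the lock-plus-executor shape described above. `fake_transcribe` stands in for the blocking `self.model.transcribe(...)` call; the real version would also enter `torch.no_grad()` and do device setup inside the executor callable.

```python
import asyncio
import time

def fake_transcribe(path):
    """Stand-in for the blocking NeMo inference call."""
    time.sleep(0.01)  # simulates GPU/CPU-bound work
    return [{"file": path, "text": "hello"}]

async def main():
    lock = asyncio.Lock()  # concurrency control stays at the async layer

    async def transcribe(audio_file_path):
        async with lock:
            loop = asyncio.get_running_loop()
            # The blocking call runs in a worker thread, so the event loop
            # stays responsive while inference is in flight.
            return await loop.run_in_executor(
                None, lambda: fake_transcribe(audio_file_path)
            )

    return await transcribe("sample.wav")

out = asyncio.run(main())
```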
In `@extras/asr-services/providers/transformers/Dockerfile`:
- Around line 19-43: The runtime Dockerfile is running as root and installs an
unnecessary git package; create and switch to a non-root user in the runtime
stage (e.g., add a user/group like "app" and use USER app) and remove "git" from
the apt-get install list in the runtime stage where curl and ffmpeg are
installed; ensure the created user's home and /app ownership match ENV PATH
usage and any copied files so the app can run without root, and apply the same
USER + removal of git change consistently in the other provider Dockerfiles
(transformers, nemo, faster_whisper, vibevoice).
In `@extras/asr-services/providers/vibevoice/Dockerfile`:
- Around line 41-44: The Dockerfile currently runs RUN git clone
https://github.com/microsoft/VibeVoice.git /models/vibevoice in the final image
(ENV HF_HOME="/models") which makes builds non-reproducible and forces git into
the runtime image; modify the Dockerfile to clone a specific commit or tag in
the builder stage (after the uv sync step) and replace the runtime-stage git
clone with a COPY --from=builder /models/vibevoice /models/vibevoice, remove git
from the runtime apt-get install, and ensure the clone uses a pinned commit SHA
or tag instead of the default branch to make builds reproducible and reduce
attack surface.
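The restructuring described above could look roughly like this. The stage layout, paths, and the `VIBEVOICE_REF` placeholder are illustrative sketches of the suggestion, not the repository's actual Dockerfile, and the pinned ref is deliberately left as a placeholder rather than an invented SHA.

```dockerfile
FROM python:3.12-slim AS builder
# ... uv sync step ...
ARG VIBEVOICE_REF=<pinned-commit-or-tag>   # pin for reproducible builds
RUN git clone https://github.com/microsoft/VibeVoice.git /models/vibevoice \
    && git -C /models/vibevoice checkout "${VIBEVOICE_REF}"

FROM python:3.12-slim
ENV HF_HOME="/models"
# git is no longer needed at runtime; copy the pinned checkout instead
COPY --from=builder /models/vibevoice /models/vibevoice
```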
In `@extras/asr-services/providers/vibevoice/transcriber.py`:
- Around line 260-274: Change the two logger.info calls that print raw_output
previews to logger.debug and limit the preview size to avoid excessive logging;
then replace the greedy regexes used to locate the JSON array (the patterns
around raw_output and json_match) with non-greedy patterns and safer matching:
use a non-greedy quantifier (e.g., .*?) with re.DOTALL when searching for the
array of objects (the json_match for the assistant response) and similarly for
the fallback array search, or implement a short-circuit that finds the first '['
and then scans for the matching ']' to avoid catastrophic backtracking; update
references to raw_output and json_match in this block accordingly.
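The "short-circuit" alternative to a greedy regex can be sketched with a bracket-depth scan. This is a simplified illustration (it does not handle `]` characters inside JSON strings); `raw_output` below is a toy stand-in for the model's assistant response.

```python
import json

def extract_first_json_array(raw_output: str):
    """Find the first '[' and scan to its matching ']' by depth counting,
    avoiding regex backtracking entirely."""
    start = raw_output.find("[")
    if start == -1:
        return None
    depth = 0
    for i, ch in enumerate(raw_output[start:], start):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(raw_output[start : i + 1])
                except json.JSONDecodeError:
                    return None
    return None  # unbalanced brackets

segments = extract_first_json_array('noise [{"Speaker": 0, "Text": "hi"}] tail')
```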
- Around line 285-298: The fallback parsing in _parse_vibevoice_output currently
sets "speaker" to a prefixed string using f"Speaker {seg.get('Speaker', 0)}",
which causes double-prefixing when _map_to_result later wraps it again; change
the fallback to store the raw speaker value (e.g., seg.get("Speaker", 0)
converted to int or kept as numeric) under the segment dict (use a key like
"speaker" or "speaker_raw" consistent with _map_to_result's expectations) so
that _map_to_result (and its speaker_id path) is solely responsible for
formatting the "Speaker {id}" label. Ensure types match what _map_to_result
expects to avoid further conversions.
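The single-responsibility contract being asked for can be sketched as below. Function and key names mirror the review comment (`_parse_vibevoice_output` fallback and `_map_to_result`) but are simplified, hypothetical versions.

```python
def parse_fallback(seg: dict) -> dict:
    """Fallback parser stores the raw numeric speaker id, no label prefix."""
    return {"text": seg.get("Text", ""), "speaker": int(seg.get("Speaker", 0))}

def map_to_result(segment: dict) -> dict:
    """Sole place the "Speaker {id}" display label is produced."""
    return {**segment, "speaker": f"Speaker {segment['speaker']}"}

result = map_to_result(parse_fallback({"Speaker": 1, "Text": "hello"}))
```

Because the fallback keeps the id numeric, the formatter cannot double-prefix it into "Speaker Speaker 1".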
In `@extras/asr-services/pyproject.toml`:
- Around line 97-103: Remove the unused "diffusers>=0.30.0" entry from both
dependency groups in pyproject.toml: delete the "diffusers>=0.30.0" line under
the transformers array and likewise remove it from the vibevoice dependency
group so neither the transformers nor vibevoice lists include diffusers.
In `@tests/asr/protocol_tests.robot`:
- Around line 159-174: The test "ASR Capabilities Are From Known Set" is using
an incomplete known capabilities list in the variable @{known_caps}, causing
valid provider capabilities to be treated as unknown; update @{known_caps} to
include the additional capabilities returned by providers (add
language_detection, vad_filter, translation, chunked_processing,
speaker_identification, long_form) so the assertion over ${info}[capabilities]
only fails for truly unknown entries. Ensure the list in the test matches all
supported capability strings returned by FasterWhisperService,
TransformersService, NemoService, and VibeVoiceService.
- Around line 90-126: The tests unconditionally assert on ${json}[words] but
must skip when the provider lacks the word_timestamps capability; add a guard at
the start of both test cases (ASR Transcribe Response Words Have Timestamps and
ASR Transcribe Response Words Have Confidence Scores) that checks the provider
capability (e.g., inspect ${capabilities}[word_timestamps] or call an existing
capability-check keyword) and uses "Skip Test" or "Run Keyword Unless" to skip
the test when word_timestamps is false, then keep the existing assertions on
${json}[words], ${first_word}, start/end and confidence as-is; apply the same
conditional pattern used by the segment/diarization tests to both test cases.
In `@tests/libs/mock_asr_server.py`:
- Around line 43-67: The mock response models (HealthResponse, InfoResponse,
WordTimestamp, TranscriptionResult) must match the canonical models: either
import and reuse the classes from common/response_models.py or update the mock
definitions so WordTimestamp.confidence is Optional[float],
TranscriptionResult.segments is typed as List[Segment] (not list[dict]), and
include optional fields speakers: Optional[List[Speaker]], language:
Optional[str], and duration: Optional[float]; update the mock module to
reference the canonical Segment and Speaker types (or import them) so tests
exercise the same schema as production.
In `@tests/resources/asr_keywords.robot`:
- Around line 88-103: The subprocess is setting ASR_MODEL to an empty string
which overrides docker-compose defaults and the @{env_vars} list is unused;
update the Start GPU ASR Service invocation so you remove the dead @{env_vars}
construction and only include ASR_MODEL in the env passed to Run Process when
"${model}" is not empty (keep env:ASR_PORT always), i.e., conditionally add the
ASR_MODEL environment entry in the Run Process call instead of unconditionally
using env:ASR_MODEL=${model} and delete the unused @{env_vars} lines.
🟡 Minor comments (23)
extras/asr-services/providers/__init__.py-1-8 (1)

**1-8**: ⚠️ Potential issue | 🟡 Minor

**Add `vibevoice` provider to the documentation.**

The PR objectives mention adding "a new `vibevoice` provider configuration for Microsoft VibeVoice ASR with support for speaker diarization," but this provider is not listed in the module docstring. This omission may confuse developers looking for available providers.

📝 Proposed fix to include vibevoice

```diff
 """
 ASR Provider implementations.

 Available providers:
 - faster_whisper: Fast Whisper inference using CTranslate2
 - transformers: HuggingFace Transformers for general ASR models
 - nemo: NVIDIA NeMo ASR for Parakeet, Canary, etc.
+- vibevoice: Microsoft VibeVoice ASR with speaker diarization
 """
```

tests/integration/websocket_transcription_e2e_test.robot-294-297 (1)

**294-297**: ⚠️ Potential issue | 🟡 Minor

**Misleading comment: this allows small overlaps, not gaps.**

`${start} >= ${prev_end} - 0.1` permits the next segment's start to be up to 0.1s before the previous segment's end (i.e., overlap). Gaps (start > prev_end) would pass naturally. Update the comment to match the actual intent.

```diff
-    # Start time must be >= previous end time (allowing small gaps between segments)
+    # Start time must be >= previous end time (allowing up to 0.1s overlap between segments)
     Should Be True    ${start} >= ${prev_end} - 0.1
```

wizard.py-501-504 (1)
**501-504**: ⚠️ Potential issue | 🟡 Minor

**`"vibevoice".capitalize()` → `"Vibevoice"`, not `"VibeVoice"`.**

`str.capitalize()` will produce "Vibevoice", which doesn't match the product branding "VibeVoice". Consider using a display-name mapping instead.

Proposed fix

```diff
+    _PROVIDER_DISPLAY = {"parakeet": "Parakeet", "vibevoice": "VibeVoice"}
+
     # Auto-add asr-services if local ASR was chosen (Parakeet or VibeVoice)
     if transcription_provider in ("parakeet", "vibevoice") and 'asr-services' not in selected_services:
-        console.print(f"[blue][INFO][/blue] Auto-adding ASR services for {transcription_provider.capitalize()} transcription")
+        console.print(f"[blue][INFO][/blue] Auto-adding ASR services for {_PROVIDER_DISPLAY.get(transcription_provider, transcription_provider)} transcription")
         selected_services.append('asr-services')
```

tests/DEBUG_GUIDE.md-167-169 (1)
**167-169**: ⚠️ Potential issue | 🟡 Minor

**Add language specifier to fenced code block.**

Static analysis (MD040) flags fenced code blocks without a language. Use a plain text identifier for output blocks.

Proposed fix

````diff
 Expected output:
-```
+```text
 CONFIG_FILE=/app/test-configs/mock-services.yml
````

tests/MOCK_SPEAKER_IMPLEMENTATION.md-204-204 (1)

**204-204**: ⚠️ Potential issue | 🟡 Minor

**Hardcoded absolute path leaks local developer environment.**

`/home/ankush/workspaces/friend-lite/tests/TODO_MOCK_SPEAKER_RECOGNITION.md` is a machine-specific path that won't work for any other contributor. Use a repository-relative path or remove the reference if the file isn't committed.

tests/show_results.py-189-191 (1)

**189-191**: ⚠️ Potential issue | 🟡 Minor

**`relative_to(Path.cwd())` raises `ValueError` if the path is not under cwd.**

If the script is invoked from a directory outside the test results tree, this will crash. Consider wrapping it or falling back to the absolute path.

Proposed fix

```diff
-    print(f"Results from: {colorize(str(xml_file.relative_to(Path.cwd())), Colors.BLUE)}")
+    try:
+        display_path = str(xml_file.relative_to(Path.cwd()))
+    except ValueError:
+        display_path = str(xml_file)
+    print(f"Results from: {colorize(display_path, Colors.BLUE)}")
```

extras/asr-services/.env.template-4-8 (1)
**4-8**: ⚠️ Potential issue | 🟡 Minor

**`vibevoice` is missing from the provider selection comment.**

Line 7 lists `faster-whisper, transformers, nemo`, but the PR introduces a dedicated `vibevoice` provider (with its own docker-compose service `vibevoice-asr`, config profile, and README section). Users referencing this template won't know `vibevoice` is an option.

Proposed fix

```diff
-# Choose one of: faster-whisper, transformers, nemo
+# Choose one of: faster-whisper, transformers, nemo, vibevoice
```

tests/run_failed_tests.sh-33-33 (1)
**33-33**: ⚠️ Potential issue | 🟡 Minor

**Hardcoded test count is fragile.**

The total `6` is hardcoded. If tests are added or removed from the `tests` array, this will be wrong.

Proposed fix

```diff
-    echo "Test $test_count/6: $test_name"
+    echo "Test $test_count/${#tests[@]}: $test_name"
```

tests/scripts/mock_transcription_server.py-170-174 (1)
**170-174**: ⚠️ Potential issue | 🟡 Minor

**`KeyboardInterrupt` will not be caught inside an async coroutine.**

`asyncio.run()` intercepts `KeyboardInterrupt` at the event-loop level and cancels running tasks — it never propagates into the coroutine as a `KeyboardInterrupt`. This means `server.stop()` on line 174 is effectively dead code. Move the interrupt handling to the synchronous call site (around `asyncio.run(main())`), consistent with how the sibling mock in `tests/libs/mock_streaming_stt_server.py` handles it.

Proposed fix

```diff
 async def main():
     """Main entry point."""
     import argparse

     parser = argparse.ArgumentParser(description="Mock WebSocket Transcription Server")
     parser.add_argument("--host", default="0.0.0.0", help="Host to bind to (default: 0.0.0.0)")
     parser.add_argument("--port", type=int, default=9999, help="Port to bind to (default: 9999)")
     args = parser.parse_args()

     server = MockTranscriptionServer(host=args.host, port=args.port)
-
-    try:
-        await server.run_forever()
-    except KeyboardInterrupt:
-        logger.info("⏹️ Keyboard interrupt received")
-        await server.stop()
+    await server.run_forever()


 if __name__ == "__main__":
-    asyncio.run(main())
+    try:
+        asyncio.run(main())
+    except KeyboardInterrupt:
+        logger.info("⏹️ Keyboard interrupt received, shutting down")
```

tests/show_results.py-152-162 (1)
**152-162**: ⚠️ Potential issue | 🟡 Minor

**`find_parent_suite` may return a grandparent instead of the immediate parent.**

`suite.findall(".//test")` matches descendants at any depth, so for a nested suite structure like `Root > SubSuite > Test`, this function could return `Root` instead of `SubSuite`. This produces incorrect suite paths in the output.

A reliable approach is to build a parent map once before traversal.

Proposed fix — replace the `find_parent_suite` function and its usage with a pre-built parent map:

```diff
-def find_parent_suite(root, element):
-    """Find parent suite of an element"""
-    for suite in root.findall(".//suite"):
-        for child in suite:
-            if child == element:
-                return suite
-        # Check nested tests
-        for test in suite.findall(".//test"):
-            if test == element:
-                return suite
-    return None
+def build_parent_map(root):
+    """Build a mapping from child elements to their parent elements."""
+    parent_map = {}
+    for parent in root.iter():
+        for child in parent:
+            parent_map[child] = parent
+    return parent_map
```

Then in `parse_results`, build the map once and use it:

```python
parent_map = build_parent_map(root)
# ...
# Replace the while loop (lines 116-122):
parent = parent_map.get(test)
while parent is not None and parent.tag == "suite":
    suite_name = parent.get("name")
    if suite_name:
        suite_path.insert(0, suite_name)
    parent = parent_map.get(parent)
```

extras/asr-services/README.md-212-218 (1)
**212-218**: ⚠️ Potential issue | 🟡 Minor

**Clarify NeMo configuration documentation.**

The README documents `CHUNKING_ENABLED`, `MIN_AUDIO_FOR_CHUNKING`, and `CHUNK_DURATION_SECONDS` as NeMo configuration options, but the `.env.template` (lines 69–73) explicitly states "No additional configuration needed" for NeMo. These variables are actively used in the codebase (`parakeet-offline.py`, `nemo/Dockerfile`, `docker-compose.yml`). Either document them in `.env.template` with their defaults, or clarify in the README that they are internal defaults and not user-configurable.

extras/asr-services/init.py-376-391 (1)
**376-391**: ⚠️ Potential issue | 🟡 Minor

**Hardcoded legacy references in next steps regardless of selected provider.**

- Step 2 always suggests the parakeet pre-configured profile (`cp configs/parakeet.env .env && docker compose up --build -d nemo-asr`), even when the user selected a different provider.
- Step 5 references `PARAKEET_ASR_URL` regardless of provider, which is a legacy naming artifact.

Consider making step 2 provider-aware (or omitting it for non-nemo providers) and using a provider-neutral env var name like `ASR_URL`.

extras/asr-services/init.py-253-258 (1)
**253-258**: ⚠️ Potential issue | 🟡 Minor

**Empty custom model identifier is silently accepted.**

If the user selects "Enter custom model URL" and presses Enter without typing anything, `prompt_value` returns `""`. This empty string flows into `setup_provider_config` and gets written to `.env` as `ASR_MODEL=`, which will likely cause a startup failure downstream.

🛡️ Proposed fix: validate non-empty input

```diff
         if choice == str(len(models) + 1):
             # Custom model
             custom_model = self.prompt_value("Enter model identifier (HuggingFace repo or path)")
-            return custom_model
+            if not custom_model.strip():
+                self.console.print("[yellow]No model entered, using provider default[/yellow]")
+                return provider_info["default_model"]
+            return custom_model
```

extras/asr-services/providers/vibevoice/__init__.py-1-5 (1)
**1-5**: ⚠️ Potential issue | 🟡 Minor

**Missing public exports — inconsistent with all other provider packages.**

Every other provider package (`faster_whisper`, `nemo`, `transformers`) re-exports their Service and Transcriber classes via `__all__`, enabling clean imports like `from providers.nemo import NemoService`. This file should follow the same pattern for `VibeVoiceService` and `VibeVoiceTranscriber`.

♻️ Proposed fix: add exports

```diff
 """
 VibeVoice ASR provider.

 Dedicated provider for Microsoft's VibeVoice-ASR model with speaker diarization.
 """
+
+from providers.vibevoice.service import VibeVoiceService
+from providers.vibevoice.transcriber import VibeVoiceTranscriber
+
+__all__ = ["VibeVoiceService", "VibeVoiceTranscriber"]
```

config/config.yml.template-158-181 (1)
**158-181**: ⚠️ Potential issue | 🟡 Minor

**VibeVoice and Parakeet both default to port 8767 — potential conflict.**

`stt-vibevoice` defaults to `host.docker.internal:8767` and `stt-parakeet-batch` defaults to `172.17.0.1:8767`. While the hosts differ, running both on the same machine could still cause a port collision since `host.docker.internal` and `172.17.0.1` both resolve to the Docker host. Consider using a different default port for one of the providers (e.g., `8768` for VibeVoice).

tests/configs/mock-vibevoice.yml-62-66 (1)
**62-66**: ⚠️ Potential issue | 🟡 Minor

**Mock config capabilities don't match the real VibeVoice provider config.**

The comment on line 62 says "no word-level timestamps," yet `timestamps` is listed in `capabilities` (line 66). Meanwhile, the real `stt-vibevoice` entry in `config/config.yml.template` (lines 169–171) only declares `segments` and `diarization` — no `timestamps`. This mismatch means protocol tests may pass against the mock but fail against the actual provider.

Proposed fix: align with the real provider config

```diff
 # VibeVoice capabilities: segments with speakers, but no word-level timestamps
 capabilities:
   - segments
   - diarization
-  - timestamps
```

extras/asr-services/providers/faster_whisper/transcriber.py-29-50 (1)
**29-50**: ⚠️ Potential issue | 🟡 Minor

**`DEVICE_INDEX` parsing can raise an unhandled `ValueError`.**

Line 41: `int(os.getenv("DEVICE_INDEX", "0"))` will throw a `ValueError` if the env var is set to a non-numeric string. Consider wrapping with a fallback or validating at init. This is a minor risk since it's operator-configured, but it would produce an unhelpful traceback.

🛡️ Proposed fix

```diff
-        self.device_index = int(os.getenv("DEVICE_INDEX", "0"))
+        try:
+            self.device_index = int(os.getenv("DEVICE_INDEX", "0"))
+        except ValueError:
+            logger.warning("Invalid DEVICE_INDEX, defaulting to 0")
+            self.device_index = 0
```

extras/asr-services/providers/vibevoice/transcriber.py-270-278 (1)
**270-278**: ⚠️ Potential issue | 🟡 Minor

**Remove extraneous `f` prefix on line 272.**

Line 272 is an f-string with no placeholders (Ruff F541).

```diff
-        logger.warning(f"Output does not match pattern [{{...}}], checking for other formats...")
+        logger.warning("Output does not match pattern [{...}], checking for other formats...")
```

tests/scripts/verify_mock_servers.py-146-146 (1)
**146-146**: ⚠️ Potential issue | 🟡 Minor

**Use Pythonic truthiness checks instead of `== False` / `== True`.**

Per Ruff E712, prefer `not initial_data["is_final"]` and `if final_data["is_final"]`.

🔧 Proposed fix

```diff
-    assert initial_data["is_final"] == False
+    assert not initial_data["is_final"]
 ...
-    assert final_data["is_final"] == True
+    assert final_data["is_final"]
```

Also applies to: 163-163
tests/scripts/verify_mock_servers.py-24-24 (1)

**24-24**: ⚠️ Potential issue | 🟡 Minor

**Network calls lack timeouts — script may hang if servers are down.**

`urllib.request.urlopen` and `websockets.connect` have no explicit timeouts. If the target server isn't running, the script will hang indefinitely instead of failing fast.

🔧 Proposed fix — add timeouts

```diff
-    with urllib.request.urlopen('http://localhost:11435/v1/models') as response:
+    with urllib.request.urlopen('http://localhost:11435/v1/models', timeout=10) as response:
```

```diff
-    async with websockets.connect("ws://localhost:9999") as ws:
+    async with websockets.connect("ws://localhost:9999", open_timeout=10) as ws:
```

Apply similarly to all `urlopen` calls.

Also applies to: 56-56, 87-87, 142-142
extras/asr-services/common/base_service.py-156-157 (1)

**156-157**: ⚠️ Potential issue | 🟡 Minor

**Remove extraneous `f` prefix on string without placeholders.**

Line 157 has an f-string with no interpolation expressions.

```diff
-        logger.info(f"Transcription request started")
+        logger.info("Transcription request started")
```

extras/asr-services/common/base_service.py-192-197 (1)
**192-197**: ⚠️ Potential issue | 🟡 Minor

**Chain the re-raised exception with `from e` for proper traceback.**

When wrapping an exception inside an `except` block, use `raise ... from e` to preserve the exception chain and clearly distinguish it from an error in the handler itself. Also, `logging.exception` already includes the exception info, so passing `e` in the format string is redundant.

🐛 Proposed fix

```diff
     except Exception as e:
         error_time = time.time() - request_start
-        logger.exception(f"Error after {error_time:.3f}s: {e}")
-        raise HTTPException(status_code=500, detail="Transcription failed")
+        logger.exception(f"Error after {error_time:.3f}s")
+        raise HTTPException(status_code=500, detail="Transcription failed") from e
```

extras/asr-services/common/audio_utils.py-50-70 (1)
**50-70**: ⚠️ Potential issue | 🟡 Minor

**Missing clipping before integer conversion can cause overflow.**

If input values exceed the `[-1.0, 1.0]` range (even slightly, due to floating-point arithmetic during resampling), multiplying by `int16.max` / `int32.max` and casting will silently wrap around, producing audio glitches.

🛡️ Proposed fix: clip before conversion

```diff
     if sample_width == 2:
-        audio_int = (audio_array * np.iinfo(np.int16).max).astype(np.int16)
+        audio_clipped = np.clip(audio_array, -1.0, 1.0)
+        audio_int = (audio_clipped * np.iinfo(np.int16).max).astype(np.int16)
         return audio_int.tobytes()
     elif sample_width == 4:
-        audio_int = (audio_array * np.iinfo(np.int32).max).astype(np.int32)
+        audio_clipped = np.clip(audio_array, -1.0, 1.0)
+        audio_int = (audio_clipped * np.iinfo(np.int32).max).astype(np.int32)
         return audio_int.tobytes()
```
🧹 Nitpick comments (34)
tests/Makefile (2)
**293-316**: Consider adding `test-asr` and `test-asr-gpu` to the `test-all-with-slow-and-sdk` target, or documenting that GPU ASR tests require a separate invocation.

The `test-all-with-slow-and-sdk` target (line 285) runs everything without exclusions over `$(TEST_DIR)`, which now includes `asr`. This means it will pick up `requires-gpu` tagged tests, potentially failing on machines without a GPU. If that target is intended to be the "run absolutely everything" option, this may be acceptable, but it's worth confirming the intent — especially since `all` explicitly excludes `requires-gpu`.
**308-316**: Hardcoded port 8767 in the informational note could drift.

Line 310 references port 8767. If the ASR service port is configurable or changes, this echo message will become misleading. Consider referencing a variable or the relevant config file instead.
extras/asr-services/providers/__init__.py (1)
**1-8**: Consider exposing providers via explicit imports or `__all__`.

The package initializer contains only documentation without exposing any public API. Consider adding explicit imports from submodules or defining `__all__` to improve provider discoverability and establish a clear public interface for the package.

Example: define `__all__` for explicit package exports

```python
"""
ASR Provider implementations.

Available providers:
- faster_whisper: Fast Whisper inference using CTranslate2
- transformers: HuggingFace Transformers for general ASR models
- nemo: NVIDIA NeMo ASR for Parakeet, Canary, etc.
- vibevoice: Microsoft VibeVoice ASR with speaker diarization
"""

__all__ = [
    "faster_whisper",
    "transformers",
    "nemo",
    "vibevoice",
]
```

Alternatively, if provider classes should be directly importable from the package root, add explicit imports.
wizard.py (1)
**45-49**: Stale service description — mentions only "Parakeet" but now also supports VibeVoice.

The `asr-services` description at line 48 still reads `'Offline speech-to-text (Parakeet)'`. Since VibeVoice is now a supported provider, consider updating this.

Proposed fix

```diff
     'asr-services': {
         'path': 'extras/asr-services',
         'cmd': ['uv', 'run', '--with-requirements', '../../setup-requirements.txt', 'python', 'init.py'],
-        'description': 'Offline speech-to-text (Parakeet)'
+        'description': 'Offline speech-to-text (Parakeet / VibeVoice)'
     },
```

extras/asr-services/providers/nemo/Dockerfile (3)
**27-29**: `build-essential` in the runtime stage adds ~200 MB+ of build tools.

Unless NeMo or `texterrors` compiles C extensions at import/runtime, `build-essential` should stay only in the builder stage. If it is genuinely needed at runtime, add a comment explaining why.
**31-34**: `COPY --from=builder /app /app` copies build artifacts (pyproject.toml, uv.lock) into runtime.

The VibeVoice Dockerfile only copies `.venv` from the builder (`COPY --from=builder /app/.venv /app/.venv`). Consider aligning for a leaner image, unless additional files from `/app` are needed at runtime.

Proposed fix

```diff
-# Copy virtual environment and application
-COPY --from=builder /app /app
+# Copy virtual environment from builder
+COPY --from=builder /app/.venv /app/.venv
+
+# Copy application code
 COPY common/ ./common/
 COPY providers/nemo/ ./providers/nemo/
```
**1-46**: Container runs as root — consider adding a non-root `USER`.

Trivy DS002 flags the absence of a non-root user. While GPU/ML containers often run as root, adding a dedicated user improves the security posture. This is common practice for production images.

Proposed fix (at end of runtime stage, before CMD)

```diff
+RUN useradd --create-home appuser
+USER appuser
+
 CMD ["python", "-m", "providers.nemo.service", "--port", "8765"]
```

extras/asr-services/providers/vibevoice/Dockerfile (1)
**1-56**: Container runs as root — consider adding a non-root `USER`.

Same concern as the NeMo Dockerfile — Trivy DS002 flags the absence of a non-root user. Adding a dedicated user is recommended for production images.

Proposed fix (before CMD)

```diff
+RUN useradd --create-home appuser
+USER appuser
+
 CMD ["python", "-m", "providers.vibevoice.service", "--port", "8765"]
```

tests/run_failed_tests.sh (1)
**64-72**: Script always exits 0, even when tests fail.

If this script is used in CI, the lack of a non-zero exit code on failure means the pipeline won't detect regressions. Consider exiting with a failure code when `failed_count > 0`.

Proposed fix

```diff
 echo "Individual test results saved in: $OUTPUTDIR/"
+
+if [ "$failed_count" -gt 0 ]; then
+    exit 1
+fi
```

tests/scripts/mock_transcription_server.py (3)
**16-21**: Use `sys.exit(1)` instead of bare `exit(1)`.

The built-in `exit()` is intended for the interactive interpreter. `sys.exit()` is the correct function for scripts, though `sys` would need to be imported at the top. Given the `import sys` isn't present, consider adding it or relying on `raise SystemExit(1)`.
**127-128**: Use `logger.exception` to preserve the traceback.

`logging.error` discards the traceback. `logger.exception` logs at ERROR level and includes the exception info automatically, which is valuable for debugging unexpected failures even in a mock server.

Proposed fix

```diff
     except Exception as e:
-        logger.error(f"❌ Error handling connection from {client_id}: {e}")
+        logger.exception(f"❌ Error handling connection from {client_id}: {e}")
```
**53-54**: `duration` is tracked but never used.

The `duration` variable is incremented on every audio chunk but never referenced anywhere. Either remove it or use it (e.g., in log messages or response data).

extras/asr-services/init.py (1)
146-148: Partial executable path in subprocess call.As flagged by Ruff S607,
"nvidia-smi"is resolved viaPATH, which could be exploited if the environment is compromised. This is low-risk in a developer setup script, but for defense-in-depth consider using the full path or documenting the assumption.🛡️ Optional: use absolute path
```diff
-    ["nvidia-smi"],
+    ["/usr/bin/nvidia-smi"],
```

Alternatively, use `shutil.which("nvidia-smi")` to resolve and validate the path before invocation.

tests/Dockerfile.mock-asr (1)
1-19: Consider adding a non-root user and pinning dependency versions. Two improvements for this test Dockerfile:

- Root user (Trivy DS002): The container runs as root. While acceptable for a test mock server, adding a non-root user is a good habit and prevents accidental misuse as a base for production images.
- Unpinned pip dependencies: `fastapi`, `uvicorn`, and `python-multipart` are installed without version constraints. A future breaking change in any of these could silently break the mock server in CI.

♻️ Proposed improvements
```diff
 FROM python:3.12-slim

 WORKDIR /app

 # Install dependencies (FastAPI + uvicorn + multipart for file uploads)
-RUN pip install --no-cache-dir fastapi uvicorn python-multipart
+RUN pip install --no-cache-dir "fastapi>=0.115.12" "uvicorn>=0.34.2" "python-multipart>=0.0.6"

 # Copy mock server script
 COPY tests/libs/mock_asr_server.py .

 # Expose HTTP port
 EXPOSE 8765

+# Run as non-root user
+RUN useradd --create-home appuser
+USER appuser
+
 # Provider mode can be set via environment variable
 # Options: mock, parakeet, vibevoice, deepgram
 ENV MOCK_ASR_PROVIDER=mock

 # Run server (provider mode read from env var)
 CMD ["sh", "-c", "python mock_asr_server.py --host 0.0.0.0 --port 8765 --provider $MOCK_ASR_PROVIDER"]
```

(The version specifiers are quoted so the shell in `RUN` doesn't interpret `>=` as a redirect.)

extras/asr-services/providers/vibevoice/impl.md (1)
1-17: This reads like raw conversation notes, not documentation. The file appears to be an unedited transcript or voice note pasted verbatim into the repo. It contains valuable architectural context (VibeVoice combined ASR+diarization vs. Parakeet+pyannote pipeline), but in its current form it has grammatical issues, missing punctuation, and an informal conversational tone that isn't suitable for committed documentation.
Consider either:
- Rewriting it as structured implementation notes (e.g., "Architecture Decision: Combined vs. Separate ASR+Diarization") with clear sections.
- Moving it to an issue or discussion thread and removing it from the codebase.
- At minimum, adding a disclaimer at the top (e.g., `<!-- Internal brainstorming notes — not user-facing docs -->`).

extras/asr-services/providers/transformers/__init__.py (1)
1-11: Docstring misleadingly lists VibeVoice as a supported model. Line 5 says "Supports models like VibeVoice-ASR, Whisper variants, and custom fine-tuned models," but the `TransformersService` class itself (in `service.py`, line ~33) explicitly states: "For VibeVoice-ASR with speaker diarization, use the dedicated vibevoice provider." Consider removing VibeVoice from this docstring to avoid confusion.

Proposed fix

```diff
-Supports models like VibeVoice-ASR, Whisper variants, and custom fine-tuned models.
+Supports Whisper variants and custom fine-tuned models.
```
19-43: Container runs as root — add a non-root user. Trivy flagged DS002: the container has no `USER` directive, so it runs as root. Adding one is a security best practice for production containers.

Proposed fix: add non-root user to the runtime stage

```diff
 # Copy virtual environment and application
 COPY --from=builder /app /app
 COPY common/ ./common/
 COPY providers/faster_whisper/ ./providers/faster_whisper/

+RUN useradd --create-home --shell /bin/bash appuser
+USER appuser
+
 ENV PATH="/app/.venv/bin:$PATH"
```
25-29: Redundant status code assertion. `POST On Session` with `expected_status=422` already fails the test if the status doesn't match, making the `Should Be Equal As Integers` on line 29 redundant. The same pattern repeats in the 404 test (line 38) and 405 test (line 67). Not a bug, but it's unnecessary noise.

extras/asr-services/providers/faster_whisper/service.py (1)
108-128: Environment mutation via `os.environ` in `main()` is a code smell. Line 118 sets `os.environ["ASR_MODEL"]` from the CLI arg, then line 121 reads it back. This indirect flow is fragile — if the `FasterWhisperTranscriber` constructor reads the env var before `main()` sets it, the value could be stale. Consider passing `model_id` directly without the env mutation.

♻️ Proposed fix

```diff
 def main():
     """Main entry point for faster-whisper service."""
     parser = argparse.ArgumentParser(description="Faster-Whisper ASR Service")
     parser.add_argument("--host", default="0.0.0.0", help="Host to bind to")
     parser.add_argument("--port", type=int, default=8765, help="Port to bind to")
     parser.add_argument("--model", help="Model identifier", required=False)
     args = parser.parse_args()

-    # Set model via environment if provided
-    if args.model:
-        os.environ["ASR_MODEL"] = args.model
-
-    # Get model ID
-    model_id = os.getenv("ASR_MODEL", "Systran/faster-whisper-large-v3")
+    model_id = args.model or os.getenv("ASR_MODEL", "Systran/faster-whisper-large-v3")

     # Create service and app
     service = FasterWhisperService(model_id)
```

extras/asr-services/providers/transformers/transcriber.py (2)
164-173: Only a single segment is produced regardless of audio length. When `text` is non-empty, lines 165–173 create exactly one `Segment` spanning the entire transcription. This is a consequence of the word-level timestamp mode not providing segment boundaries. Downstream consumers expecting per-sentence or per-utterance segments (e.g., for subtitle generation) will get a single monolithic segment.

This is worth documenting — either in the docstring or as a comment — so consumers of the `TransformersTranscriber` understand this limitation.
92-103: Inconsistent `device` type passed to `pipeline()`. Line 102 passes `self.device` (a string like `"cuda"`) when the device is not CPU, but passes `-1` (an integer) for CPU. Modern `transformers` `pipeline()` accepts `device="cpu"` directly; mixing string and integer types is unnecessarily inconsistent.

♻️ Proposed fix

```diff
 self.pipeline = pipeline(
     "automatic-speech-recognition",
     model=self.model,
     tokenizer=self.processor.tokenizer,
     feature_extractor=self.processor.feature_extractor,
     torch_dtype=self.torch_dtype,
-    device=self.device if self.device != "cpu" else -1,
+    device=self.device,
 )
```
132-147: Chain the exception with `raise ... from e` for better tracebacks. When re-raising as `RuntimeError` on line 144, chain it from the original `ImportError` to preserve the full traceback context. Also, `logging.exception` is preferred over `logging.error` in except blocks (Ruff B904, TRY400).

♻️ Proposed fix

```diff
 except ImportError as e:
-    logger.error(f"Failed to import VibeVoice modules: {e}")
+    logger.exception("Failed to import VibeVoice modules")
     raise RuntimeError(
         f"Failed to import VibeVoice modules. "
         f"Ensure the VibeVoice repository is properly cloned. Error: {e}"
-    )
+    ) from e
```
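A small self-contained illustration of the chaining behavior (the module name is a deliberately nonexistent stand-in):

```python
import logging

logger = logging.getLogger(__name__)

def load_modules() -> None:
    try:
        import vibevoice_modules_not_installed  # stand-in for the real import
    except ImportError as e:
        logger.exception("Failed to import VibeVoice modules")  # logs the traceback
        raise RuntimeError("Failed to import VibeVoice modules") from e

try:
    load_modules()
except RuntimeError as err:
    # `from e` stores the original error on __cause__, so tracebacks show
    # "The above exception was the direct cause of the following exception".
    chained = isinstance(err.__cause__, ImportError)

print(chained)  # → True
```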
156-164: `trust_remote_code=True` is necessary for VibeVoice but warrants a security comment. Line 163 enables execution of arbitrary code from the model repository. This is required for VibeVoice's custom model classes, but should be documented with a comment explaining why it's needed and accepted.

```diff
 self.model = VibeVoiceASRForConditionalGeneration.from_pretrained(
     self.model_id,
     torch_dtype=self.torch_dtype,
     device_map="auto" if self.device == "cuda" else None,
     attn_implementation=self.attn_impl,
-    trust_remote_code=True,
+    trust_remote_code=True,  # Required: VibeVoice uses custom model architecture
 )
```

extras/asr-services/providers/nemo/transcriber.py (1)
116-121: Hardcoded `confidence=1.0` is misleading when NeMo provides no confidence scores. NeMo's Parakeet models don't produce per-word confidence. Setting `confidence=1.0` implies perfect confidence, which is inaccurate. The `Word` model accepts `None` (see `response_models.py`), and the Transformers provider correctly uses `confidence=None` when unavailable. Consider aligning to the same convention.

♻️ Proposed fix

```diff
 word = Word(
     word=word_data["word"],
     start=word_data["start"],
     end=word_data["end"],
-    confidence=1.0,
+    confidence=None,
 )
```

extras/asr-services/providers/nemo/service.py (2)
47-54: Use `asyncio.get_running_loop()` instead of deprecated `get_event_loop()`. `asyncio.get_event_loop()` is deprecated since Python 3.10 when there's no running loop, and emits a DeprecationWarning. Since `warmup` is called from an async context (FastAPI startup), `asyncio.get_running_loop()` is the correct replacement.

The same applies to `TransformersService.warmup` and `TransformersService.transcribe` in `providers/transformers/service.py`.

♻️ Proposed fix

```diff
-loop = asyncio.get_event_loop()
+loop = asyncio.get_running_loop()
```
82-87: Capability "chunked_processing" may be misleading. The NeMo transcriber docstring states "no custom chunking required" and the code does not implement chunking. Advertising `"chunked_processing"` as a capability could confuse consumers of the `/info` endpoint. Consider removing it or renaming it to something like `"long_audio"` that reflects NeMo's native long-audio support.

extras/asr-services/docker-compose.yml (1)
28-29: All services bind to the same default host port — `docker compose up` will fail. All four new services default to `${ASR_PORT:-8767}:8765`. Running `docker compose up` without specifying a service name will attempt to start all of them, causing port-binding failures. While the header comment says "run ONE provider at a time," this is easy to miss.

Consider using Docker Compose profiles to prevent accidental multi-service startup:

♻️ Example: add profiles to each provider

```diff
 nemo-asr:
+  profiles: ["nemo"]
   build: ...
 faster-whisper-asr:
+  profiles: ["faster-whisper"]
   build: ...
 vibevoice-asr:
+  profiles: ["vibevoice"]
   build: ...
 transformers-asr:
+  profiles: ["transformers"]
   build: ...
```

Then start with:

```shell
docker compose --profile nemo up -d
```

Also applies to: 62-63, 94-95, 125-126
extras/asr-services/common/base_service.py (2)
26-99: Consider adding a setter or method for `_is_ready` instead of mutating it externally. Line 123 in `create_asr_app` directly sets `service._is_ready = True`, breaking the encapsulation of the private attribute. A simple public method would keep the invariant within the class.

♻️ Suggested refactor

Add to `BaseASRService`:

```python
def mark_ready(self) -> None:
    """Mark the service as ready to handle requests."""
    self._is_ready = True
```

Then in `create_asr_app`:

```diff
 await service.warmup()
-service._is_ready = True
+service.mark_ready()
```
145-190: `/transcribe` endpoint returns `JSONResponse` manually instead of using `response_model`. The `/health` and `/info` endpoints declare `response_model` and return Pydantic model instances directly, but `/transcribe` manually calls `result.to_dict()` and wraps it in `JSONResponse`. This skips Pydantic's response validation and creates inconsistency. You could declare `response_model=TranscriptionResult` and return the model instance directly; FastAPI handles serialization, and `response_model_exclude_none=True` covers the None-exclusion.

Alternatively, if you need the custom `to_dict` exclusion behavior, this is fine — but the inconsistency is worth noting.

tests/asr/gpu_integration_tests.robot (1)
79-93: Consider caching the transcription response across test cases to reduce GPU inference time. Three separate test cases (`GPU ASR Transcription Returns Text`, `GPU ASR Transcription Has Word Timestamps`, `GPU ASR Transcription Duration Reasonable`) each independently upload and transcribe the same audio file. With real GPU inference, each call could take significant time.

You could transcribe once in a suite-level keyword (or a shared test setup) and store the result in a suite variable, then validate different aspects in each test case. This would cut GPU inference from 3× to 1× for these tests.

That said, independent tests are more robust for isolation, so this is a trade-off.

Also applies to: 95-125, 127-149
extras/asr-services/common/response_models.py (2)
63-76: `to_dict()` includes `confidence: null` for words but excludes top-level None fields. `model_dump()` on `Word` objects (Line 67) will include `"confidence": null` in the output when no confidence is available, while top-level optional fields (`speakers`, `language`, `duration`) are explicitly excluded when `None`. This inconsistency could surprise API consumers.

If you want consistent None-exclusion, use `model_dump(exclude_none=True)` for nested models too:

```diff
-    "words": [w.model_dump() for w in self.words],
-    "segments": [s.model_dump() for s in self.segments],
+    "words": [w.model_dump(exclude_none=True) for w in self.words],
+    "segments": [s.model_dump(exclude_none=True) for s in self.segments],
```

Similarly for `speakers` on Line 71. If `confidence: null` is intentionally part of the contract, this is fine as-is.
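The contract difference can be seen with a stdlib-only sketch that mimics `exclude_none` using dataclasses (this `Word` shape is a simplified stand-in for the Pydantic model):

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Word:
    word: str
    start: float
    end: float
    confidence: Optional[float] = None

def dump(w: Word, exclude_none: bool = False) -> dict:
    """Analog of model_dump(): optionally strip keys whose value is None."""
    d = asdict(w)
    if exclude_none:
        return {k: v for k, v in d.items() if v is not None}
    return d

w = Word("hello", 0.0, 0.4)
print(dump(w))                     # includes 'confidence': None
print(dump(w, exclude_none=True))  # omits the null field entirely
```

Whichever shape is chosen, applying it uniformly to nested and top-level fields keeps the API predictable.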
79-98: `HealthResponse` and `InfoResponse` look good and align with the mock server's shapes.

Note: The mock server in `tests/libs/mock_asr_server.py` redefines these models without `Field` descriptors and with slightly different typing (`list[str]` vs `List[str]`). While functionally equivalent, having the mock import from this module (or use a shared contract) would prevent future drift.
22-47: `sample_rate` and `channels` parameters are accepted but unused. `convert_audio_to_numpy` takes `sample_rate` and `channels` arguments but never references them in the function body. This misleads callers into thinking these parameters affect the conversion. Since `load_audio_file` handles mono conversion separately via `convert_to_mono`, either remove these unused params or document that they're present for future use / interface uniformity.

♻️ Option A: Remove unused params

```diff
 def convert_audio_to_numpy(
     audio_bytes: bytes,
-    sample_rate: int,
     sample_width: int = 2,
-    channels: int = 1,
 ) -> np.ndarray:
```

Then update the call site in `load_audio_file` (Line 149):

```diff
 audio_array = convert_audio_to_numpy(
-    audio_bytes, sample_rate, sample_width, channels
+    audio_bytes, sample_width=sample_width
 )
```

tests/libs/mock_asr_server.py (1)
103-105: Silent fallback to "mock" for unknown providers. `get_provider_config()` silently returns the `"mock"` config when an unrecognized provider is specified. This could mask configuration errors. Consider logging a warning when falling back.

♻️ Proposed fix

```diff
 def get_provider_config():
     """Get configuration for the current provider mode."""
-    return PROVIDER_CONFIGS.get(PROVIDER_MODE, PROVIDER_CONFIGS["mock"])
+    config = PROVIDER_CONFIGS.get(PROVIDER_MODE)
+    if config is None:
+        logger.warning(f"Unknown provider '{PROVIDER_MODE}', falling back to 'mock'")
+        config = PROVIDER_CONFIGS["mock"]
+    return config
```
```python
        match = re.search(r'CUDA Version:\s*(\d+)\.(\d+)', output)
        if match:
            major, minor = match.groups()
            cuda_ver = f"{major}.{minor}"
            if cuda_ver >= "12.8":
                return "cu128"
            elif cuda_ver >= "12.6":
                return "cu126"
            elif cuda_ver >= "12.1":
                return "cu121"
    except (subprocess.SubprocessError, FileNotFoundError):
        pass
    return "cu126"  # Default fallback to cu126
```
Bug: String comparison for CUDA version produces incorrect results for multi-digit minor versions.

`cuda_ver` is a string like `"12.8"`, and `>=` performs lexicographic comparison. This works by coincidence for the current values but will break for versions like `"12.10"` (since `"12.10" < "12.8"` lexicographically). Convert to a numeric tuple for correct ordering.
🐛 Proposed fix

```diff
     if match:
         major, minor = match.groups()
-        cuda_ver = f"{major}.{minor}"
-        if cuda_ver >= "12.8":
+        cuda_ver = (int(major), int(minor))
+        if cuda_ver >= (12, 8):
             return "cu128"
-        elif cuda_ver >= "12.6":
+        elif cuda_ver >= (12, 6):
             return "cu126"
-        elif cuda_ver >= "12.1":
+        elif cuda_ver >= (12, 1):
             return "cu121"
```

🤖 Prompt for AI Agents
In `@extras/asr-services/init.py` around lines 155 - 167, The current comparison
uses string `cuda_ver` (e.g., "12.10") and lexicographic comparisons which are
incorrect for multi-digit minors; instead convert `major` and `minor` to
integers (e.g., major_i, minor_i) and compare numeric tuples like (major_i,
minor_i) against (12,8), (12,6), (12,1) in the same branching where `match =
re.search(...)` and `cuda_ver` is built; replace the string comparisons (`if
cuda_ver >= "12.8": ...`) with tuple comparisons (e.g., `if (major_i, minor_i)
>= (12, 8): return "cu128"`, etc.) and keep the same fallback behavior and
exception handling.
```python
def setup_provider_config(self, provider: str, model: str):
    """Configure provider-specific settings"""
    self.print_section("Provider Configuration")

    self.config["ASR_PROVIDER"] = provider
    self.config["ASR_MODEL"] = model
    self.config["ASR_PORT"] = "8767"

    if provider == "faster-whisper":
        self.config["COMPUTE_TYPE"] = "float16"
        self.config["DEVICE"] = "cuda"
        self.config["VAD_FILTER"] = "true"

        # Ask about language
        if Confirm.ask("Force specific language?", default=False):
            lang = self.prompt_value("Language code (e.g., en, hi, es)", default="")
            if lang:
                self.config["LANGUAGE"] = lang

    elif provider == "vibevoice":
        # VibeVoice uses transformers backend with specific optimizations
        self.config["TORCH_DTYPE"] = "float16"
        self.config["DEVICE"] = "cuda"
        self.config["USE_FLASH_ATTENTION"] = "true"
        self.console.print("[blue][INFO][/blue] Enabled Flash Attention for VibeVoice")
        self.console.print("[blue][INFO][/blue] VibeVoice provides built-in speaker diarization (no pyannote needed)")

    elif provider == "transformers":
        self.config["TORCH_DTYPE"] = "float16"
        self.config["DEVICE"] = "cuda"
        self.config["USE_FLASH_ATTENTION"] = "false"

        # Hindi model-specific
        if "hindi" in model.lower():
            self.config["LANGUAGE"] = "hi"
            self.console.print("[blue][INFO][/blue] Set language to Hindi for Hindi Whisper model")

    elif provider == "nemo":
        # NeMo's transcribe() handles long audio natively - no extra config needed
        pass
```
`vibevoice` and `faster-whisper` set `DEVICE=cuda` unconditionally — no macOS/CPU fallback.

`setup_cuda_version` (called for nemo/transformers) handles macOS detection and sets `cpu`, but the provider-specific config for `vibevoice`, `faster-whisper`, and `transformers` hardcodes `DEVICE = "cuda"`. On a system without a GPU (or on macOS), this will produce an unusable configuration.
🐛 Proposed fix: use detected device

```diff
 def setup_provider_config(self, provider: str, model: str):
     """Configure provider-specific settings"""
     self.print_section("Provider Configuration")

     self.config["ASR_PROVIDER"] = provider
     self.config["ASR_MODEL"] = model
     self.config["ASR_PORT"] = "8767"

+    is_macos = platform.system() == 'Darwin'
+    default_device = "cpu" if is_macos else "cuda"
+
     if provider == "faster-whisper":
         self.config["COMPUTE_TYPE"] = "float16"
-        self.config["DEVICE"] = "cuda"
+        self.config["DEVICE"] = default_device
         self.config["VAD_FILTER"] = "true"
```

Apply the same pattern for `vibevoice` and `transformers`.
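One way to centralize the fallback is a small detection helper along these lines (`pick_device` is a hypothetical sketch, not part of this PR; it also degrades to CPU when `torch` is absent or reports no CUDA device):

```python
import platform

def pick_device() -> str:
    """Prefer CUDA off macOS, falling back to CPU when no CUDA stack is available."""
    if platform.system() == "Darwin":
        return "cpu"
    try:
        import torch  # optional dependency; absence implies no GPU stack
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

print(pick_device())  # "cpu" or "cuda" depending on the host
```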
🤖 Prompt for AI Agents
In `@extras/asr-services/init.py` around lines 290 - 329, The provider-specific
configs in setup_provider_config hardcode DEVICE="cuda" for "faster-whisper",
"vibevoice", and "transformers" which ignores the detected device (macOS/CPU)
set by setup_cuda_version; change these branches to use the detected device
instead of "cuda" (e.g., set self.config["DEVICE"] = self.config.get("DEVICE")
or call/ensure setup_cuda_version() was run and then assign
self.config["DEVICE"] accordingly) and remove the unconditional "cuda" literal
so providers fall back to CPU/macOS when no GPU is present.
```python
    async def warmup(self) -> None:
        """Initialize and warm up the model."""
        logger.info(f"Initializing faster-whisper with model: {self.model_id}")

        # Load model (runs in thread pool to not block)
        loop = asyncio.get_event_loop()
        self.transcriber = FasterWhisperTranscriber(self.model_id)
        await loop.run_in_executor(None, self.transcriber.load_model)
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Find all uses of get_event_loop in the extras/asr-services directory
rg -n 'get_event_loop' extras/asr-services/
```

Repository: SimpleOpenSoftware/chronicle
Length of output: 1541
🏁 Script executed:
```shell
#!/bin/bash
# Check Python version requirements
find . -type f \( -name 'requirements*.txt' -o -name 'setup.py' -o -name 'pyproject.toml' -o -name '.python-version' \) | head -20
```

Repository: SimpleOpenSoftware/chronicle
Length of output: 820
🏁 Script executed:
```shell
#!/bin/bash
# Get the context around get_event_loop calls - show the method they're in
rg -B 5 -A 2 'get_event_loop' extras/asr-services/
```

Repository: SimpleOpenSoftware/chronicle
Length of output: 10341
🏁 Script executed:
```shell
#!/bin/bash
# Check Python version requirement
cat extras/asr-services/pyproject.toml | grep -A 5 "python"
```

Repository: SimpleOpenSoftware/chronicle
Length of output: 508
🏁 Script executed:
```shell
#!/bin/bash
# Check the context of client.py to confirm it's all async
head -100 extras/asr-services/client.py | grep -E "^async def|^def|^class"
```

Repository: SimpleOpenSoftware/chronicle
Length of output: 308
🌐 Web query:
asyncio.get_event_loop() deprecated Python 3.10 get_running_loop()
💡 Result:
asyncio.get_event_loop() itself isn’t fully removed in Python 3.10+, but calling it when there is no running loop is what triggers the DeprecationWarning (“there is no current event loop …”). The docs recommend asyncio.get_running_loop() inside async code, and asyncio.run() (or explicit loop creation) from sync code. In a future Python release, this “no running loop” case is expected to become an error, and get_event_loop() is intended to behave like get_running_loop() (i.e., raise RuntimeError when no loop is running). [1][2]
What to use instead
1) Inside a coroutine / callback (async context)
Use get_running_loop() (guarantees you’re on an active loop):
```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    # use loop...

asyncio.run(main())
```

[1]
2) From synchronous “top-level” code (starting your program)
Prefer asyncio.run():
```python
import asyncio

async def main():
    ...

if __name__ == "__main__":
    asyncio.run(main())
```

[1]
3) If you truly need manual loop management (advanced / special cases)
Create and set a loop explicitly:
```python
import asyncio

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
    loop.run_until_complete(main())
finally:
    loop.close()
```

(Using `new_event_loop()` avoids relying on the deprecated "implicit loop creation" behavior of `get_event_loop()`.) [1][2]
Why the warning appears in 3.10+
Python tightened behavior around “no running loop”: in 3.10+ get_event_loop() can emit a DeprecationWarning when there’s no running loop (with details varying by patch releases), and “in some future Python release this will become an error.” [1]
Sources
[1] Python docs: asyncio — Event Loop (get_running_loop, get_event_loop, deprecation warning notes)
[2] CPython issue discussion referencing asyncio docs + intended future behavior (aliasing/behavior change)
Use asyncio.get_running_loop() instead of asyncio.get_event_loop() across the codebase.
asyncio.get_event_loop() is deprecated since Python 3.10 and triggers a DeprecationWarning when called without a running loop. Since this project requires Python ≥ 3.10, all users are affected. The function will become an error in future Python releases. Replace all instances with get_running_loop(), which is the correct call for async methods and functions that are guaranteed to run within an event loop.
This issue appears in:
- Service files (in `warmup()` and `transcribe()` methods): faster_whisper, vibevoice, transformers, nemo
- client.py (timing calls in async functions like `run_mic_transcription()`)
All 15 occurrences should be updated consistently.
♻️ Example fix pattern
```diff
 async def warmup(self) -> None:
     """Initialize and warm up the model."""
     logger.info(f"Initializing faster-whisper with model: {self.model_id}")

     # Load model (runs in thread pool to not block)
-    loop = asyncio.get_event_loop()
+    loop = asyncio.get_running_loop()
     self.transcriber = FasterWhisperTranscriber(self.model_id)
     await loop.run_in_executor(None, self.transcriber.load_model)
```

For timing calls in async context:
```diff
-recording_start_time = asyncio.get_event_loop().time()
+recording_start_time = asyncio.get_running_loop().time()
```
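The replacement pattern in a runnable, self-contained form (`blocking_load` is a stand-in for a real model load such as `load_model`):

```python
import asyncio
import time

def blocking_load() -> str:
    """Stand-in for a blocking, CPU-bound model load."""
    time.sleep(0.05)
    return "model"

async def warmup() -> str:
    # Safe here: warmup() only ever executes inside a running event loop,
    # which is exactly the case get_running_loop() is designed for.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_load)

result = asyncio.run(warmup())
print(result)  # → model
```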
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    async def warmup(self) -> None:
        """Initialize and warm up the model."""
        logger.info(f"Initializing faster-whisper with model: {self.model_id}")

        # Load model (runs in thread pool to not block)
        loop = asyncio.get_running_loop()
        self.transcriber = FasterWhisperTranscriber(self.model_id)
        await loop.run_in_executor(None, self.transcriber.load_model)
```
🤖 Prompt for AI Agents
In `@extras/asr-services/providers/faster_whisper/service.py` around lines 48 -
55, Replace deprecated asyncio.get_event_loop() calls with
asyncio.get_running_loop() in async contexts: update the warmup() method in
faster_whisper.service (where loop = asyncio.get_event_loop() is used before
await loop.run_in_executor(...) and FasterWhisperTranscriber.load_model is
invoked), any transcribe() async methods in service classes, and async timing
functions like run_mic_transcription() in client.py; ensure you call
asyncio.get_running_loop() and keep using loop.run_in_executor(...) for
offloading blocking work so the code runs without DeprecationWarnings or future
errors.
```python
        async with self._lock:
            with torch.no_grad():
                results = self.model.transcribe(
                    [audio_file_path], batch_size=1, timestamps=True
                )
```
Blocking GPU inference on the asyncio event loop.

`self.model.transcribe()` is a synchronous, CPU/GPU-bound NeMo call. Wrapping it in `async with self._lock` does not offload it — it blocks the entire event loop during inference, making the FastAPI service unresponsive (health checks and concurrent requests all stall).

Compare with `TransformersService.transcribe` (in `providers/transformers/service.py`, lines 88-96), which correctly uses `loop.run_in_executor(None, lambda: self.transcriber.transcribe(...))`.

Either make `transcribe` synchronous and have the caller (`NemoService`) wrap it in `run_in_executor`, or offload the blocking call here with `asyncio.get_event_loop().run_in_executor()`.
🐛 Proposed fix: offload blocking inference to a thread

```diff
 async def transcribe(self, audio_file_path: str) -> TranscriptionResult:
     ...
     if not self._is_loaded or self.model is None:
         raise RuntimeError("Model not loaded. Call load_model() first.")

     logger.info(f"Transcribing: {audio_file_path}")
+    loop = asyncio.get_event_loop()
     async with self._lock:
-        with torch.no_grad():
-            results = self.model.transcribe(
-                [audio_file_path], batch_size=1, timestamps=True
-            )
+        results = await loop.run_in_executor(
+            None,
+            lambda: self._transcribe_sync(audio_file_path),
+        )
     ...
+
+def _transcribe_sync(self, audio_file_path: str):
+    """Run NeMo transcription synchronously (intended for executor)."""
+    with torch.no_grad():
+        return self.model.transcribe(
+            [audio_file_path], batch_size=1, timestamps=True
+        )
```

🤖 Prompt for AI Agents
In `@extras/asr-services/providers/nemo/transcriber.py` around lines 92 - 96, The
current async transcribe method in transcriber.py is calling the blocking NeMo
call self.model.transcribe(...) inside async with self._lock, which blocks the
event loop; change it to offload the blocking call to a thread using
asyncio.get_event_loop().run_in_executor (or loop.run_in_executor) so the event
loop stays responsive. Keep the async lock for concurrency control but use await
loop.run_in_executor(None, lambda: self.model.transcribe([audio_file_path],
batch_size=1, timestamps=True)) (or make transcribe synchronous and let
NemoService call run_in_executor); ensure the torch.no_grad() context and any
GPU device setup are executed inside the executor task so GPU/CPU work does not
run on the event loop.
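To see why the offload matters, here is a self-contained sketch: a heartbeat task (standing in for health checks) only keeps ticking while the "inference" runs in the default executor. `fake_inference` is a stand-in for the blocking NeMo call; invoking it directly in the coroutine would freeze the loop instead.

```python
import asyncio
import time

async def heartbeat(ticks: list) -> None:
    """Simulates health checks that must keep running during inference."""
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)

def fake_inference() -> str:
    time.sleep(0.2)  # stand-in for a blocking GPU call
    return "transcript"

async def main() -> list:
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    loop = asyncio.get_running_loop()
    # Offloaded: the heartbeat keeps ticking while the 0.2 s call runs in a thread.
    await loop.run_in_executor(None, fake_inference)
    await hb
    return ticks

ticks = asyncio.run(main())
print(len(ticks))  # → 5
```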
```dockerfile
FROM python:3.12-slim-bookworm AS runtime

ENV PYTHONUNBUFFERED=1
WORKDIR /app

# Install runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl ffmpeg git \
    && rm -rf /var/lib/apt/lists/*

# Copy virtual environment and application
COPY --from=builder /app /app
COPY common/ ./common/
COPY providers/transformers/ ./providers/transformers/

ENV PATH="/app/.venv/bin:$PATH"

# Default environment variables
ENV ASR_MODEL="openai/whisper-large-v3"
ENV USE_FLASH_ATTENTION="false"
ENV DEVICE="cuda"
ENV TORCH_DTYPE="float16"

EXPOSE 8765

CMD ["python", "-m", "providers.transformers.service", "--port", "8765"]
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Find all Dockerfiles in asr-services providers
echo "=== All provider Dockerfiles ==="
fd Dockerfile extras/asr-services/providers/ 2>/dev/null || echo "No Dockerfiles found"
echo ""
echo "=== Checking for USER directive ==="
fd Dockerfile extras/asr-services/providers/ -x sh -c '
  echo "File: {}"
  if grep -q "^USER" {}; then
    echo "  ✓ Has USER directive"
    grep "^USER" {}
  else
    echo "  ✗ Missing USER directive"
  fi
  echo ""
' 2>/dev/null
```

Repository: SimpleOpenSoftware/chronicle
Length of output: 691
🏁 Script executed:
```shell
#!/bin/bash
cat -n extras/asr-services/providers/transformers/Dockerfile
```

Repository: SimpleOpenSoftware/chronicle
Length of output: 1605
🏁 Script executed:
```shell
#!/bin/bash
# Check service code for git dependencies or runtime model downloads
echo "=== Service entry point ==="
cat -n extras/asr-services/providers/transformers/service.py 2>/dev/null | head -50
echo ""
echo "=== Check for git-related imports or huggingface_hub clone usage ==="
rg -l "subprocess|git|GIT_CLONE|clone_repo" extras/asr-services/providers/transformers/ 2>/dev/null || echo "No direct git usage found"
echo ""
echo "=== Check pyproject.toml for huggingface_hub or transformers extras ==="
rg "huggingface|git|transformers" pyproject.toml 2>/dev/null | head -20
```

Repository: SimpleOpenSoftware/chronicle
Length of output: 2031
🏁 Script executed:
```shell
#!/bin/bash
echo "=== Transcriber implementation ==="
cat -n extras/asr-services/providers/transformers/transcriber.py 2>/dev/null | head -80
echo ""
echo "=== Full pyproject.toml ==="
cat pyproject.toml 2>/dev/null | grep -A 50 "tool.uv"
```

Repository: SimpleOpenSoftware/chronicle
Length of output: 3322
🏁 Script executed:
```shell
#!/bin/bash
echo "=== Check pyproject.toml dependencies ==="
cat pyproject.toml | grep -E "(transformers|huggingface|torch|git)" -A 1 -B 1
echo ""
echo "=== Check if HuggingFace models require git in transformers provider ==="
# transformers library uses HTTP by default, not git
# Check if any model in the provider would need git (e.g., custom git-based models)
rg "git|clone|subprocess" extras/asr-services/providers/transformers/ --type py
```

Repository: SimpleOpenSoftware/chronicle
Length of output: 251
🏁 Script executed:
```shell
#!/bin/bash
# Find pyproject.toml
find . -name "pyproject.toml" -type f 2>/dev/null
echo ""
echo "=== Check if git is actually needed in runtime ==="
# HuggingFace Transformers uses HTTP downloads by default, not git
# Let's verify the model loading code more thoroughly
cat -n extras/asr-services/providers/transformers/transcriber.py | sed -n '65,150p'
```

Repository: SimpleOpenSoftware/chronicle
Length of output: 3737
Add a non-root `USER` to the runtime stage and remove the unnecessary git dependency.

Running the container as root is flagged by Trivy (DS002) and violates security best practices. Additionally, git is installed in the runtime stage (line 25) but is unnecessary — HuggingFace Transformers downloads models via HTTP by default using `from_pretrained()`, not `git clone`.

This issue is systemic across all provider Dockerfiles: transformers, nemo, faster_whisper, and vibevoice all lack the `USER` directive. Consider fixing it consistently across all of them.
🛡️ Proposed fix
# Copy virtual environment and application
COPY --from=builder /app /app
COPY common/ ./common/
COPY providers/transformers/ ./providers/transformers/
ENV PATH="/app/.venv/bin:$PATH"
+# Run as non-root user
+RUN useradd --create-home appuser
+USER appuser
+
# Default environment variables
ENV ASR_MODEL="openai/whisper-large-v3"

And simplify lines 24–26:
RUN apt-get update && apt-get install -y --no-install-recommends \
- curl ffmpeg git \
+ curl ffmpeg \
&& rm -rf /var/lib/apt/lists/*

🤖 Prompt for AI Agents
In `@extras/asr-services/providers/transformers/Dockerfile` around lines 19 - 43,
The runtime Dockerfile is running as root and installs an unnecessary git
package; create and switch to a non-root user in the runtime stage (e.g., add a
user/group like "app" and use USER app) and remove "git" from the apt-get
install list in the runtime stage where curl and ffmpeg are installed; ensure
the created user's home and /app ownership match ENV PATH usage and any copied
files so the app can run without root, and apply the same USER + removal of git
change consistently in the other provider Dockerfiles (transformers, nemo,
faster_whisper, vibevoice).
```toml
transformers = [
    "transformers>=4.40.0",
    "accelerate>=0.30.0",
    "diffusers>=0.30.0",
    "torch>=2.3",
    "torchaudio>=2.3",
]
```
🧩 Analysis chain
🏁 Script executed:
# Search for diffusers imports in Python files within extras/asr-services
find extras/asr-services -type f -name "*.py" | head -20
Length of output: 1008
🏁 Script executed:
# Search for any reference to diffusers in the entire extras/asr-services directory
rg 'diffusers' extras/asr-services/ -i
Length of output: 1155
🏁 Script executed:
# Get the structure of the asr-services directory
tree -L 3 extras/asr-services/ 2>/dev/null || find extras/asr-services -type f -name "*.py" | head -20
Length of output: 1240
🏁 Script executed:
# Check transformers provider
cat -n extras/asr-services/providers/transformers/transcriber.py | head -30
Length of output: 1179
🏁 Script executed:
# Check vibevoice provider
cat -n extras/asr-services/providers/vibevoice/service.py | head -30
Length of output: 1015
🏁 Script executed:
# Check vibevoice transcriber implementation
cat -n extras/asr-services/providers/vibevoice/transcriber.py | head -50
Length of output: 2131
🏁 Script executed:
# Search for actual imports in the transformers provider
grep -n "^import\|^from" extras/asr-services/providers/transformers/transcriber.py
Length of output: 220
🏁 Script executed:
# Search for actual imports in the vibevoice provider
grep -n "^import\|^from" extras/asr-services/providers/vibevoice/transcriber.py
Length of output: 316
🏁 Script executed:
# Check full transformers transcriber for any diffusers usage
cat extras/asr-services/providers/transformers/transcriber.py | grep -i "diffusers\|import"
Length of output: 291
🏁 Script executed:
# Check full vibevoice transcriber for any diffusers usage
cat extras/asr-services/providers/vibevoice/transcriber.py | grep -i "diffusers\|import"
Length of output: 785
🏁 Script executed:
# Check if transformers library depends on diffusers by looking at PyPI
web_search "does transformers library depend on diffusers as transitive dependency"
Length of output: 122
🌐 Web query:
transformers package dependencies does it include diffusers
💡 Result:
No—installing transformers does not install diffusers as a required dependency.
- In `transformers`, `diffusers` is not part of the core `install_requires` (required dependencies). (fossies.org)
- `diffusers` only shows up under an optional extra (e.g., an "agents" extra in the setup configuration), meaning you only get it if you explicitly install that extra. (fossies.org)
- The PyPI project metadata for `transformers` lists many `Provides-Extra` options, but `diffusers` is not a base requirement. (pypi.org)
Citations:
- 1: https://fossies.org/diffs/transformers/4.51.3_vs_4.52.0/setup.py-diff.html?utm_source=openai
- 2: https://fossies.org/diffs/transformers/4.51.3_vs_4.52.0/setup.py-diff.html?utm_source=openai
- 3: https://pypi.org/project/transformers/?utm_source=openai
Remove diffusers>=0.30.0 from both the transformers and vibevoice dependency groups—it is unused.
diffusers is not imported or used anywhere in the transformers or vibevoice provider implementations. The transformers library does not list diffusers as a required or transitive dependency (it only appears in optional extras). Removing this from both groups (lines 100 and 114) will eliminate unnecessary bloat from the installation.
🤖 Prompt for AI Agents
In `@extras/asr-services/pyproject.toml` around lines 97 - 103, Remove the unused
"diffusers>=0.30.0" entry from both dependency groups in pyproject.toml: delete
the "diffusers>=0.30.0" line under the transformers array and likewise remove it
from the vibevoice dependency group so neither the transformers nor vibevoice
lists include diffusers.
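The audit behind this finding can be sketched as a small script that compares declared extras against the modules the providers actually import. The dicts below are hypothetical stand-ins for the real `pyproject.toml` groups and the `rg`-collected imports, not the repository's exact contents:

```python
import re

# Hypothetical extras groups, mirroring the shape discussed above
extras = {
    "transformers": ["transformers>=4.40.0", "accelerate>=0.30.0",
                     "diffusers>=0.30.0", "torch>=2.3", "torchaudio>=2.3"],
    "vibevoice": ["transformers>=4.40.0", "diffusers>=0.30.0", "torch>=2.3"],
}
# Module names actually imported by the provider code (illustrative)
actually_imported = {"transformers", "accelerate", "torch", "torchaudio"}

for group, deps in extras.items():
    # Strip version specifiers to get bare distribution names
    names = [re.split(r"[<>=~!]", d)[0] for d in deps]
    unused = [n for n in names if n not in actually_imported]
    print(f"{group}: unused = {unused}")  # e.g. transformers: unused = ['diffusers']
```

A check like this flags `diffusers` in both groups, matching the review's conclusion.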
```robotframework
ASR Transcribe Response Words Have Timestamps
    [Documentation]    Verify word-level timestamps in transcription response
    [Tags]    audio-upload

    ${response}=    Upload Audio For ASR Transcription    ${TEST_AUDIO_FILE}
    ${json}=    Set Variable    ${response.json()}

    # Verify words list is not empty
    Should Not Be Empty    ${json}[words]

    # Verify first word has required timestamp fields
    ${first_word}=    Get From List    ${json}[words]    0
    Dictionary Should Contain Key    ${first_word}    word
    Dictionary Should Contain Key    ${first_word}    start
    Dictionary Should Contain Key    ${first_word}    end

    # Verify timestamps are numeric and ordered correctly
    Should Be True    isinstance($first_word['start'], (int, float))
    Should Be True    isinstance($first_word['end'], (int, float))
    Should Be True    ${first_word}[start] < ${first_word}[end]
    Should Be True    ${first_word}[start] >= 0

ASR Transcribe Response Words Have Confidence Scores
    [Documentation]    Verify confidence scores in word-level transcription
    [Tags]    audio-upload

    ${response}=    Upload Audio For ASR Transcription    ${TEST_AUDIO_FILE}
    ${json}=    Set Variable    ${response.json()}

    # Verify first word has confidence
    ${first_word}=    Get From List    ${json}[words]    0
    Dictionary Should Contain Key    ${first_word}    confidence

    # Verify confidence is between 0 and 1
    Should Be True    isinstance($first_word['confidence'], (int, float))
    Should Be True    ${first_word}[confidence] >= 0
    Should Be True    ${first_word}[confidence] <= 1
```
Word-level tests will fail for providers without word_timestamps capability (e.g., VibeVoice).
Tests "ASR Transcribe Response Words Have Timestamps" (line 90) and "ASR Transcribe Response Words Have Confidence Scores" (line 112) unconditionally assert that the words list is non-empty. However, VibeVoice explicitly does not provide word-level timestamps. The segment and diarization tests (lines 176–224) correctly guard on capability first — these should follow the same pattern.
Proposed fix for the timestamps test (apply similarly to confidence test)
ASR Transcribe Response Words Have Timestamps
[Documentation] Verify word-level timestamps in transcription response
[Tags] audio-upload
+ ${info}= Get ASR Service Info ${ASR_URL}
+ ${has_word_timestamps}= Evaluate 'word_timestamps' in $info['capabilities']
+ Skip If not ${has_word_timestamps} Provider does not have word_timestamps capability
+
${response}= Upload Audio For ASR Transcription ${TEST_AUDIO_FILE}
${json}=    Set Variable    ${response.json()}

🤖 Prompt for AI Agents
In `@tests/asr/protocol_tests.robot` around lines 90 - 126, The tests
unconditionally assert on ${json}[words] but must skip when the provider lacks
the word_timestamps capability; add a guard at the start of both test cases (ASR
Transcribe Response Words Have Timestamps and ASR Transcribe Response Words Have
Confidence Scores) that checks the provider capability (e.g., inspect
${capabilities}[word_timestamps] or call an existing capability-check keyword)
and uses "Skip Test" or "Run Keyword Unless" to skip the test when
word_timestamps is false, then keep the existing assertions on ${json}[words],
${first_word}, start/end and confidence as-is; apply the same conditional
pattern used by the segment/diarization tests to both test cases.
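The capability gate the review asks for can be sketched in plain Python (the real tests are Robot Framework; the `/info` payloads below are illustrative, not captured from running services):

```python
# Run word-level assertions only when the provider advertises the capability.
def should_run_word_tests(info: dict) -> bool:
    return "word_timestamps" in info.get("capabilities", [])

# Hypothetical /info payloads for two providers
faster_whisper = {"capabilities": ["timestamps", "word_timestamps", "segments"]}
vibevoice = {"capabilities": ["segments", "diarization", "speaker_identification"]}

print(should_run_word_tests(faster_whisper))  # True  -> run the assertions
print(should_run_word_tests(vibevoice))       # False -> skip the test
```

In Robot terms this corresponds to a `Skip If` at the top of each word-level test, mirroring the guard the segment/diarization tests already use.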
```robotframework
ASR Capabilities Are From Known Set
    [Documentation]    Verify reported capabilities are valid known capabilities
    ...    Known capabilities: timestamps, word_timestamps, segments, diarization
    [Tags]    infra

    ${info}=    Get ASR Service Info    ${ASR_URL}

    # Define known capabilities
    @{known_caps}=    Create List    timestamps    word_timestamps    segments    diarization

    # All reported capabilities should be known
    FOR    ${cap}    IN    @{info}[capabilities]
        ${is_known}=    Evaluate    $cap in $known_caps
        Should Be True    ${is_known}
        ...    Unknown capability reported: ${cap}. Known: ${known_caps}
    END
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find service.py file
find . -name "service.py" -o -name "*service*.py" | head -20
Length of output: 1088
🏁 Script executed:
#!/bin/bash
# Search for get_capabilities method implementations across the codebase
rg -n "def get_capabilities" --type=py -A 10
Length of output: 4465
🏁 Script executed:
#!/bin/bash
# Also search for classes that might implement ASR services
rg -n "class.*Service" --type=py | grep -i "asr\|whisper\|transformer"
Length of output: 663
Known capabilities set is incomplete — will reject valid provider capabilities.
The known set on line 167 is [timestamps, word_timestamps, segments, diarization], but the providers return additional capabilities:
- FasterWhisperService returns: `language_detection`, `vad_filter`, `translation`
- TransformersService returns: `language_detection`
- NemoService returns: `chunked_processing`
- VibeVoiceService returns: `speaker_identification`, `long_form`
The test will fail for any of these providers unless the known capabilities list includes all their supported capabilities.
Proposed fix: extend the known capabilities list
# Define known capabilities
- @{known_caps}= Create List timestamps word_timestamps segments diarization
+ @{known_caps}=    Create List    timestamps    word_timestamps    segments    diarization    language_detection    vad_filter    translation    chunked_processing    speaker_identification    long_form

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 ASR Capabilities Are From Known Set
     [Documentation]    Verify reported capabilities are valid known capabilities
     ...    Known capabilities: timestamps, word_timestamps, segments, diarization
     [Tags]    infra

     ${info}=    Get ASR Service Info    ${ASR_URL}

     # Define known capabilities
-    @{known_caps}=    Create List    timestamps    word_timestamps    segments    diarization
+    @{known_caps}=    Create List    timestamps    word_timestamps    segments    diarization    language_detection    vad_filter    translation    chunked_processing    speaker_identification    long_form

     # All reported capabilities should be known
     FOR    ${cap}    IN    @{info}[capabilities]
         ${is_known}=    Evaluate    $cap in $known_caps
         Should Be True    ${is_known}
         ...    Unknown capability reported: ${cap}. Known: ${known_caps}
     END
```
🤖 Prompt for AI Agents
In `@tests/asr/protocol_tests.robot` around lines 159 - 174, The test "ASR
Capabilities Are From Known Set" is using an incomplete known capabilities list
in the variable @{known_caps}, causing valid provider capabilities to be treated
as unknown; update @{known_caps} to include the additional capabilities returned
by providers (add language_detection, vad_filter, translation,
chunked_processing, speaker_identification, long_form) so the assertion over
${info}[capabilities] only fails for truly unknown entries. Ensure the list in
the test matches all supported capability strings returned by
FasterWhisperService, TransformersService, NemoService, and VibeVoiceService.
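In plain Python the same check reduces to a set difference. A sketch using the extended capability list the review proposes (strings taken from the provider services named above):

```python
# Union of the capability strings the four services are reported to return.
KNOWN_CAPS = {
    "timestamps", "word_timestamps", "segments", "diarization",
    "language_detection", "vad_filter", "translation",
    "chunked_processing", "speaker_identification", "long_form",
}

def unknown_caps(reported: list[str]) -> list[str]:
    """Return any reported capability not in the known set."""
    return sorted(set(reported) - KNOWN_CAPS)

print(unknown_caps(["timestamps", "vad_filter"]))    # []
print(unknown_caps(["timestamps", "mind_reading"]))  # ['mind_reading']
```

Maintaining the known set as a single shared constant (rather than inline in the test) would keep it from drifting as providers add capabilities.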
```python
# Response models
class HealthResponse(BaseModel):
    status: str
    model: str
    provider: str


class InfoResponse(BaseModel):
    model_id: str
    provider: str
    capabilities: list[str]
    supported_languages: Optional[list[str]] = None


class WordTimestamp(BaseModel):
    word: str
    start: float
    end: float
    confidence: float


class TranscriptionResult(BaseModel):
    text: str
    words: list[WordTimestamp]
    segments: list[dict]
```
🛠️ Refactor suggestion | 🟠 Major
Mock response models diverge from production models in common/response_models.py.
The mock defines its own HealthResponse, InfoResponse, WordTimestamp, and TranscriptionResult that differ from the canonical models:
| Field | Production (`common/response_models.py`) | Mock |
|---|---|---|
| `Word.confidence` | `Optional[float]` | `float` (required) |
| `TranscriptionResult.segments` | `List[Segment]` (typed model) | `list[dict]` (untyped) |
| `TranscriptionResult.speakers` | `Optional[List[Speaker]]` | absent |
| `TranscriptionResult.language` | `Optional[str]` | absent |
| `TranscriptionResult.duration` | `Optional[float]` | absent |
Tests passing against this mock may not catch schema mismatches with the real service (e.g., missing speakers/language/duration fields, or confidence being null). Consider importing and reusing the models from common/response_models.py, or at minimum aligning the mock's TranscriptionResult shape to include optional fields that the production API returns.
🤖 Prompt for AI Agents
In `@tests/libs/mock_asr_server.py` around lines 43 - 67, The mock response models
(HealthResponse, InfoResponse, WordTimestamp, TranscriptionResult) must match
the canonical models: either import and reuse the classes from
common/response_models.py or update the mock definitions so
WordTimestamp.confidence is Optional[float], TranscriptionResult.segments is
typed as List[Segment] (not list[dict]), and include optional fields speakers:
Optional[List[Speaker]], language: Optional[str], and duration: Optional[float];
update the mock module to reference the canonical Segment and Speaker types (or
import them) so tests exercise the same schema as production.
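One way to align the mock is sketched below. The production models use Pydantic; dependency-free dataclasses stand in here purely for illustration, with the optional fields the review calls out (`speakers`, `language`, `duration`, nullable `confidence`):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Word:
    word: str
    start: float
    end: float
    confidence: Optional[float] = None  # Optional, matching production

@dataclass
class TranscriptionResult:
    text: str
    words: list[Word] = field(default_factory=list)
    segments: list = field(default_factory=list)
    speakers: Optional[list] = None   # present in production, absent in the mock
    language: Optional[str] = None
    duration: Optional[float] = None

# A response with a null confidence and no speakers now has the same shape
# the production API can return.
r = TranscriptionResult(text="hello", words=[Word("hello", 0.0, 0.4)])
print(r.words[0].confidence, r.speakers, r.language)  # None None None
```

Importing the canonical Pydantic models directly, as the review suggests, is the stronger fix since it removes the duplication entirely.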
```robotframework
    @{env_vars}=    Create List
    IF    "${model}" != "${EMPTY}"
        Append To List    ${env_vars}    ASR_MODEL=${model}
    END
    Append To List    ${env_vars}    ASR_PORT=${port}

    Log To Console    \n🚀 Starting ${service} on port ${port}...
    IF    "${model}" != "${EMPTY}"
        Log To Console    Model: ${model}
    END

    # Start the service
    ${result}=    Run Process    docker compose up -d --build ${service}
    ...    cwd=${asr_dir}
    ...    env:ASR_MODEL=${model}
    ...    env:ASR_PORT=${port}
```
env:ASR_MODEL=${model} passes an empty string when no model is specified, overriding docker-compose defaults.
When Start GPU ASR Service is called without ${model} (defaults to ${EMPTY}), line 102 sets ASR_MODEL= as an empty string in the subprocess environment. Docker Compose's ${ASR_MODEL:-default} form does fall back to the default when the variable is unset or empty, but the ${ASR_MODEL-default} form and a plain ${ASR_MODEL} passthrough forward the empty string as-is — with either of those, the service would receive an empty model ID, likely causing a startup failure.
Additionally, the @{env_vars} list built on lines 88-92 is never passed to Run Process — it's dead code.
🐛 Proposed fix: only pass ASR_MODEL when non-empty
# Start the service
+ IF "${model}" != "${EMPTY}"
${result}= Run Process docker compose up -d --build ${service}
... cwd=${asr_dir}
... env:ASR_MODEL=${model}
... env:ASR_PORT=${port}
+ ELSE
+ ${result}= Run Process docker compose up -d --build ${service}
+ ... cwd=${asr_dir}
+ ... env:ASR_PORT=${port}
+ END

And remove the unused @{env_vars} construction (lines 88-92).
🤖 Prompt for AI Agents
In `@tests/resources/asr_keywords.robot` around lines 88 - 103, The subprocess is
setting ASR_MODEL to an empty string which overrides docker-compose defaults and
the @{env_vars} list is unused; update the Start GPU ASR Service invocation so
you remove the dead @{env_vars} construction and only include ASR_MODEL in the
env passed to Run Process when "${model}" is not empty (keep env:ASR_PORT
always), i.e., conditionally add the ASR_MODEL environment entry in the Run
Process call instead of unconditionally using env:ASR_MODEL=${model} and delete
the unused @{env_vars} lines.
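Per the Compose interpolation rules, `${VAR:-default}` falls back to the default when the variable is unset or empty, while `${VAR-default}` (and a plain `${VAR}` passthrough) forwards an empty value as-is. The distinction can be demonstrated with a minimal Python model of the two fallback forms (a toy sketch, not Compose's actual engine):

```python
import re

# Minimal model of Compose-style variable interpolation, covering only the
# two fallback forms relevant here.
def interpolate(template: str, env: dict) -> str:
    def sub(match: re.Match) -> str:
        name, op, default = match.group(1), match.group(2), match.group(3) or ""
        value = env.get(name)
        if op == ":-":   # default when the variable is unset OR empty
            return value if value else default
        if op == "-":    # default only when the variable is unset
            return value if value is not None else default
        return value or ""
    return re.sub(r"\$\{(\w+)(:-|-)?([^}]*)\}", sub, template)

print(interpolate("${ASR_MODEL:-base}", {}))                 # base
print(interpolate("${ASR_MODEL:-base}", {"ASR_MODEL": ""}))  # base (':-' also covers empty)
print(interpolate("${ASR_MODEL-base}", {"ASR_MODEL": ""}))   # empty string is forwarded
```

So whether the empty `env:ASR_MODEL=` actually breaks startup depends on which form the compose file uses; omitting the variable when `${model}` is empty avoids the question entirely.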
| Metric | Count |
|---|---|
| ✅ Passed | 124 |
| ❌ Failed | 19 |
| 📊 Total | 143 |
📊 View Reports
GitHub Pages (Live Reports):
Download Artifacts:
- robot-test-reports-html-pr-labeled - HTML reports
- robot-test-results-xml-pr-labeled - XML output
- Updated `config.yml.template` to include capabilities for ASR providers, detailing features like word timestamps and speaker segments.
- Added a new `vibevoice` provider configuration for Microsoft VibeVoice ASR, supporting speaker diarization.
- Enhanced `.env.template` with clearer provider selection and model configuration options, including CUDA settings and voice activity detection.
- Improved `docker-compose.yml` to support multiple ASR providers with detailed service configurations.
- Introduced common utilities for audio processing and ASR service management in the `common` module, enhancing code reusability and maintainability.
- Updated `README.md` to reflect the new provider-based architecture and usage instructions for starting different ASR services.

Summary by CodeRabbit
Release Notes
New Features
Configuration
Documentation
Tests