Fix: onborading improvements #287

Merged: AnkushMalaker merged 4 commits into feat/vibevoice-asr from fix/onborading-improvements on Feb 7, 2026
@AnkushMalaker (Collaborator) commented Feb 6, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Configuration values are now preserved across re-runs, preventing loss of settings.
    • Auto-detection of Tailscale addresses for HTTPS setup with fallback to localhost.
    • Enhanced speaker identification capabilities for transcription services.
    • CLI-driven configuration with wizard-style confirmations.
  • Improvements

    • Interactive setup now includes safety notes about re-running.
    • Better error messages with actionable guidance for troubleshooting.
    • Streamlined optional services configuration flow.

- Introduced `detect_tailscale_info` function to automatically retrieve Tailscale DNS name and IP address, improving user experience for service configuration.
- Added `detect_cuda_version` function to identify the system's CUDA version, streamlining compatibility checks for GPU-based services.
- Updated `wizard.py` to utilize the new detection functions, enhancing service selection and configuration processes based on user input.
- Improved error handling and user feedback in service setup, ensuring clearer communication during configuration steps.
- Refactored existing code to improve maintainability and code reuse across setup utilities.
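
For illustration, the Tailscale detection with a localhost fallback described above might look roughly like the following. This is a sketch, not the code in `setup_utils.py`: the names `parse_tailscale_status`, `detect_tailscale_info_sketch`, and `default_server_address` are hypothetical, and it assumes the `Self.DNSName` and `Self.TailscaleIPs` fields in the output of `tailscale status --json`.

```python
import json
import subprocess
from typing import Optional, Tuple


def parse_tailscale_status(status_json: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (dns_name, ipv4) from `tailscale status --json` output."""
    try:
        data = json.loads(status_json)
    except json.JSONDecodeError:
        return None, None
    if not isinstance(data, dict):
        return None, None
    self_node = data.get("Self")
    if not isinstance(self_node, dict):
        return None, None
    # MagicDNS names are reported with a trailing dot; strip it.
    dns_name = (self_node.get("DNSName") or "").rstrip(".") or None
    # Take the first IPv4 address; the list may also contain IPv6 entries.
    ips = self_node.get("TailscaleIPs") or []
    ipv4 = next((ip for ip in ips if ":" not in ip), None)
    return dns_name, ipv4


def detect_tailscale_info_sketch() -> Tuple[Optional[str], Optional[str]]:
    """Query the Tailscale CLI; return (None, None) if it is missing or fails."""
    try:
        result = subprocess.run(
            ["tailscale", "status", "--json"],
            capture_output=True, text=True, timeout=5, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return None, None
    return parse_tailscale_status(result.stdout)


def default_server_address() -> str:
    """Prefer the MagicDNS name, then the IPv4 address, then localhost."""
    dns_name, ipv4 = detect_tailscale_info_sketch()
    return dns_name or ipv4 or "localhost"
```

Keeping the JSON parsing separate from the subprocess call makes the fallback logic testable on machines without Tailscale installed.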

@coderabbitai (bot, Contributor) commented Feb 6, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This pull request refactors setup utilities into a shared module (setup_utils.py) and enhances configuration flows across backend authentication, ASR services, and the wizard CLI. The changes introduce CLI-first configuration patterns, preserve existing environment values during re-runs, auto-detect Tailscale addresses for HTTPS setup, and update ASR provider capabilities.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Core Setup Utilities | setup_utils.py | Added two new public utility functions: detect_tailscale_info() to detect MagicDNS names and IPv4 addresses via Tailscale CLI, and detect_cuda_version() to parse CUDA version from nvidia-smi and map to PyTorch strings (cu121, cu126, cu128). |
| Backend Authentication & Configuration | backends/advanced/init.py | Reworked authentication flow to preserve existing .env values during re-runs; updated transcription setup to support CLI-driven configuration with wizard-style messaging; enhanced HTTPS setup with Tailscale auto-detection; added mask_api_key() public method for API key masking. |
| ASR Services & Providers | extras/asr-services/init.py, extras/asr-services/providers/vibevoice/transcriber.py | Refactored to use shared utilities (detect_cuda_version, read_env_value); updated VibeVoice provider capabilities to include speaker_identification and long_form; implemented CLI-first selection patterns for provider and model; fixed speaker field handling to avoid double-prefixing in output parsing. |
| Speaker Recognition Service | extras/speaker-recognition/init.py | Refactored to delegate CUDA detection, environment value reading, and API key masking to shared utilities from setup_utils. |
| Wizard & Service Orchestration | wizard.py | Enhanced select_services() to accept optional transcription_provider parameter and auto-add matching ASR services; added Tailscale auto-detection for HTTPS configuration with fallback to localhost; improved error handling and user messaging for setup failures and retries. |
| ASR Tests | tests/asr/protocol_tests.robot | Expanded documented ASR capabilities set to include speaker_identification, long_form, language_detection, vad_filter, translation, and chunked_processing alongside existing diarization capabilities. |

Sequence Diagram

sequenceDiagram
    participant User
    participant Wizard
    participant SetupUtils
    participant Backend
    participant ASR as ASR Services
    participant Tailscale

    User->>Wizard: Run setup with options
    Wizard->>Wizard: Main flow with safety notes
    
    Note over Wizard: Transcription Provider Selection
    User->>Wizard: Select transcription provider
    Wizard->>Wizard: Auto-add matching ASR service
    
    Note over Wizard: HTTPS Configuration
    alt HTTPS Enabled
        Wizard->>SetupUtils: detect_tailscale_info()
        SetupUtils->>Tailscale: Query MagicDNS & IPv4
        Tailscale-->>SetupUtils: DNS name and/or IP
        SetupUtils-->>Wizard: Detected address
        Wizard->>Wizard: Use detected address as default
        User->>Wizard: Confirm/override server address
    end
    
    Note over Wizard: Service Setup
    Wizard->>Backend: Configure with auth/values
    Backend->>SetupUtils: Preserve/reuse existing env values
    
    Wizard->>ASR: Setup with transcription provider
    ASR->>SetupUtils: detect_cuda_version()
    SetupUtils-->>ASR: CUDA version (cu121/cu126/cu128)
    ASR->>ASR: Configure provider & model
    
    Wizard-->>User: Configuration complete with summary

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title contains a typo ('onborading' instead of 'onboarding') and is vague/generic; it doesn't clearly convey the substantial improvements made to authentication flow, transcription setup, Tailscale integration, and speaker identification logic. | Correct the typo to 'onboarding' and make the title more specific to the primary changes, such as: 'Improve onboarding flow with reusable config values and Tailscale auto-detection' or similar. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 96.77% which is sufficient. The required threshold is 80.00%. |


…ndling

- Modified the capabilities of the VibeVoice ASR provider to include 'speaker_identification' and 'long_form', enhancing its feature set.
- Adjusted the speaker identification logic in the VibeVoiceTranscriber to prevent double-prefixing and ensure accurate speaker representation.
- Updated protocol tests to reflect the expanded list of known ASR capabilities, ensuring comprehensive validation of reported features.
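
As a rough illustration of the double-prefixing fix mentioned above, a guard like the one below avoids producing labels such as "Speaker Speaker 0". The real label format used by `VibeVoiceTranscriber` is not shown in this conversation, so the `"Speaker "` prefix and the helper name `format_speaker_label` are assumptions made for this sketch.

```python
from typing import Union


def format_speaker_label(raw: Union[int, str], prefix: str = "Speaker ") -> str:
    """Return a speaker label, adding the prefix only when it is missing."""
    text = str(raw).strip()
    # If the model output already carries the prefix, keep it unchanged.
    if text.lower().startswith(prefix.strip().lower()):
        return text
    return f"{prefix}{text}"
```

The check is case-insensitive so that a value such as "speaker 1" is also left alone rather than prefixed a second time.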

@AnkushMalaker changed the base branch from dev to feat/vibevoice-asr on February 6, 2026 19:50

@AnkushMalaker (Collaborator, Author) commented

@coderabbitai review

@coderabbitai (bot, Contributor) commented Feb 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backends/advanced/init.py (2)

640-640: ⚠️ Potential issue | 🟡 Minor

Rich markup mismatch: opening [yellow] closed with [/warning].

This will cause the yellow styling to leak into subsequent console output.

Proposed fix
-                self.console.print(f"[yellow][WARNING][/warning] SSL script not found at {ssl_script}")
+                self.console.print(f"[yellow][WARNING][/yellow] SSL script not found at {ssl_script}")

133-135: ⚠️ Potential issue | 🟡 Minor

Same Optional[str] return type mismatch as noted in extras/speaker-recognition/init.py.

read_env_value returns Optional[str], but this method is annotated as returning str.
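
A minimal sketch of the annotation fix, assuming a simple `KEY=value` parser: the shared `read_env_value` helper is not shown in this review, so `read_env_value_sketch` below is a hypothetical stand-in, and `resolve_default` is an illustrative caller that handles the `None` case when reusing values on a re-run.

```python
from pathlib import Path
from typing import Optional


def read_env_value_sketch(env_path: str, key: str) -> Optional[str]:
    """Return the value for `key` in a .env file, or None if absent or empty."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not KEY=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            value = value.strip().strip("'\"")
            return value or None
    return None


def read_existing_env_value(env_path: str, key: str) -> Optional[str]:
    """Wrapper annotated as Optional[str] to match what the helper returns."""
    return read_env_value_sketch(env_path, key)


def resolve_default(env_path: str, key: str, fallback: str) -> str:
    """Reuse a previously configured value on re-runs, else use the fallback."""
    existing = read_existing_env_value(env_path, key)
    return existing if existing is not None else fallback
```

With the wrapper annotated as `Optional[str]`, a type checker forces callers such as `resolve_default` to handle the missing-value case explicitly.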

🤖 Fix all issues with AI agents
In `@backends/advanced/init.py`:
- Line 458: The printed message uses an unnecessary f-string; update the
self.console.print call (the invocation that prints "[green]✅[/green]
Obsidian/Neo4j: enabled (configured via wizard)") to use a plain string literal
instead of an f-prefixed string so the Ruff F541 warning is resolved.

In `@extras/asr-services/init.py`:
- Line 40: The capabilities list in extras/asr-services/init.py no longer
includes "segments" but VibeVoice's transcriber (VibeVoice.transcribe / class
VibeVoice) still returns a "segments" field; restore consistency by adding
"segments" back into the capabilities array in init.py (so downstream protocol
check sees segments) or alternatively remove the segment-generation code from
VibeVoice.transcribe if segments are intentionally deprecated; also update the
test mock in tests/libs/mock_asr_server.py to match whichever choice you make
(ensure the mock's capabilities array matches VibeVoice's declared
capabilities).

In `@extras/speaker-recognition/init.py`:
- Around line 72-74: The return type annotation of read_existing_env_value is
incorrect: read_env_value returns Optional[str], so update the signature of
read_existing_env_value to return Optional[str] and import Optional from typing
(ensure Optional is added to the existing imports); adjust the docstring if
desired to indicate the function may return None so callers and type-checkers
are accurate.

In `@setup_utils.py`:
- Around line 397-406: The code currently compares the string cuda_ver (built
from match.groups()) using lexicographic operators, which misclassifies versions
like "12.10"; change the logic to convert major and minor to integers and
compare numeric tuples instead (e.g., use tuple comparisons like (int(major),
int(minor)) >= (12, 8) to decide between "cu128", "cu126", "cu121"); update the
branch that assigns cuda_ver and the subsequent if/elif checks to use these
integer comparisons and remove string-based comparisons.

In `@wizard.py`:
- Around line 279-292: Several console.print calls in the
subprocess.TimeoutExpired and subprocess.CalledProcessError handlers use
f-strings without interpolation; change those f"... " literals to plain string
literals (remove the leading "f") for the messages passed to console.print in
the except blocks handling subprocess.TimeoutExpired (the prints for
"Configuration may be partially written.", "To retry just this service:", and
the "cd {service['path']} && {' '.join(service['cmd'])}" line should keep
interpolation and remain f-strings while the other two should be non-f strings)
and subprocess.CalledProcessError (the two "[yellow]   ..." helper messages
should be plain strings while the lines composing the retry command and the main
error message that use e.returncode or service interpolation should remain
f-strings); update the console.print calls in those exception handlers
accordingly.
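
The `setup_utils.py` item in the prompt above (numeric rather than lexicographic version comparison) can be sketched as follows. This is an illustration under stated assumptions, not the merged code: the function name `map_cuda_version`, the `CUDA Version: X.Y` pattern in `nvidia-smi` output, the (12, 6) threshold for `cu126`, and the fall-through to `cu121` for anything older are all assumed here.

```python
import re
from typing import Optional

# Assumed format of the version banner printed by nvidia-smi.
_CUDA_RE = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def map_cuda_version(nvidia_smi_output: str) -> Optional[str]:
    """Map the CUDA version found in nvidia-smi output to a PyTorch tag."""
    match = _CUDA_RE.search(nvidia_smi_output)
    if match is None:
        return None
    # Compare as integer tuples so that 12.10 sorts after 12.8.
    version = (int(match.group(1)), int(match.group(2)))
    if version >= (12, 8):
        return "cu128"
    if version >= (12, 6):
        return "cu126"
    return "cu121"
```

With string comparison, `"12.10" >= "12.8"` is false because `"1"` sorts before `"8"`, so a 12.10 driver would fall through to the oldest tag; the tuple comparison avoids that.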

- Replaced MicOff icon with Square icon in MainRecordingControls and SimplifiedControls for a more intuitive user experience.
- Enhanced button interactions to streamline recording start/stop actions, including a pulsing effect during recording.
- Updated status messages and button states to provide clearer feedback on recording status and actions.
- Improved accessibility by ensuring buttons are disabled appropriately based on recording state and microphone access.
* Enhance test environment setup and configuration

- Added a new interactive setup script for configuring test API keys (Deepgram, OpenAI) to streamline the testing process.
- Introduced a template for the .env.test file to guide users in setting up their API keys.
- Updated the Makefile to include a new 'configure' target for setting up API keys.
- Enhanced the start-containers script to warn users if API keys are still set to placeholder values, improving user awareness during testing.
- Updated .gitignore to include the new .env.test.template file.

* Remove outdated documentation and restructure feature overview

- Deleted the `features.md` file, consolidating its content into the new `overview.md` for a more streamlined documentation structure.
- Updated `init-system.md` to link to the new `overview.md` instead of the removed `features.md`.
- Removed `ports-and-access.md` as its content was integrated into other documentation files, enhancing clarity and reducing redundancy.
- Revised the `README.md` in the advanced backend to reflect the new naming conventions and updated links to documentation.
- Introduced a new `plugin-development-guide.md` to assist users in creating custom plugins, expanding the documentation for developers.

* tech debt

@AnkushMalaker merged commit 8dc923f into feat/vibevoice-asr on Feb 7, 2026
2 of 3 checks passed
AnkushMalaker added a commit that referenced this pull request Feb 7, 2026
* Enhance StreamingTranscriptionConsumer and conversation job handling

- Removed cumulative audio offset tracking from StreamingTranscriptionConsumer as Deepgram provides cumulative timestamps directly.
- Updated store_final_result method to utilize Deepgram's cumulative timestamps without adjustments.
- Implemented completion signaling for transcription sessions in Redis, ensuring conversation jobs wait for all results before processing.
- Improved error handling to signal completion even in case of errors, preventing conversation jobs from hanging.
- Enhanced logging for better visibility of transcription completion and error states.

* Enhance ASR services configuration and provider management

- Updated `config.yml.template` to include capabilities for ASR providers, detailing features like word timestamps and speaker segments.
- Added a new `vibevoice` provider configuration for Microsoft VibeVoice ASR, supporting speaker diarization.
- Enhanced `.env.template` with clearer provider selection and model configuration options, including CUDA settings and voice activity detection.
- Improved `docker-compose.yml` to support multiple ASR providers with detailed service configurations.
- Introduced common utilities for audio processing and ASR service management in the `common` module, enhancing code reusability and maintainability.
- Updated `README.md` to reflect the new provider-based architecture and usage instructions for starting different ASR services.

* Enhance transcription provider support and capabilities management

- Added support for the new `vibevoice` transcription provider, including configuration options for built-in speaker diarization.
- Updated `ChronicleSetup` to include `vibevoice` in the transcription provider selection and adjusted related descriptions.
- Enhanced the `ModelDef` and `Conversation` models to reflect the addition of `vibevoice` in provider options.
- Introduced a new capabilities management system to validate provider features, allowing conditional execution of tasks based on provider capabilities.
- Improved logging and user feedback in transcription and speaker recognition jobs to reflect the capabilities of the selected provider.
- Updated documentation to include details on the new `vibevoice` provider and its features.
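
Conditional execution based on provider capabilities, as described in the list above, could be sketched like this. The commit does not show the real capabilities API, so `has_capability`, `plan_post_processing`, and the step names are hypothetical; only the capability strings `diarization` and `segments` come from this conversation.

```python
from typing import Iterable, List


def has_capability(capabilities: Iterable[str], name: str) -> bool:
    """Case-insensitive membership test over a provider's capability list."""
    wanted = name.strip().lower()
    return any(c.strip().lower() == wanted for c in capabilities)


def plan_post_processing(capabilities: Iterable[str]) -> List[str]:
    """Pick follow-up steps (hypothetical names) the provider did not cover."""
    caps = list(capabilities)
    steps: List[str] = []
    # Run a separate diarization pass only when the provider lacks one.
    if not has_capability(caps, "diarization"):
        steps.append("run_diarization")
    # Build segments downstream only when the provider does not return them.
    if not has_capability(caps, "segments"):
        steps.append("build_segments")
    return steps
```

For a provider that reports both capabilities the plan is empty, so downstream jobs can skip work the provider has already done.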

* Enhance conversation reprocessing and job management

- Introduced a new job for regenerating title and summary after memory processing to ensure fresh context is available.
- Updated the reprocess_transcript and reprocess_speakers functions to enqueue title/summary jobs based on memory job dependencies, improving job chaining and execution order.
- Enhanced validation for transcripts to account for provider capabilities, ensuring proper handling of diarization and segment data.
- Improved logging for job enqueuing and processing stages, providing clearer insights into the workflow and dependencies.

* Enhance Knowledge Graph integration and service management

- Introduced support for Knowledge Graph functionality, enabling entity and relationship extraction from conversations using Neo4j.
- Updated `services.py` to manage Knowledge Graph profiles and integrate with existing service commands.
- Enhanced Docker Compose configurations to include Neo4j service and environment variables for Knowledge Graph setup.
- Added new API routes and models for Knowledge Graph operations, including entity and relationship management.
- Improved documentation and configuration templates to reflect the new Knowledge Graph features and setup instructions.

* Add Knowledge Graph API routes and integrate into backend

- Introduced new `knowledge_graph_routes.py` to handle API endpoints for managing knowledge graph entities, relationships, and promises.
- Updated `__init__.py` to include the new knowledge graph router in the main router module.
- Enhanced documentation to reflect the addition of knowledge graph functionality, improving clarity on available API routes and their purposes.

* Update .gitignore to include individual plugin configuration files and SDK directory

- Added entries for individual plugin config files to ensure user-specific settings are ignored.
- Included the SDK directory in .gitignore to prevent unnecessary files from being tracked.

* Fix: onborading improvements (#287)

* Enhance setup utilities and wizard functionality

- Introduced `detect_tailscale_info` function to automatically retrieve Tailscale DNS name and IP address, improving user experience for service configuration.
- Added `detect_cuda_version` function to identify the system's CUDA version, streamlining compatibility checks for GPU-based services.
- Updated `wizard.py` to utilize the new detection functions, enhancing service selection and configuration processes based on user input.
- Improved error handling and user feedback in service setup, ensuring clearer communication during configuration steps.
- Refactored existing code to improve maintainability and code reuse across setup utilities.

* Update ASR service capabilities and improve speaker identification handling

- Modified the capabilities of the VibeVoice ASR provider to include 'speaker_identification' and 'long_form', enhancing its feature set.
- Adjusted the speaker identification logic in the VibeVoiceTranscriber to prevent double-prefixing and ensure accurate speaker representation.
- Updated protocol tests to reflect the expanded list of known ASR capabilities, ensuring comprehensive validation of reported features.

* Refactor audio recording controls for improved UI and functionality

- Replaced MicOff icon with Square icon in MainRecordingControls and SimplifiedControls for a more intuitive user experience.
- Enhanced button interactions to streamline recording start/stop actions, including a pulsing effect during recording.
- Updated status messages and button states to provide clearer feedback on recording status and actions.
- Improved accessibility by ensuring buttons are disabled appropriately based on recording state and microphone access.

* chore:test docs and test improvements  (#288)

* Enhance test environment setup and configuration

- Added a new interactive setup script for configuring test API keys (Deepgram, OpenAI) to streamline the testing process.
- Introduced a template for the .env.test file to guide users in setting up their API keys.
- Updated the Makefile to include a new 'configure' target for setting up API keys.
- Enhanced the start-containers script to warn users if API keys are still set to placeholder values, improving user awareness during testing.
- Updated .gitignore to include the new .env.test.template file.

* Remove outdated documentation and restructure feature overview

- Deleted the `features.md` file, consolidating its content into the new `overview.md` for a more streamlined documentation structure.
- Updated `init-system.md` to link to the new `overview.md` instead of the removed `features.md`.
- Removed `ports-and-access.md` as its content was integrated into other documentation files, enhancing clarity and reducing redundancy.
- Revised the `README.md` in the advanced backend to reflect the new naming conventions and updated links to documentation.
- Introduced a new `plugin-development-guide.md` to assist users in creating custom plugins, expanding the documentation for developers.

* tech debt

@AnkushMalaker deleted the fix/onborading-improvements branch on February 7, 2026 02:30