Skip to content

Comments

Feature wizard#16

Closed
thestumonkey wants to merge 5 commits intodevfrom
feature-wizard
Closed

Feature wizard#16
thestumonkey wants to merge 5 commits intodevfrom
feature-wizard

Conversation

@thestumonkey
Copy link
Member

No description provided.

@thestumonkey
Copy link
Member Author

@claude review

@claude
Copy link

claude bot commented Dec 21, 2025

Claude encountered an error —— View job


I'll analyze this and get back to you.

@thestumonkey
Copy link
Member Author

@claude review pls

@claude
Copy link

claude bot commented Dec 21, 2025

Claude finished @thestumonkey's task in 2m 32s —— View job


Code Review Complete

I've completed a thorough review of the "Feature wizard" PR. This PR introduces a setup wizard and Docker service management system. Here are my findings:


🎯 Overview

This PR adds:

  1. Setup Wizard: Guided first-time configuration for API keys
  2. Docker Service Management: Admin UI for controlling Docker containers
  3. Config/Secrets Separation: Splits sensitive data into secrets.yaml
  4. API Routes: New wizard and services endpoints

✅ Strengths

  1. Good Security Separation: Splitting config.yaml and secrets.yaml is the right approach
  2. API Key Masking: Wizard properly masks keys (shows only last 4 characters) in GET responses
  3. Admin-Only Protection: Services and wizard update endpoints correctly require superuser permissions
  4. User-Controllable Flag: Docker manager has user_controllable flag to prevent users from stopping critical services
  5. Cache Invalidation: Config changes properly invalidate app_config cache via reload_config()
  6. Worker Restart: Wizard automatically restarts workers container after API key updates

🔴 Critical Issues

1. Docker Socket Security Risk (docker_manager.py)

Severity: CRITICAL

The DockerManager class exposes full Docker socket access through the web API. This is a major security vulnerability:

# docker_manager.py:89
self._client = docker.from_env()  # Full Docker socket access

Problems:

  • Admin users can potentially escape containers
  • Can start/stop ANY container (including backend itself)
  • Can mount host filesystem via Docker API
  • No resource limits or audit logging

Recommendations:

  • Add explicit audit logging for all Docker operations (who did what, when)
  • Consider restricting to specific Docker networks only
  • Add rate limiting to prevent abuse
  • Document security implications in deployment docs
  • Consider using Docker socket proxy with limited permissions instead

Location: backends/advanced/src/advanced_omi_backend/docker_manager.py:78-103


2. Async Event Loop Handling Bug (app_config.py:453-464)

Severity: HIGH

The API key loading logic has a dangerous pattern that can cause deadlocks:

# app_config.py:454-460
try:
    loop = asyncio.get_running_loop()
    # We're in an event loop - load synchronously using thread pool
    with concurrent.futures.ThreadPoolExecutor() as executor:
        config = executor.submit(lambda: asyncio.run(config_parser.load())).result()

Problem: Running asyncio.run() inside an executor from within an existing event loop creates a new nested event loop. This can cause:

  • Deadlocks if the inner loop awaits something blocked by outer loop
  • Resource leaks from unclosed event loops
  • Unpredictable behavior with shared resources

Recommendations:

  • Make _get_api_key() an async method instead of sync
  • Use await config_parser.load() directly when in event loop
  • Remove the nested asyncio.run() pattern entirely
  • If you need sync access, use a proper async-to-sync bridge pattern

Location: backends/advanced/src/advanced_omi_backend/app_config.py:453-464


⚠️ High Priority Issues

3. Missing Input Validation (wizard_routes.py, services_routes.py)

API keys are not validated before storage:

# wizard_routes.py:136-139
for key, value in key_mapping.items():
    if value is not None:
        setattr(config.api_keys, key, value)  # No validation!

Problems:

  • No format validation (could store invalid keys)
  • No length validation
  • No check for injection attempts

Recommendations:

  • Validate API key formats (e.g., regex for known providers)
  • Add length constraints (min/max)
  • Sanitize input before storage

Location: backends/advanced/src/advanced_omi_backend/routers/modules/wizard_routes.py:128-139


4. Service Name Injection Risk (services_controller.py)

Service names from URL parameters are passed directly to Docker API:

# services_controller.py:146
success, message = docker_manager.start_service(service_name)

Problem: While MANAGEABLE_SERVICES provides a whitelist, the error handling could expose information:

# docker_manager.py:354
container = self._client.containers.get(service_name)  # Arbitrary name from URL

Recommendations:

  • Add explicit service name validation regex (e.g., ^[a-z0-9_-]+$)
  • Return generic error messages (don't reveal container structure)
  • Add rate limiting on service operations

Location: backends/advanced/src/advanced_omi_backend/controllers/services_controller.py:124-172


5. Missing Error Recovery (wizard_routes.py:31-42)

Worker restart failure is only logged as warning, but wizard completes successfully:

# wizard_routes.py:39-42
if success:
    logger.info(f"✅ Workers container restarted")
else:
    logger.warning(f"⚠️ Could not restart workers: {message}")  # Silent failure!

Problem: User sees "API keys saved successfully!" even if workers didn't restart with new keys.

Recommendations:

  • Return worker restart status in API response
  • Show warning in UI if worker restart failed
  • Consider retry logic for worker restart

Location: backends/advanced/src/advanced_omi_backend/routers/modules/wizard_routes.py:31-42


💡 Medium Priority Issues

6. Password Hash Exposure (secrets_schema.py)

The secrets schema includes admin password hash, but there's no documentation about secure generation:

# secrets_schema.py:19-22
admin_password_hash: str = Field(
    default="",
    description="Bcrypt password hash (never store plaintext)"
)

Recommendations:

  • Add migration guide for existing deployments
  • Document hash generation process
  • Add validation to ensure it's actually a bcrypt hash (not plaintext)

7. Frontend: Wizard Can Be Skipped (SetupWizard.tsx:121-124)

Users can skip the wizard without configuring API keys:

// SetupWizard.tsx:121-124
const skipWizard = () => {
  localStorage.setItem(getStorageKey('wizard_dismissed'), 'true')
  navigate('/')
}

Recommendations:

  • Show prominent warning when skipping
  • Add "required for basic functionality" messaging
  • Consider restricting features until wizard completes

Location: backends/advanced/webui/src/pages/SetupWizard.tsx:121-124


8. Missing Secrets Template in .gitignore

The secrets.defaults.yaml template should be tracked, but secrets.yaml should be ignored:

Check:

# Verify .gitignore includes:
config/secrets.yaml
backends/advanced/config/secrets.yaml

Recommendation: Update .gitignore to explicitly exclude secrets.yaml while allowing secrets.defaults.yaml


🔧 Code Quality Issues

9. Inconsistent Error Handling

Some endpoints return JSONResponse directly, others raise HTTPException:

# services_controller.py:29-36 (JSONResponse)
return JSONResponse(
    status_code=503,
    content={"error": "Docker not available"}
)

# wizard_routes.py:88 (HTTPException)
raise HTTPException(status_code=500, detail=f"Failed to get wizard status: {str(e)}")

Recommendation: Standardize on HTTPException for consistency with FastAPI best practices


10. Verbose Logging in Production

API key loading has very verbose info-level logging:

# app_config.py:441-442
logger.info(f"🔍 Attempting to load {provider} API key from config/secrets.yaml")
logger.info(f"   Using existing ConfigParser instance")

Recommendation: Downgrade these to DEBUG level to reduce log noise in production


📋 Minor Issues

  1. Hardcoded URL in Logs (app_config.py:158, 178): URLs point to localhost:3000/localhost:4000 but should use environment variable
  2. Missing Type Hints: docker_manager.py:205 return type should be tuple[bool, str] (already used correctly elsewhere)
  3. Duplicate Restart Logic: wizard_routes.py:31 and settings_routes.py:34 have identical restart_workers_if_needed() - should be shared utility
  4. Services Auto-Refresh: Frontend refreshes every 10 seconds - consider WebSocket for real-time updates

🧪 Testing Recommendations

Missing Test Coverage:

  1. Docker manager service operations (start/stop/restart)
  2. Config/secrets file separation and merging
  3. Wizard completion flow
  4. API key validation and masking
  5. Worker restart after config changes
  6. Error cases (Docker unavailable, invalid service names, etc.)

Suggested Tests:

# test_wizard.py
async def test_wizard_masks_existing_keys()
async def test_wizard_requires_llm_and_transcription()
async def test_wizard_restarts_workers()

# test_docker_manager.py  
def test_only_user_controllable_services_listed()
def test_required_services_cannot_be_stopped()
def test_service_name_validation()

📚 Documentation Needs

  1. Security Model: Document the security implications of Docker socket access
  2. Migration Guide: How to move from environment variables to YAML config
  3. Secrets Management: Best practices for production deployments
  4. Wizard Flow: User documentation for first-time setup
  5. Service Dependencies: Which services are required vs optional

🎬 Deployment Considerations

  1. Secrets File Permissions: Ensure secrets.yaml has restricted permissions (600) in deployment
  2. Docker Socket Mounting: Document security risks of mounting /var/run/docker.sock
  3. Backup Strategy: Include both config and secrets in backup procedures
  4. Environment Precedence: Clarify whether env vars or YAML takes precedence

✅ Approval Recommendation

Status: Approve with Conditions

This is a valuable feature that significantly improves the user experience. The code is generally well-structured with good separation of concerns. However, the critical security issues should be addressed before merging to production:

Must Fix Before Merge:

  1. ✅ Add audit logging for all Docker operations
  2. ✅ Document Docker socket security implications
  3. ✅ Fix async event loop nested asyncio.run() bug

Should Fix Soon After:
4. Add input validation for API keys
5. Improve error handling consistency
6. Add comprehensive test coverage

The wizard feature itself is solid - nice UX, good API key masking, and proper admin protection. The config/secrets separation is architecturally sound. Main concern is the Docker management needs security hardening.


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant