lume setup: Add Claude computer-use agent mode to replace brittle presets #1215

@f-trycua

Summary

Replace the current YAML preset-based lume setup automation with a Claude computer-use agent that can adapt to any macOS version's Setup Assistant UI.

Problem

The current lume setup --unattended <preset> system uses hardcoded YAML command sequences (e.g., tahoe, sequoia) that break when Apple changes the Setup Assistant UI between versions. For example, macOS 26.4 changed "Set Up Later" to "Other Sign-In Options" on the Apple ID screen, causing the tahoe preset to fail at step 66/167.

Maintaining per-version presets is fragile and doesn't scale.

Proposal

Add a --mode agent (or --agent) flag to lume setup that uses Claude's computer-use API to navigate the Setup Assistant intelligently:

# Using env var
ANTHROPIC_API_KEY=sk-ant-... lume setup my-vm --mode agent

# Using flag
lume setup my-vm --mode agent --anthropic-key sk-ant-...
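The two invocations above imply a lookup order for the key. A minimal sketch of that lookup follows; `resolveAnthropicKey` is a hypothetical helper, and letting the flag win over the environment variable is an assumption, since the proposal does not state a precedence.

```swift
import Foundation

/// Hypothetical helper: picks the API key for agent mode.
/// Assumption: an explicit --anthropic-key value wins over ANTHROPIC_API_KEY.
func resolveAnthropicKey(
    flagValue: String?,
    environment: [String: String] = ProcessInfo.processInfo.environment
) -> String? {
    let candidates = [flagValue, environment["ANTHROPIC_API_KEY"]]
    for candidate in candidates {
        // Treat empty or whitespace-only values as "not provided".
        if let key = candidate?.trimmingCharacters(in: .whitespacesAndNewlines), !key.isEmpty {
            return key
        }
    }
    return nil
}
```

Returning nil rather than throwing leaves it to the caller to decide whether a missing key is an error, which only matters in agent mode.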

How it works

Lume already has all the infrastructure needed:

  1. VNC server — already runs during setup, provides screenshot capture
  2. VNC input client — already handles mouse clicks, keyboard input, coordinate mapping
  3. Screenshot → PNG — already captures framebuffer as images

The agent loop would:

  1. Capture VNC screenshot → base64 PNG
  2. Send to Claude API with computer_20251124 tool:
    {
      "type": "computer_20251124",
      "name": "computer",
      "display_width_px": <vnc_width>,
      "display_height_px": <vnc_height>
    }
  3. System prompt:

    Complete the macOS Setup Assistant. Select English, United States. Skip Apple ID sign-in. Create a user account with username 'lume' and password 'lume'. Enable Remote Login (SSH). Skip all optional features (Siri, Analytics, Screen Time, etc.). Reach the desktop.

  4. Parse Claude's tool_use responses and execute via existing VNC infrastructure:
    • screenshot → capture VNC framebuffer, return base64 image
    • left_click coordinate:[x,y] → VNC mouse click
    • type text:"..." → VNC key events
    • key text:"Return" → VNC special key press
  5. Return tool_result with new screenshot after each action
  6. Loop until Claude responds with text only (task complete) or the maximum number of iterations is reached
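The loop above could be sketched roughly as follows. `ScreenDriver` and `ModelClient` are hypothetical seams standing in for lume's VNC layer and the Messages API client; neither name comes from the codebase. The sketch deliberately leaves out the conversation history and the `tool_use_id` that each `tool_result` has to echo back, so it shows only the control flow.

```swift
import Foundation

/// One action parsed from a tool_use block. Only the four actions listed above are modelled.
enum AgentAction: Equatable {
    case screenshot
    case leftClick(x: Int, y: Int)
    case typeText(String)
    case key(String)
}

/// Parses the `input` object of a tool_use block,
/// e.g. {"action": "left_click", "coordinate": [120, 340]}.
func parseAction(_ input: [String: Any]) -> AgentAction? {
    guard let action = input["action"] as? String else { return nil }
    switch action {
    case "screenshot":
        return .screenshot
    case "left_click":
        guard let xy = input["coordinate"] as? [Int], xy.count == 2 else { return nil }
        return .leftClick(x: xy[0], y: xy[1])
    case "type":
        guard let text = input["text"] as? String else { return nil }
        return .typeText(text)
    case "key":
        guard let name = input["text"] as? String else { return nil }
        return .key(name)
    default:
        return nil  // Unknown or unsupported action: skipped by the loop.
    }
}

/// Hypothetical stand-in for lume's VNC capture and input layer.
protocol ScreenDriver {
    func screenshotBase64() -> String
    func perform(_ action: AgentAction)
}

/// The tool calls the model asked for in one turn; empty means a text-only reply.
struct ModelTurn {
    var toolInputs: [[String: Any]]
}

/// Hypothetical stand-in for the Messages API client.
protocol ModelClient {
    func next(screenshot: String) -> ModelTurn
}

enum AgentOutcome: Equatable {
    case completed(steps: Int)
    case hitIterationLimit
}

func runAgent<M: ModelClient, S: ScreenDriver>(
    model: M, screen: S, maxIterations: Int
) -> AgentOutcome {
    var shot = screen.screenshotBase64()
    for step in 0..<max(maxIterations, 0) {
        let turn = model.next(screenshot: shot)
        // A text-only reply (no tool calls) is treated as "task complete".
        if turn.toolInputs.isEmpty { return .completed(steps: step) }
        for input in turn.toolInputs {
            // A bare screenshot request needs no input; the capture below answers it.
            if let action = parseAction(input), action != .screenshot {
                screen.perform(action)
            }
        }
        // Every tool result carries a fresh screenshot.
        shot = screen.screenshotBase64()
    }
    return .hitIterationLimit
}
```

Keeping the VNC layer and the API client behind protocols means the loop can be exercised with scripted fakes, without booting a VM or spending API calls.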

API details

  • Endpoint: POST https://api.anthropic.com/v1/messages
  • Beta header: anthropic-beta: computer-use-2025-11-24
  • Recommended model: claude-sonnet-4-6 (fast, cheap, good at UI navigation)
  • Tool version: computer_20251124
  • Actions needed: screenshot, left_click, type, key, double_click, scroll
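As an illustration of these details, the request could be assembled with plain Foundation types as below. `makeMessagesRequest` is a hypothetical helper, and the `max_tokens` value is an illustrative choice that is not part of the proposal. The `x-api-key` and `anthropic-version` headers are the standard ones for the Messages API.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux.
#endif

/// Hypothetical helper: builds a Messages API request carrying the computer-use tool.
func makeMessagesRequest(
    apiKey: String,
    model: String,
    systemPrompt: String,
    messages: [[String: Any]],
    displayWidth: Int,
    displayHeight: Int
) throws -> URLRequest {
    var request = URLRequest(url: URL(string: "https://api.anthropic.com/v1/messages")!)
    request.httpMethod = "POST"
    request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
    request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
    request.setValue("computer-use-2025-11-24", forHTTPHeaderField: "anthropic-beta")
    request.setValue("application/json", forHTTPHeaderField: "content-type")

    // Tool definition as given in the proposal; sizes come from the VNC framebuffer.
    let tool: [String: Any] = [
        "type": "computer_20251124",
        "name": "computer",
        "display_width_px": displayWidth,
        "display_height_px": displayHeight,
    ]
    let body: [String: Any] = [
        "model": model,
        "max_tokens": 1024,  // Assumption: illustrative value only.
        "system": systemPrompt,
        "tools": [tool],
        "messages": messages,
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    return request
}
```

The returned request can be handed to `URLSession` as-is, which keeps the HTTP layer free of any SDK dependency.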

Implementation in Swift

The agent loop is straightforward — lume already has:

  • VNCClient for screenshot capture and input
  • Coordinate system mapping between captured/display/VNC space
  • The UnattendedInstaller structure that manages VM boot + automation
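To make the coordinate mapping concrete: if the screenshot sent to Claude has a different pixel size than the VNC framebuffer, click coordinates need rescaling before they are replayed. The sketch below assumes a simple proportional scale. `PixelSize` and `mapToFramebuffer` are hypothetical names, and lume's existing mapping may work differently.

```swift
/// Hypothetical value type for a width/height pair in pixels.
struct PixelSize {
    var width: Int
    var height: Int
}

/// Maps a point from the screenshot the model saw to VNC framebuffer pixels.
/// Assumption: both spaces share an origin and differ only by a uniform per-axis scale.
func mapToFramebuffer(
    x: Int, y: Int, screenshot: PixelSize, framebuffer: PixelSize
) -> (x: Int, y: Int) {
    precondition(screenshot.width > 0 && screenshot.height > 0, "screenshot size must be positive")
    precondition(framebuffer.width > 0 && framebuffer.height > 0, "framebuffer size must be positive")
    let fx = Double(x) * Double(framebuffer.width) / Double(screenshot.width)
    let fy = Double(y) * Double(framebuffer.height) / Double(screenshot.height)
    // Clamp so a slightly out-of-range coordinate still lands on screen.
    let cx = min(max(Int(fx.rounded()), 0), framebuffer.width - 1)
    let cy = min(max(Int(fy.rounded()), 0), framebuffer.height - 1)
    return (cx, cy)
}
```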

We'd add:

  • AnthropicClient — simple HTTP client for the Messages API (just URLSession, no SDK needed)
  • AgentSetupRunner — the agent loop replacing PresetCommandRunner
  • CLI flags: --mode agent|preset (default: preset for backward compat), --anthropic-key, --model (default: claude-sonnet-4-6)
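The flag defaults listed above could be captured in a small options type, sketched here independently of whichever argument parser lume uses. `SetupMode`, `SetupOptions`, `SetupOptionsError`, and `makeSetupOptions` are all hypothetical names; failing early when agent mode has no key is an assumption based on the proposal's requirement that agent mode needs an API key.

```swift
enum SetupMode: String {
    case preset
    case agent
}

struct SetupOptions: Equatable {
    var mode: SetupMode = .preset            // Default keeps existing behaviour.
    var model: String = "claude-sonnet-4-6"  // Default model from the proposal.
    var anthropicKey: String? = nil
}

enum SetupOptionsError: Error, Equatable {
    case unknownMode(String)
    case missingAPIKey
}

/// Turns raw flag values (nil when a flag was not passed) into validated options.
func makeSetupOptions(
    mode rawMode: String?, model: String?, anthropicKey: String?
) throws -> SetupOptions {
    var options = SetupOptions()
    if let rawMode = rawMode {
        guard let mode = SetupMode(rawValue: rawMode) else {
            throw SetupOptionsError.unknownMode(rawMode)
        }
        options.mode = mode
    }
    if let model = model { options.model = model }
    options.anthropicKey = anthropicKey
    // Agent mode cannot run without a key, so fail before booting the VM.
    if options.mode == .agent && options.anthropicKey == nil {
        throw SetupOptionsError.missingAPIKey
    }
    return options
}
```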

Benefits

  • Version-agnostic — works on any macOS version without preset maintenance
  • Self-healing — if a click misses, Claude sees the result and corrects
  • Configurable — custom system prompts for different setup requirements (e.g., different username/password, specific settings)
  • Debug-friendly: the --debug flag still saves screenshots at each step
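The "Configurable" point could be served by generating the system prompt from a few parameters instead of hardcoding it. In this sketch `SetupGoal` and `makeSystemPrompt` are hypothetical names; the defaults reproduce the system prompt proposed under "How it works".

```swift
/// Hypothetical description of what the agent should set up; defaults match the proposal.
struct SetupGoal {
    var language = "English"
    var region = "United States"
    var username = "lume"
    var password = "lume"
}

/// Builds the system prompt sentence by sentence so single values can be swapped.
func makeSystemPrompt(for goal: SetupGoal) -> String {
    return [
        "Complete the macOS Setup Assistant.",
        "Select \(goal.language), \(goal.region).",
        "Skip Apple ID sign-in.",
        "Create a user account with username '\(goal.username)' and password '\(goal.password)'.",
        "Enable Remote Login (SSH).",
        "Skip all optional features (Siri, Analytics, Screen Time, etc.).",
        "Reach the desktop.",
    ].joined(separator: " ")
}
```

For example, `makeSystemPrompt(for: SetupGoal(username: "ci", password: "secret"))` changes only the account sentence and leaves the rest of the prompt untouched.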

Backward compatibility

  • --unattended tahoe / --unattended sequoia continue to work as before (preset mode)
  • --mode agent is opt-in and requires an API key
  • Could eventually become the default if presets are deprecated
