-
-
Notifications
You must be signed in to change notification settings - Fork 819
lume setup: Add Claude computer-use agent mode to replace brittle presets #1215
Description
Summary
Replace the current YAML preset-based lume setup automation with a Claude computer-use agent that can adapt to any macOS version's Setup Assistant UI.
Problem
The current lume setup --unattended <preset> system uses hardcoded YAML command sequences (e.g., tahoe, sequoia) that break when Apple changes the Setup Assistant UI between versions. For example, macOS 26.4 changed "Set Up Later" to "Other Sign-In Options" on the Apple ID screen, causing the tahoe preset to fail at step 66/167.
Maintaining per-version presets is fragile and doesn't scale.
Proposal
Add --mode agent (or --agent) flag to lume setup that uses Claude's computer-use API to navigate the Setup Assistant intelligently:
# Using env var
ANTHROPIC_API_KEY=sk-ant-... lume setup my-vm --mode agent
# Using flag
lume setup my-vm --mode agent --anthropic-key sk-ant-...How it works
Lume already has all the infrastructure needed:
- VNC server — already runs during setup, provides screenshot capture
- VNC input client — already handles mouse clicks, keyboard input, coordinate mapping
- Screenshot → PNG — already captures framebuffer as images
The agent loop would:
- Capture VNC screenshot → base64 PNG
- Send to Claude API with
computer_20251124tool:{ "type": "computer_20251124", "name": "computer", "display_width_px": <vnc_width>, "display_height_px": <vnc_height> } - System prompt:
Complete the macOS Setup Assistant. Select English, United States. Skip Apple ID sign-in. Create a user account with username 'lume' and password 'lume'. Enable Remote Login (SSH). Skip all optional features (Siri, Analytics, Screen Time, etc.). Reach the desktop.
- Parse Claude's
tool_useresponses and execute via existing VNC infrastructure:screenshot→ capture VNC framebuffer, return base64 imageleft_clickcoordinate:[x,y]→ VNC mouse clicktypetext:"..."→ VNC key eventskeytext:"Return"→ VNC special key press
- Return
tool_resultwith new screenshot after each action - Loop until Claude responds with text only (task complete) or max iterations reached
API details
- Endpoint:
POST https://api.anthropic.com/v1/messages - Beta header:
anthropic-beta: computer-use-2025-11-24 - Recommended model:
claude-sonnet-4-6(fast, cheap, good at UI navigation) - Tool version:
computer_20251124 - Actions needed:
screenshot,left_click,type,key,double_click,scroll
Implementation in Swift
The agent loop is straightforward — lume already has:
VNCClientfor screenshot capture and input- Coordinate system mapping between captured/display/VNC space
- The
UnattendedInstallerstructure that manages VM boot + automation
We'd add:
AnthropicClient— simple HTTP client for the Messages API (just URLSession, no SDK needed)AgentSetupRunner— the agent loop replacingPresetCommandRunner- CLI flags:
--mode agent|preset(default:presetfor backward compat),--anthropic-key,--model(default:claude-sonnet-4-6)
Benefits
- Version-agnostic — works on any macOS version without preset maintenance
- Self-healing — if a click misses, Claude sees the result and corrects
- Configurable — custom system prompts for different setup requirements (e.g., different username/password, specific settings)
- Debug-friendly —
--debugflag still saves screenshots at each step
Backward compatibility
--unattended tahoe/--unattended sequoiacontinue to work as before (preset mode)--mode agentis opt-in and requires an API key- Could eventually become the default if presets are deprecated
Related
- Current preset failure: macOS 26.4 changed "Set Up Later" → "Other Sign-In Options" on Apple ID screen
- Claude computer-use docs: https://docs.anthropic.com/en/docs/agents-and-tools/computer-use
- Reference implementation: https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo