feat: sandbox by i-am-thor · Pull Request #1 · scoutqa-dot-ai/thor

i-am-thor · 2026-03-19T06:17:24Z

No description provided.

…vider contract

CEO plan review added: provider-agnostic SandboxProvider interface, multi-sandbox identity model, live preview URLs, real-time telemetry, weighted evaluation rubric, cost model framework, quantified performance targets, comprehensive failure modes (25 failure/recovery pairs), and execution-only sandbox architecture (no MCP tool access in sandboxes). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Scored Daytona, Vercel Sandbox, E2B, and Cloudflare Sandbox across 8 weighted dimensions verified against official docs. Daytona wins as default provider (4.95/5): best API fit, lifecycle, previews, telemetry, and cost (~$50/mo vs $122-200 alternatives). E2B as future secondary for memory-preserving hibernation. LocalProvider fallback wrapping existing OpenCode container. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace coder subagent with sandbox-coder CLI binary that delegates coding work to an isolated Daytona sandbox. The binary uses cwd (must be a worktree), calls remote-cli /sandbox/exec which manages sandbox lifecycle and source sync. Thinker subagent stays for local planning and review. Sandbox identity deferred to later phases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Incorporate 12 architecture decisions from eng review: SandboxProvider interface, Daytona sessions for resilient execution, per-worktree locking, label-based orphan reconciliation, git-diff partial sync, snapshot warm starts, --reconnect/--pull subcommands, and per-phase unit test requirements. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Create docker/opencode/bin/sandbox-coder shell wrapper
- Add COPY to Dockerfile for the new binary
- Add POST /exec/sandbox-coder stub route with NDJSON streaming
- Add validateSandboxCwd (worktrees-only) and validateSandboxCoderArgs (prompt, --reconnect, --pull subcommands) to policy.ts
- Add 12 unit tests for sandbox-coder policy validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
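The two policy checks named above could look roughly like the following. This is a minimal sketch, not the code in policy.ts: the value of `WORKTREES_PREFIX`, the result types, and the assumption that `--reconnect` takes exactly one session id are all hypothetical.

```typescript
// Hypothetical prefix value: the PR names the constant but not its path.
const WORKTREES_PREFIX = "/workspace/worktrees/";

type SandboxCoderCommand =
  | { kind: "prompt"; prompt: string }
  | { kind: "reconnect"; sessionId: string }
  | { kind: "pull" };

type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

// Accept only absolute paths strictly below the worktrees prefix.
function validateSandboxCwd(cwd: string): Validation<string> {
  if (!cwd.startsWith(WORKTREES_PREFIX) || cwd.length === WORKTREES_PREFIX.length) {
    return { ok: false, error: "cwd must be inside a worktree" };
  }
  // Reject ".." segments outright instead of normalizing them away.
  if (cwd.split("/").includes("..")) {
    return { ok: false, error: "cwd must not contain '..' segments" };
  }
  return { ok: true, value: cwd };
}

// Assumed argument shapes: a single prompt, "--reconnect <id>", or "--pull".
function validateSandboxCoderArgs(args: string[]): Validation<SandboxCoderCommand> {
  const first = args[0];
  if (first === undefined) return { ok: false, error: "expected a prompt or subcommand" };
  const rest = args.slice(1);
  if (first === "--pull") {
    return rest.length === 0
      ? { ok: true, value: { kind: "pull" } }
      : { ok: false, error: "--pull takes no arguments" };
  }
  if (first === "--reconnect") {
    const sessionId = rest[0];
    if (rest.length !== 1 || sessionId === undefined || sessionId === "") {
      return { ok: false, error: "--reconnect expects exactly one session id" };
    }
    return { ok: true, value: { kind: "reconnect", sessionId } };
  }
  if (first.startsWith("--")) return { ok: false, error: `unknown flag: ${first}` };
  if (rest.length > 0 || first.trim() === "") {
    return { ok: false, error: "expected a single non-empty prompt" };
  }
  return { ok: true, value: { kind: "prompt", prompt: first } };
}
```

Returning a tagged result rather than throwing keeps the route handler free to map each failure to an HTTP status of its choosing.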

- Add @daytonaio/sdk dependency
- Create SandboxProvider interface with DaytonaSandboxProvider implementation (D6: provider abstraction from day 1)
- Create SandboxManager with per-worktree locking (D9), label-based orphan reconciliation on startup (D8), and destroy lifecycle
- Wire manager into index.ts: getOrCreate in sandbox-coder route, destroy hook on git worktree remove, reconcile before app.listen
- Add 11 unit tests with mock provider covering: cache hits, concurrent dedup, destroy, reconcile (orphans + restore + errors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
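One common way to get the "concurrent dedup" behaviour described above is to share a single in-flight promise per worktree key. The sketch below illustrates that pattern only; `CreationDedup` and its method names are hypothetical and are not taken from SandboxManager.

```typescript
// Hypothetical helper: concurrent callers for the same key share one promise.
class CreationDedup<T> {
  private inFlight = new Map<string, Promise<T>>();

  run(key: string, create: () => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing !== undefined) return existing;

    // If create() throws synchronously, nothing has been stored yet, so no
    // stale entry can be left behind. The cleanup callback always runs
    // asynchronously, after the entry below has been set.
    const pending = create().finally(() => {
      this.inFlight.delete(key);
    });
    this.inFlight.set(key, pending);
    return pending;
  }
}
```

Because the entry is removed once the promise settles, a failed creation does not poison later attempts for the same worktree.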

…se 3)

- syncIn: full tar on first call, git-diff partial sync on repeat (D15)
- syncOut: download changed files, detect+handle deletes via git status
- Fail loud on download errors (D14); caller can use --pull to recover
- resetSyncState helper for error recovery
- 7 unit tests covering full sync, partial sync, deletes, nested dirs, download failures, and empty worktrees

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Replace stub handler with full pipeline: getOrCreate → syncIn → createSession → execSessionCommand → streamLogs → syncOut
- Implement --reconnect: resume streaming from existing Daytona session, then syncOut (D7, D12)
- Implement --pull: syncOut only for file recovery (D12, D14)
- Emit [sandbox:session] early for reconnection support
- Emit [sandbox:done] with files_changed and files_deleted counts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… dead code cleanup

- Add path traversal protection in syncOut (safeResolvePath boundary check)
- Add tar `--` terminator to prevent option injection from filenames
- Check exit codes for all sandbox executeCommand calls (D14: fail loud)
- Use file-based agent exit code (cmdInfo) instead of unreliable async result
- Remove dead code: unused `session` var, unused `logError` import
- Fix pr merge test: move to allowed commands (matches allowlist)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
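A boundary check of the kind described for syncOut can be sketched as below. Only the name `safeResolvePath` comes from the commit message; the signature, the `null` return on rejection, and the POSIX-only path handling are assumptions.

```typescript
// Resolve a sandbox-reported relative path under a local root, or return null
// if it is absolute, empty, or would climb above the root.
function safeResolvePath(root: string, relative: string): string | null {
  if (relative.startsWith("/") || relative.includes("\0")) return null;

  const parts: string[] = [];
  for (const segment of relative.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      // Popping past the first segment would escape the root.
      if (parts.length === 0) return null;
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  // A path that collapses to the root itself is not a file to write.
  if (parts.length === 0) return null;

  return root.replace(/\/+$/, "") + "/" + parts.join("/");
}
```

For example, `safeResolvePath("/wt/a", "src/../x.ts")` yields `/wt/a/x.ts`, while `safeResolvePath("/wt/a", "../etc/passwd")` is rejected.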

Remove in-memory sandbox ID cache and reconcile logic. The manager now queries Daytona by labels on every lookup, eliminating stale state after restarts or external cleanup. Only the creation dedup lock remains in memory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Use single-quote shell escaping for agent prompts to prevent shell metachar expansion ($(), backticks) inside Daytona sessions
- Consolidate WORKTREE_PREFIX and WORKTREES_ONLY_PREFIX into single WORKTREES_PREFIX constant used by both git worktree and sandbox-coder validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
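Single-quote escaping for a POSIX shell is short enough to show in full. The function name `shellQuote` is hypothetical; the technique itself is the standard one of closing the quote, emitting an escaped quote, and reopening.

```typescript
// Wrap a string in single quotes for a POSIX shell. Inside single quotes
// nothing is expanded, so $(), backticks, and $VAR stay literal. An embedded
// single quote is written as '\'' (close, escaped quote, reopen).
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Illustrative use with a harmless command.
const command = "echo " + shellQuote("it's $(whoami) `date`");
```

Here `command` holds `echo 'it'\''s $(whoami) `date`'`, which the shell prints verbatim rather than executing the substitutions.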

- Add sandbox/setup.ts: uploads opencode.json (no MCP, permission: allow) and auth.json into sandbox so OpenCode can authenticate and run
- Add snapshot-first creation in SandboxManager: tries named snapshot for warm starts, falls back to bare image if unavailable (D15)
- Add createSnapshot/getSnapshot to SandboxProvider interface
- Mount opencode data dir into remote-cli container (read-only) for auth.json access
- Wire Daytona env vars (API key, URL, target) into docker-compose

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove the /tmp/.sandbox-exit-code cat hack and use the proper getSessionCommand API to retrieve the agent's exit code after log streaming completes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Snapshots are premature — need Daytona running first. Removes createSnapshot/getSnapshot from the provider interface, snapshot fallback logic from SandboxManager.doCreate, and related tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Drop apiUrl and target config — not needed. Make DAYTONA_API_KEY required at startup with a clear error message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The base image (node:22-slim) doesn't have opencode installed. Setup now runs npm install -g opencode-ai if the binary isn't found. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Default sandbox image to daytona-medium (configurable via SANDBOX_IMAGE env var)
- Remove runtime opencode install from setup (pre-installed in snapshot)
- Fix sandbox paths to /home/daytona (matching daytona user in snapshot)
- Update sync workdir and agent prompt to correct paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add snapshot support to provider create(): uses Daytona snapshot API
- Default to daytona-medium snapshot (configurable via SANDBOX_SNAPSHOT env var)
- Remove opencode install from setup (pre-installed in snapshot)
- Fix sandbox paths to /home/daytona (matching daytona user uid 1001)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Session-based log streaming via WebSocket was hanging silently. Switch to synchronous executeCommand which waits for completion and returns output directly. Session logic retained for --reconnect. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…execution"

This reverts commit 861fa04.

…agent

- Use createPty for real-time streaming of opencode output (sessions hang)
- Parse JSON lines from PTY output, extract opencode sessionID
- Detect completion via step_finish event, timeout after 1 hour
- Replace --reconnect with --session <id> for opencode session continuity
- Strip ANSI escape sequences from PTY output before JSON parsing
- Update all test mocks to match new provider interface

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
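The PTY handling above (buffer chunks, strip ANSI, parse JSON lines, watch for `step_finish`) might be sketched as follows. The event shape is an assumption: the sketch supposes each JSON line is an object with an optional string `sessionID` and a `type` field, which the commit message implies but does not spell out. The regex covers CSI sequences only.

```typescript
// Matches CSI escape sequences such as "\x1b[32m" or "\x1b[2K".
const ANSI_CSI = /\x1b\[[0-?]*[ -\/]*[@-~]/g;

interface StreamState {
  buffer: string;        // trailing partial line carried between chunks
  sessionId?: string;    // first sessionID seen (assumed field name)
  finished: boolean;     // set once a step_finish event arrives
  nonJson: string[];     // lines that were not JSON objects, kept for diagnostics
}

function newStreamState(): StreamState {
  return { buffer: "", finished: false, nonJson: [] };
}

// Feed one PTY chunk; complete lines are parsed, the remainder stays buffered.
function feedChunk(state: StreamState, chunk: string): void {
  state.buffer += chunk;
  let newline = state.buffer.indexOf("\n");
  while (newline >= 0) {
    const raw = state.buffer.slice(0, newline);
    state.buffer = state.buffer.slice(newline + 1);
    newline = state.buffer.indexOf("\n");

    const line = raw.replace(ANSI_CSI, "").trim(); // trim also drops "\r"
    if (line === "") continue;

    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      state.nonJson.push(line);
      continue;
    }
    if (typeof parsed !== "object" || parsed === null) {
      state.nonJson.push(line);
      continue;
    }
    const event = parsed as Record<string, unknown>;
    const id = event["sessionID"];
    if (typeof id === "string" && state.sessionId === undefined) state.sessionId = id;
    if (event["type"] === "step_finish") state.finished = true;
  }
}
```

Keeping non-JSON lines instead of dropping them leaves a trail to inspect when the agent process crashes before emitting any event.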

… config

The code in index.ts throws if DAYTONA_API_KEY is empty, but compose defaulted it to empty string (:-), silently crashing remote-cli and taking down git/gh/scoutqa endpoints. Use :? to fail at container start. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three bugs fixed:

1. syncInFull: exclude .git from tar (worktree pointer file references host paths that don't exist in sandbox), init a standalone repo in sandbox so git commands work correctly.
2. syncInPartial: use --diff-filter to separate changed files (for tar) from deleted files (for rm in sandbox). Re-commit in sandbox after each partial sync so HEAD stays current.
3. syncOut: filter deleted files out of the download list before iterating; git diff --name-only includes deletions which would cause downloadFile to throw before the delete loop runs.

Added file-level logging to both syncInPartial and syncOut.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
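The split between files to transfer and files to delete can also be derived in one pass from `git diff --name-status` output. The sketch below shows that alternative; it is not the `--diff-filter` approach the commit describes. `planFromNameStatus` and `SyncPlan` are hypothetical names, and the parser assumes tab-separated output without `-z`, so unusual filenames that git quotes are not handled.

```typescript
interface SyncPlan {
  changed: string[]; // paths to tar or download
  deleted: string[]; // paths to remove on the other side
}

// Parse lines such as "M\tsrc/a.ts", "D\told.txt", or "R100\told.ts\tnew.ts".
function planFromNameStatus(output: string): SyncPlan {
  const plan: SyncPlan = { changed: [], deleted: [] };
  for (const line of output.split("\n")) {
    if (line === "") continue;
    const [status, first, second] = line.split("\t");
    if (status === undefined || first === undefined) continue;

    const code = status.charAt(0);
    if (code === "D") {
      plan.deleted.push(first);
    } else if (code === "R" && second !== undefined) {
      // A rename removes the old path and adds the new one.
      plan.deleted.push(first);
      plan.changed.push(second);
    } else if (code === "C" && second !== undefined) {
      // A copy leaves the source in place and adds the destination.
      plan.changed.push(second);
    } else {
      plan.changed.push(first);
    }
  }
  return plan;
}
```

Keeping deletions out of `changed` avoids the failure mode from bug 3, where a download is attempted for a file that no longer exists.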

…ustom agent

- Install opencode-ai@1.2.27 via npm during setup (once per sandbox, tracked with a Set to skip repeat calls)
- Remove custom coder agent prompt; use the default agent instead
- Add --model flag to opencode command, defaulting to openai/gpt-5.3-codex-spark (configurable via SANDBOX_MODEL env var)
- Remove model field from opencode.json config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ndbox

Recursively removes any field whose key contains "refresh" (case-insensitive) from the auth JSON before uploading to the sandbox. Prevents the remote opencode from refreshing tokens and invalidating the main opencode's credentials. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Tokens expire, so auth must be refreshed each time. Split into:

- setupSandboxOpenCode: one-time install + config (once per sandbox)
- uploadSandboxAuth: fresh auth credentials (every prompt)

Also changed AUTH_JSON_PATH from a module-level const to a function so env var overrides work at call time (fixes test reliability).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
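The difference between a module-level constant and a function for an env-derived path is easiest to see side by side. In this sketch the env var name, the default path, and the plain `env` object standing in for `process.env` are all hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Stand-in for process.env so the sketch has no runtime dependencies.
const env: Env = {};

// Hypothetical names: the PR shows neither the variable nor the default.
const AUTH_PATH_VAR = "OPENCODE_AUTH_JSON";
const DEFAULT_AUTH_PATH = "/data/opencode/auth.json";

// Module-level const: evaluated once at load, so later overrides are missed.
const FROZEN_AUTH_JSON_PATH = env[AUTH_PATH_VAR] ?? DEFAULT_AUTH_PATH;

// Function: re-reads the environment on every call.
function authJsonPath(source: Env = env): string {
  return source[AUTH_PATH_VAR] ?? DEFAULT_AUTH_PATH;
}

// A test that sets the variable after load only affects the function form.
env[AUTH_PATH_VAR] = "/tmp/test-auth.json";
```

After the override, `FROZEN_AUTH_JSON_PATH` still holds the default while `authJsonPath()` returns `/tmp/test-auth.json`, which is the behaviour tests rely on.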

Document decisions made during implementation: PTY streaming over sessions, daytona-medium snapshot, per-prompt auth upload, no in-memory cache, and bidirectional file sync. Update architecture diagram to match current state. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The daytona-medium snapshot uses nvm-managed node, so sudo resets PATH and cannot find npm. Use sudo "$(which npm)" to resolve the full path before elevating. Also capture and log npm output on install failure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move git user.email/user.name config to one-time setup (--global persists for sandbox lifetime) so all git commands work. Add error output capture to syncInFull so failures are diagnosable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

….json

Removing refresh fields entirely broke auth parsing. Keep the keys but set string values to "" and numeric values to 0, preventing the sandbox from refreshing tokens while preserving the expected auth.json shape. Also fix pre-existing test bug matching $(which npm) command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
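The blanking rule described here (keep keys containing "refresh", set strings to "" and numbers to 0) can be sketched as a recursive walk. The function name and the `Json` type are hypothetical, and recursing into refresh-keyed objects or arrays is an assumption, since the commit message only covers string and numeric values.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Return a copy with every "refresh"-keyed string or number blanked out,
// leaving the overall shape of the document unchanged.
function blankRefreshFields(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => blankRefreshFields(item));
  if (value === null || typeof value !== "object") return value;

  const out: { [key: string]: Json } = {};
  for (const [key, child] of Object.entries(value)) {
    const isRefreshKey = key.toLowerCase().includes("refresh");
    if (isRefreshKey && typeof child === "string") {
      out[key] = "";
    } else if (isRefreshKey && typeof child === "number") {
      out[key] = 0;
    } else {
      // Assumption: other value types are walked rather than replaced.
      out[key] = blankRefreshFields(child);
    }
  }
  return out;
}
```

The input is never mutated, so the host's own auth data keeps its refresh token while the uploaded copy cannot use one.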

…ming

Non-JSON PTY output (stacktraces, INFO lines, errors) was silently discarded, making sandbox agent crashes invisible. Now logs all non-JSON lines as warnings, detects early process exit via shell prompt return, and surfaces stderrTail in AgentStreamResult for CLI output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The daytona-medium snapshot may have opencode at /usr/bin/opencode while npm installs to /usr/local/share/nvm/.../bin/opencode, causing the old binary to shadow the new one. Remove whichever exists before installing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tall

The daytona-medium snapshot ships a broken /usr/bin/corepack that blocks npm global installs. Remove it and reinstall via npm alongside the stale opencode binary cleanup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add resolveCorrelationKeys() to the gateway /cron path. Previously only Slack and GitHub events resolved aliases, so cron triggers with alias keys (e.g. git:branch:repo:branch) would create new sessions instead of resuming.

Add e2e test section 8 (alias-based session matching):

- Trigger #1 runs git worktree add → alias registered via [thor:meta]
- Trigger #2 via gateway /cron uses alias as correlationKey
- Verifies gateway resolves alias, runner resumes same session, and agent recalls context from trigger #1

Also fix approval e2e parser to check stderr for [thor:meta] format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

i-am-thor and others added 30 commits March 19, 2026 05:58

docs: define hosted coding sandbox evaluation

efa64b2

refactor: simplify DaytonaSandboxProvider to require only API key

e54e8ee

Drop apiUrl and target config — not needed. Make DAYTONA_API_KEY required at startup with a clear error message. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: install opencode-ai in sandbox during setup phase

b7a668c

The base image (node:22-slim) doesn't have opencode installed. Setup now runs npm install -g opencode-ai if the binary isn't found. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Revert "fix: use executeCommand instead of session for sandbox agent …

54c2283

…execution"

This reverts commit 861fa04.

daohoangson and others added 4 commits March 20, 2026 07:09


i-am-thor wants to merge 34 commits into main from parallel-sandbox-research
