feat: sandbox #1

Draft
i-am-thor wants to merge 34 commits into main from parallel-sandbox-research


@i-am-thor

No description provided.

i-am-thor and others added 30 commits March 19, 2026 05:58
…vider contract

CEO plan review added: provider-agnostic SandboxProvider interface,
multi-sandbox identity model, live preview URLs, real-time telemetry,
weighted evaluation rubric, cost model framework, quantified performance
targets, comprehensive failure modes (25 failure/recovery pairs), and
execution-only sandbox architecture (no MCP tool access in sandboxes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scored Daytona, Vercel Sandbox, E2B, and Cloudflare Sandbox across
8 weighted dimensions verified against official docs. Daytona wins
as default provider (4.95/5): best API fit, lifecycle, previews,
telemetry, and cost (~$50/mo vs $122-200 alternatives). E2B as
future secondary for memory-preserving hibernation. LocalProvider
fallback wrapping existing OpenCode container.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
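A weighted rubric like the one in the commit above reduces to a weighted mean of per-dimension scores. The sketch below is illustrative only: the `DimensionScore` shape, the function name, and any weights passed to it are assumptions, since the PR does not list the actual dimensions or weights.

```typescript
// Hypothetical shape: the real rubric's dimensions and weights are not shown in this PR.
interface DimensionScore {
  name: string;
  weight: number; // relative importance; weights need not sum to 1
  score: number; // per-dimension score on a 0-5 scale
}

// Weighted mean of the per-dimension scores, on the same 0-5 scale as the inputs.
function weightedScore(dimensions: DimensionScore[]): number {
  const totalWeight = dimensions.reduce((sum, d) => sum + d.weight, 0);
  if (totalWeight <= 0) {
    throw new Error("weights must sum to a positive number");
  }
  const weighted = dimensions.reduce((sum, d) => sum + d.weight * d.score, 0);
  return weighted / totalWeight;
}
```

Dividing by the total weight keeps the result on the 0-5 scale even when the weights are not normalized.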
Replace coder subagent with sandbox-coder CLI binary that delegates
coding work to an isolated Daytona sandbox. The binary uses the cwd
(which must be a worktree) and calls remote-cli /sandbox/exec, which
manages sandbox lifecycle and source sync. Thinker subagent stays for
local planning and review. Sandbox identity deferred to later phases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Incorporate 12 architecture decisions from eng review: SandboxProvider
interface, Daytona sessions for resilient execution, per-worktree
locking, label-based orphan reconciliation, git-diff partial sync,
snapshot warm starts, --reconnect/--pull subcommands, and per-phase
unit test requirements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create docker/opencode/bin/sandbox-coder shell wrapper
- Add COPY to Dockerfile for the new binary
- Add POST /exec/sandbox-coder stub route with NDJSON streaming
- Add validateSandboxCwd (worktrees-only) and validateSandboxCoderArgs
  (prompt, --reconnect, --pull subcommands) to policy.ts
- Add 12 unit tests for sandbox-coder policy validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
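The two policy checks named in the commit above might look roughly like this. This is a sketch, not the code in policy.ts: the prefix value, the function names `isWorktreeCwd` and `parseSandboxCoderArgs`, and the exact argument grammar (one prompt, or a lone `--reconnect` or `--pull`) are assumptions.

```typescript
// Assumed value; the PR mentions a worktrees prefix constant but not what it contains.
const ASSUMED_WORKTREES_PREFIX = "/workspace/worktrees/";

type SandboxCoderCommand =
  | { kind: "prompt"; prompt: string }
  | { kind: "reconnect" }
  | { kind: "pull" };

// Accept only paths strictly inside the worktrees directory.
function isWorktreeCwd(cwd: string): boolean {
  if (!cwd.startsWith(ASSUMED_WORKTREES_PREFIX)) return false;
  const rest = cwd.slice(ASSUMED_WORKTREES_PREFIX.length);
  // Reject the bare prefix and any ".." segment that could climb back out.
  return rest.length > 0 && !rest.split("/").includes("..");
}

// Map raw CLI arguments to one of the three supported invocations, or throw.
function parseSandboxCoderArgs(args: string[]): SandboxCoderCommand {
  const first = args[0];
  if (args.length !== 1 || first === undefined || first.trim() === "") {
    throw new Error("usage: sandbox-coder <prompt> | --reconnect | --pull");
  }
  if (first === "--reconnect") return { kind: "reconnect" };
  if (first === "--pull") return { kind: "pull" };
  if (first.startsWith("--")) throw new Error("unknown flag: " + first);
  return { kind: "prompt", prompt: first };
}
```

Returning a tagged union rather than a boolean lets the route handler switch on `kind` without parsing the arguments a second time.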
- Add @daytonaio/sdk dependency
- Create SandboxProvider interface with DaytonaSandboxProvider
  implementation (D6: provider abstraction from day 1)
- Create SandboxManager with per-worktree locking (D9), label-based
  orphan reconciliation on startup (D8), and destroy lifecycle
- Wire manager into index.ts: getOrCreate in sandbox-coder route,
  destroy hook on git worktree remove, reconcile before app.listen
- Add 11 unit tests with mock provider covering: cache hits,
  concurrent dedup, destroy, reconcile (orphans + restore + errors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
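The "concurrent dedup" behavior in the commit above can be approximated by sharing one in-flight promise per worktree. The sketch assumes a simplified provider with a single `create` method; `SketchProvider` and `CreationDedup` are hypothetical names, and the real SandboxProvider and SandboxManager are not shown in this PR.

```typescript
// Hypothetical minimal provider; the real SandboxProvider interface is richer.
interface SketchProvider {
  create(worktree: string): Promise<string>; // resolves to a sandbox ID
}

class CreationDedup {
  private readonly inFlight = new Map<string, Promise<string>>();
  private readonly provider: SketchProvider;

  constructor(provider: SketchProvider) {
    this.provider = provider;
  }

  // Concurrent callers for the same worktree share a single create() call.
  getOrCreate(worktree: string): Promise<string> {
    const pending = this.inFlight.get(worktree);
    if (pending) return pending;

    const created = this.provider
      .create(worktree)
      // Release the lock on success or failure so a later call can retry.
      .finally(() => this.inFlight.delete(worktree));
    this.inFlight.set(worktree, created);
    return created;
  }
}
```

Only in-flight creations are tracked here, so nothing goes stale after a restart; finished sandboxes would still have to be looked up through the provider.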
…se 3)

- syncIn: full tar on first call, git-diff partial sync on repeat (D15)
- syncOut: download changed files, detect+handle deletes via git status
- Fail loud on download errors (D14) — caller can use --pull to recover
- resetSyncState helper for error recovery
- 7 unit tests covering full sync, partial sync, deletes, nested dirs,
  download failures, and empty worktrees

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace stub handler with full pipeline: getOrCreate → syncIn →
  createSession → execSessionCommand → streamLogs → syncOut
- Implement --reconnect: resume streaming from existing Daytona session,
  then syncOut (D7, D12)
- Implement --pull: syncOut only for file recovery (D12, D14)
- Emit [sandbox:session] early for reconnection support
- Emit [sandbox:done] with files_changed and files_deleted counts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… dead code cleanup

- Add path traversal protection in syncOut (safeResolvePath boundary check)
- Add tar `--` terminator to prevent option injection from filenames
- Check exit codes for all sandbox executeCommand calls (D14: fail loud)
- Use file-based agent exit code (cmdInfo) instead of unreliable async result
- Remove dead code: unused `session` var, unused `logError` import
- Fix pr merge test: move to allowed commands (matches allowlist)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
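A boundary check of the kind the commit above describes can be sketched with plain string normalization. This is not the PR's safeResolvePath: the names `normalizePosix`, `resolveInsideRoot`, and `tarCreateArgs` are hypothetical, the code assumes POSIX paths, and it does not follow symlinks.

```typescript
// Resolve "." and ".." segments of a POSIX path without touching the filesystem.
function normalizePosix(path: string): string {
  const out: string[] = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") out.pop();
    else out.push(segment);
  }
  return "/" + out.join("/");
}

// Resolve `relative` under `root`; return null if the result escapes `root`.
function resolveInsideRoot(root: string, relative: string): string | null {
  const base = normalizePosix(root);
  const resolved = normalizePosix(base + "/" + relative);
  // Compare against "base/" so a sibling such as "/work/tree-evil" is rejected.
  const prefix = base === "/" ? "/" : base + "/";
  return resolved === base || resolved.startsWith(prefix) ? resolved : null;
}

// Build tar arguments with "--" so file names starting with "-" are not read as options.
function tarCreateArgs(archive: string, files: string[]): string[] {
  return ["-cf", archive, "--", ...files];
}
```

Checking the normalized result against the root, instead of scanning the input for "..", also catches the sibling-prefix case.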
Remove in-memory sandbox ID cache and reconcile logic. The manager now
queries Daytona by labels on every lookup, eliminating stale state after
restarts or external cleanup. Only the creation dedup lock remains in memory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use single-quote shell escaping for agent prompts to prevent shell
  metachar expansion ($(), backticks) inside Daytona sessions
- Consolidate WORKTREE_PREFIX and WORKTREES_ONLY_PREFIX into single
  WORKTREES_PREFIX constant used by both git worktree and sandbox-coder
  validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
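Single-quote escaping as described in the commit above is commonly done by closing the quote, emitting an escaped quote, and reopening. A minimal sketch, with `shellSingleQuote` as a hypothetical helper name:

```typescript
// Wrap a value in single quotes for a POSIX shell. Inside single quotes nothing
// is expanded, so only the single quote itself needs handling: ' becomes '\''.
function shellSingleQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}
```

For example, a prompt such as `it's $(date)` becomes `'it'\''s $(date)'`, which the shell passes through literally instead of running the command substitution.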
- Add sandbox/setup.ts: uploads opencode.json (no MCP, permission: allow)
  and auth.json into sandbox so OpenCode can authenticate and run
- Add snapshot-first creation in SandboxManager: tries named snapshot
  for warm starts, falls back to bare image if unavailable (D15)
- Add createSnapshot/getSnapshot to SandboxProvider interface
- Mount opencode data dir into remote-cli container (read-only) for
  auth.json access
- Wire Daytona env vars (API key, URL, target) into docker-compose

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the /tmp/.sandbox-exit-code cat hack and use the proper
getSessionCommand API to retrieve the agent's exit code after
log streaming completes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Snapshots are premature — need Daytona running first. Removes
createSnapshot/getSnapshot from the provider interface, snapshot
fallback logic from SandboxManager.doCreate, and related tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Drop apiUrl and target config — not needed. Make DAYTONA_API_KEY
required at startup with a clear error message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
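A fail-fast check for a required variable, as the commit above describes, could be as small as the following. The helper name `requireEnv` and the wording of the error are assumptions; the actual check in index.ts is not shown.

```typescript
// Return the variable's value, or throw a descriptive error when it is missing or blank.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(name + " is required but was not set (or is empty)");
  }
  return value;
}
```

At startup this would be called as `requireEnv(process.env, "DAYTONA_API_KEY")`; taking the environment as a parameter keeps the check easy to unit test.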
The base image (node:22-slim) doesn't have opencode installed.
Setup now runs npm install -g opencode-ai if the binary isn't found.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Default sandbox image to daytona-medium (configurable via SANDBOX_IMAGE env var)
- Remove runtime opencode install from setup (pre-installed in snapshot)
- Fix sandbox paths to /home/daytona (matching daytona user in snapshot)
- Update sync workdir and agent prompt to correct paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add snapshot support to provider create() — uses Daytona snapshot API
- Default to daytona-medium snapshot (configurable via SANDBOX_SNAPSHOT env var)
- Remove opencode install from setup (pre-installed in snapshot)
- Fix sandbox paths to /home/daytona (matching daytona user uid 1001)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Session-based log streaming via WebSocket was hanging silently.
Switch to synchronous executeCommand, which waits for completion
and returns output directly. Session logic retained for --reconnect.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…agent

- Use createPty for real-time streaming of opencode output (sessions hang)
- Parse JSON lines from PTY output, extract opencode sessionID
- Detect completion via step_finish event, timeout after 1 hour
- Replace --reconnect with --session <id> for opencode session continuity
- Strip ANSI escape sequences from PTY output before JSON parsing
- Update all test mocks to match new provider interface

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
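The per-line handling in the commit above (strip ANSI, parse JSON, pick up the session ID, detect completion) might be sketched as follows. The event field names are assumptions: the commit mentions a `step_finish` event and a `sessionID`, but not where they sit in the JSON, so this sketch reads a top-level `type` and `sessionID`. `PtyStreamState` and `handlePtyLine` are hypothetical names.

```typescript
// Matches CSI escape sequences such as colors and cursor movement.
const ANSI_CSI = /\u001b\[[0-9;?]*[ -\/]*[@-~]/g;

interface PtyStreamState {
  sessionID?: string;
  finished: boolean;
}

// Process one PTY line. Returns "event" when it parsed as a JSON object, else "noise".
function handlePtyLine(rawLine: string, state: PtyStreamState): "event" | "noise" {
  const line = rawLine.replace(ANSI_CSI, "").trim();
  if (!line.startsWith("{")) return "noise";

  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return "noise";
  }
  if (typeof parsed !== "object" || parsed === null) return "noise";

  const event = parsed as Record<string, unknown>;
  // Assumed field names; adjust to the actual opencode event schema.
  if (typeof event.sessionID === "string") state.sessionID = event.sessionID;
  if (event.type === "step_finish") state.finished = true;
  return "event";
}
```

A caller would feed each line through this function and stop streaming once `state.finished` is set or the timeout expires.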
… config

The code in index.ts throws if DAYTONA_API_KEY is empty, but compose
defaulted it to an empty string (:-), so remote-cli crashed silently and
took down the git/gh/scoutqa endpoints. Use :? to fail at container start.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three bugs fixed:

1. syncInFull: exclude .git from tar (worktree pointer file references
   host paths that don't exist in sandbox), init a standalone repo in
   sandbox so git commands work correctly.

2. syncInPartial: use --diff-filter to separate changed files (for tar)
   from deleted files (for rm in sandbox). Re-commit in sandbox after
   each partial sync so HEAD stays current.

3. syncOut: filter deleted files out of the download list before
   iterating — git diff --name-only includes deletions which would
   cause downloadFile to throw before the delete loop runs.

Added file-level logging to both syncInPartial and syncOut.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
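The third fix in the commit above comes down to removing deleted paths from the download list before any download is attempted. A sketch of that step, assuming newline-separated `git diff --name-only` output with no unusual characters in paths (`-z` would be the robust choice); the function names and the `SyncOutPlan` shape are hypothetical.

```typescript
// Split newline-separated `git diff --name-only` output into paths.
function parseNameOnly(stdout: string): string[] {
  return stdout.split("\n").filter((line) => line.length > 0);
}

interface SyncOutPlan {
  downloads: string[]; // files that still exist in the sandbox and should be fetched
  deletes: string[]; // files removed in the sandbox and to be removed locally
}

// `allChanged` includes deletions; `deleted` would come from a run with --diff-filter=D.
function planSyncOut(allChanged: string[], deleted: string[]): SyncOutPlan {
  const deletedSet = new Set(deleted);
  return {
    downloads: allChanged.filter((path) => !deletedSet.has(path)),
    deletes: deleted,
  };
}
```

Building the plan up front means a deleted file can never reach the download call, so the delete loop always gets to run.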
…ustom agent

- Install opencode-ai@1.2.27 via npm during setup (once per sandbox,
  tracked with a Set to skip repeat calls)
- Remove custom coder agent prompt — use the default agent instead
- Add --model flag to opencode command, defaulting to
  openai/gpt-5.3-codex-spark (configurable via SANDBOX_MODEL env var)
- Remove model field from opencode.json config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ndbox

Recursively removes any field whose key contains "refresh"
(case-insensitive) from the auth JSON before uploading to the sandbox.
Prevents the remote opencode from refreshing tokens and invalidating
the main opencode's credentials.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tokens expire, so auth must be refreshed each time. Split into:
- setupSandboxOpenCode: one-time install + config (once per sandbox)
- uploadSandboxAuth: fresh auth credentials (every prompt)

Also changed AUTH_JSON_PATH from a module-level const to a function
so env var overrides work at call time (fixes test reliability).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document decisions made during implementation: PTY streaming over
sessions, daytona-medium snapshot, per-prompt auth upload, no
in-memory cache, and bidirectional file sync. Update architecture
diagram to match current state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The daytona-medium snapshot uses nvm-managed node, so sudo resets PATH
and cannot find npm. Use sudo "$(which npm)" to resolve the full path
before elevating. Also capture and log npm output on install failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move git user.email/user.name config to one-time setup (--global persists
for sandbox lifetime) so all git commands work. Add error output capture
to syncInFull so failures are diagnosable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
daohoangson and others added 4 commits March 20, 2026 07:09
….json

Removing refresh fields entirely broke auth parsing. Keep the keys but
set string values to "" and numeric values to 0, preventing the sandbox
from refreshing tokens while preserving the expected auth.json shape.
Also fix pre-existing test bug matching $(which npm) command.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
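The key-preserving scrub described in the commit above could be written as a recursive walk. This is a sketch under assumptions: the auth.json shape is not shown, so the code handles arbitrary JSON, and it blanks only string and numeric values sitting directly under a key containing "refresh". `JsonValue` and `blankRefreshFields` are hypothetical names.

```typescript
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

// Return a copy where values under keys containing "refresh" (case-insensitive)
// are blanked: strings become "" and numbers become 0. All keys are kept.
function blankRefreshFields(value: JsonValue): JsonValue {
  if (Array.isArray(value)) return value.map(blankRefreshFields);
  if (value === null || typeof value !== "object") return value;

  const out: { [key: string]: JsonValue } = {};
  for (const key of Object.keys(value)) {
    const child = value[key];
    if (child === undefined) continue;
    const isRefreshKey = key.toLowerCase().includes("refresh");
    if (isRefreshKey && typeof child === "string") out[key] = "";
    else if (isRefreshKey && typeof child === "number") out[key] = 0;
    else out[key] = blankRefreshFields(child);
  }
  return out;
}
```

Because the function returns a copy, the credentials used by the main opencode instance are left untouched.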
…ming

Non-JSON PTY output (stacktraces, INFO lines, errors) was silently
discarded, making sandbox agent crashes invisible. Now logs all non-JSON
lines as warnings, detects early process exit via shell prompt return,
and surfaces stderrTail in AgentStreamResult for CLI output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
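Keeping a bounded tail of non-JSON output, so it can be surfaced on failure as in the commit above, might look like this. The class name `TailBuffer` and the line limit are assumptions; how stderrTail is actually collected is not shown in the PR.

```typescript
// Keeps only the most recent `limit` lines, dropping the oldest first.
class TailBuffer {
  private readonly lines: string[] = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  push(line: string): void {
    this.lines.push(line);
    if (this.lines.length > this.limit) this.lines.shift();
  }

  // Join the retained lines for inclusion in an error report.
  toString(): string {
    return this.lines.join("\n");
  }
}
```

Capping the buffer keeps memory bounded during a long run while still preserving the lines closest to a crash.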
The daytona-medium snapshot may have opencode at /usr/bin/opencode while
npm installs to /usr/local/share/nvm/.../bin/opencode, causing the old
binary to shadow the new one. Remove whichever exists before installing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tall

The daytona-medium snapshot ships a broken /usr/bin/corepack that blocks
npm global installs. Remove it and reinstall via npm alongside the stale
opencode binary cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
daohoangson added a commit that referenced this pull request Apr 5, 2026
Add resolveCorrelationKeys() to the gateway /cron path — previously only
Slack and GitHub events resolved aliases, so cron triggers with alias keys
(e.g. git:branch:repo:branch) would create new sessions instead of resuming.

Add e2e test section 8 (alias-based session matching):
- Trigger #1 runs git worktree add → alias registered via [thor:meta]
- Trigger #2 via gateway /cron uses alias as correlationKey
- Verifies gateway resolves alias, runner resumes same session, and
  agent recalls context from trigger #1

Also fix approval e2e parser to check stderr for [thor:meta] format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
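Alias resolution of the kind the commit above adds to the /cron path can be pictured as a lookup that falls back to the original key. This sketch is not the gateway's resolveCorrelationKeys: the alias store is modeled as a plain Map from alias to canonical key, which is an assumption, and `resolveAliasKeys` is a hypothetical name.

```typescript
// Replace each key with its canonical form when an alias is registered;
// keys without a registered alias pass through unchanged.
function resolveAliasKeys(keys: string[], aliases: Map<string, string>): string[] {
  return keys.map((key) => aliases.get(key) ?? key);
}
```

Passing unknown keys through unchanged means a trigger that uses no alias behaves exactly as it did before resolution was added.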
2 participants