(Architect • Planner • Coder • Reviewer)
This repository contains the official golden fixture kit for all four agents in the Swarm autonomous software-engineering system:
- architect – high-level system design
- planner – deterministic task decomposition
- coder – minimal, unified-diff patch generation
- reviewer – structured code reviews with blocking/non-blocking comments
Fixtures in this repo define truth, ensure schema discipline, and guarantee agent-to-agent compatibility across the entire pipeline.
This kit allows any contributor—internal or external—to:
- Write deterministic, spec-valid fixtures
- Verify that fixtures match the locked agent schemas
- Test completeness and correctness with npm run verify
- Refresh expected outputs safely via snapshot mode
- Add new tasks/topics without touching any runner logic
- Run a fast golden-path subset via --golden for CI and sanity checks
Fixtures enforce:
- Schema correctness (ArchitectSpec, PlannerOutput, CoderOutput, ReviewerOutput)
- Determinism (same inputs → same outputs)
- Forbidden-path hygiene (no dist/, .swarm/, node_modules/, etc.)
- Non-fabrication (no invented APIs, files, or metadata)
- Semantic correctness (refactor-only constraints, atomic patches, task graphs)
- Cross-agent interoperability (architect → planner → coder → reviewer)
This is the unified contract for all agents.
All fixtures live under:
fixtures/<topic>/<task-id-descriptive-name>/
  architect/
    prompt.md
    expected.json
    verify.ts
    repo/...
  planner/
    prompt.md
    expected.json
    verify.ts
    repo/...
  coder/
    prompt.md
    expected.patch
    verify.ts
    repo/...
  reviewer/
    prompt.md
    expected.json
    verify.ts
    repo/...
- Filenames are intentionally generic (prompt.md, expected.json, expected.patch, verify.ts).
- repo/ is optional and contains only the minimal source context needed for the task.
- Agent folder names define the agent; no additional naming conventions are required.
This command:
- Discovers all fixtures automatically
- Loads each agent’s expected output
- Validates it against the correct Zod schema
- Runs the agent’s verify.ts using actual === expected (bootstrap self-test mode)
npm run verify
You should see output like:
[run-verify] BOOTSTRAP MODE ACTIVE – using expected outputs as actuals.
zero-change/task-001-is-even/architect OK
zero-change/task-001-is-even/planner OK
zero-change/task-001-is-even/coder OK
zero-change/task-001-is-even/reviewer OK
If anything violates the agent schema or scenario logic, it will fail with a clear reason.
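Bootstrap mode can be pictured as feeding each golden back in as the actual output, so a passing run shows that the golden satisfies its own verify.ts. The following is a minimal sketch under assumptions: the in-memory `Fixture` shape, the `runBootstrap` function, and the toy payload are hypothetical, and the real runner loads files from disk.

```typescript
// Hypothetical shapes for illustration only; not the kit's actual types.
type VerifyResult = { ok: true } | { ok: false; reason: string };
type VerifyCtx = { taskDir: string; actual: unknown; expected: unknown };
type Fixture = { id: string; expected: unknown; verify: (ctx: VerifyCtx) => VerifyResult };

// In bootstrap mode the expected output doubles as the actual output.
function runBootstrap(fixtures: Fixture[]): string[] {
  return fixtures.map((f) => {
    const result = f.verify({ taskDir: f.id, actual: f.expected, expected: f.expected });
    return result.ok ? `${f.id} OK` : `${f.id} FAIL: ${result.reason}`;
  });
}

const demo: Fixture = {
  id: "zero-change/task-001-is-even/planner",
  expected: { note: "toy payload, not the real PlannerOutput schema" },
  // A toy check: the serialized actual must equal the serialized expected.
  verify: ({ actual, expected }) =>
    JSON.stringify(actual) === JSON.stringify(expected)
      ? { ok: true }
      : { ok: false, reason: "actual differs from expected" },
};

console.log(runBootstrap([demo]).join("\n"));
```

Because actual and expected are the same object here, a failure in this mode points at the fixture itself rather than at an agent.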
Once real agents are wired into the runner, you can require actual execution instead of bootstrap mode:
npm run verify -- --strict-real-agents
In this mode, the harness will fail unless getActualOutput(...) is implemented to call real agents.
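One way to picture the strict flag: the harness falls back to the golden only when strict mode is off. The sketch below assumes a hypothetical `resolveActual` helper and treats `getActualOutput` as an optional callback; its real signature is not documented here, so the one shown is a guess.

```typescript
// Hypothetical sketch; the real getActualOutput signature is an assumption.
type GetActualOutput = (fixtureId: string) => unknown;

function resolveActual(
  fixtureId: string,
  expected: unknown,
  opts: { strictRealAgents: boolean; getActualOutput?: GetActualOutput },
): unknown {
  // Prefer a real agent whenever one is wired in.
  if (opts.getActualOutput) return opts.getActualOutput(fixtureId);
  if (opts.strictRealAgents) {
    throw new Error(`${fixtureId}: --strict-real-agents set but no real agent is wired in`);
  }
  // Bootstrap fallback: reuse the golden as the actual output.
  return expected;
}
```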
You can run fixtures in parallel batches:
npm run verify -- --concurrency 8
If omitted, a sensible default is used.
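Batch-parallel execution can be sketched as slicing the fixture list into chunks and awaiting each chunk with Promise.all. The names `toBatches` and `runInBatches` are hypothetical, and the runner's actual scheduling strategy and default value are not specified here.

```typescript
// Split items into consecutive batches of at most `size` elements.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) throw new Error("size must be a positive integer");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

// Run one batch at a time; items inside a batch run concurrently.
async function runInBatches<T, R>(
  items: T[],
  size: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of toBatches(items, size)) {
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results; // same order as `items`, which keeps reports deterministic
}
```

Collecting results in input order rather than completion order means the report does not change with timing, which matches the determinism requirement.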
When you intentionally improve prompts or expected outputs:
npm run verify -- --update
This regenerates each expected.json / expected.patch as the new golden snapshot.
You can combine this with other flags, for example:
npm run verify -- --update --golden
to refresh only the golden-path fixtures.
You can define a small, curated set of fixtures as a golden path for fast checks and CI stability.
Golden fixtures are configured in:
golden-fixtures.config.json
Example:
{
  "fixtures": [
    "zero-change/task-001-is-even/planner",
    "single-file/task-101-single-file-low-complexity/planner",
    "ambiguity/task-501-unclear-requirements/planner"
  ]
}
To run only these fixtures:
npm run verify -- --golden
You can also combine golden mode with filters or concurrency, e.g.:
npm run verify -- --golden --concurrency 8
For each agent folder (architect/planner/coder/reviewer):
- Write prompt.md: what that agent should receive, no more, no less.
- Write expected.json or expected.patch: must match the official schemas exported by this kit.
- Write verify.ts: a thin wrapper around shared helpers:
  - schema validation (via Zod)
  - semantic checks (e.g., “no new features”, “single low-complexity task”, “atomic patch”)
  - forbidden-path safety
- Add a repo/ folder only if your scenario requires source context.
- Run:
npm run verify
If adding a new task or topic, nothing else needs updating; discovery is automatic.
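Automatic discovery is possible because the directory layout alone identifies every fixture. A minimal sketch, assuming a hypothetical `discoverFixtures` function that receives a flat list of file paths (the real runner presumably walks the filesystem instead):

```typescript
const AGENTS: readonly string[] = ["architect", "planner", "coder", "reviewer"];

// Return sorted, de-duplicated fixture IDs of the form <topic>/<task>/<agent>
// for every path shaped like fixtures/<topic>/<task>/<agent>/verify.ts.
function discoverFixtures(paths: string[]): string[] {
  const ids = new Set<string>();
  for (const p of paths) {
    const parts = p.split("/");
    if (parts.length !== 5 || parts[0] !== "fixtures" || parts[4] !== "verify.ts") continue;
    const [, topic = "", task = "", agent = ""] = parts;
    // Folders that are not one of the four agents are ignored.
    if (AGENTS.includes(agent)) ids.add(`${topic}/${task}/${agent}`);
  }
  return Array.from(ids).sort(); // sorted so the run order is stable
}
```

Keying discovery on verify.ts means a new task or topic is picked up as soon as its agent folders exist, with no registry to edit.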
Every verify.ts receives:
{
  taskDir: string,
  actual: any,
  expected: any
}
and must enforce:
- Schema discipline: output must match the locked agent schema.
- Determinism: no randomization, timestamps, or unstable ordering.
- Forbidden-path hygiene: no patches or plans touching dist/, build/, .swarm/, .git/, node_modules/, etc.
- Non-fabrication: no invented APIs, tests, behaviors, paths, or metadata.
- Semantic correctness: behavior must follow the scenario’s contract (e.g. refactor-only, multi-hunk atomicity, backup rules).
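As an illustration of the forbidden-path rule, a check over a unified diff might look like the following. This is a sketch under assumptions: `touchedPaths` and `checkForbiddenPaths` are hypothetical names, the prefix list copies only the directories named above, the diff parsing is simplified, and the kit's shared helpers may implement this differently.

```typescript
type VerifyResult = { ok: true } | { ok: false; reason: string };

// Directories named in this README; the real list may be longer ("etc.").
const FORBIDDEN_PREFIXES = ["dist/", "build/", ".swarm/", ".git/", "node_modules/"];

// Collect file paths from the "--- a/<path>" and "+++ b/<path>" headers of a unified diff.
function touchedPaths(patch: string): string[] {
  const paths = new Set<string>();
  for (const line of patch.split("\n")) {
    const m = /^(?:--- a\/|\+\+\+ b\/)(.+)$/.exec(line);
    if (m && m[1]) paths.add(m[1].trim());
  }
  return Array.from(paths);
}

// Only top-level prefixes are checked in this sketch; nested matches are not.
function checkForbiddenPaths(patch: string): VerifyResult {
  for (const p of touchedPaths(patch)) {
    const hit = FORBIDDEN_PREFIXES.find((prefix) => p.startsWith(prefix));
    if (hit) return { ok: false, reason: `patch touches forbidden path ${p} (${hit})` };
  }
  return { ok: true };
}
```

Returning a reason string alongside ok: false keeps failures actionable for whoever reads the run output.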
Return example:
{ ok: true }
or
{ ok: false, reason: "bad complexity value" }
All verify.ts files must import fixture helpers from the source tree, using this exact relative path:
import {
  verifyArchitect,
  verifyPlanner,
  verifyCoder,
  verifyReviewer,
  type VerifyCtx,
  type VerifyResult
} from "../../../../src/fixture-helpers";
Do NOT import from dist/ and do NOT change the relative depth.
Every verify.ts file lives four directories below the project root (fixtures/<topic>/<task>/<agent>/), so this path is always correct.
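The depth claim can be checked with Node's path module: joining the import specifier onto any agent folder lands on src/fixture-helpers at the project root. The fixture directory below is taken from the sample output earlier in this README; the check itself is only an illustration.

```typescript
import * as path from "node:path";

// Directory that contains a verify.ts, relative to the project root.
const verifyDir = "fixtures/zero-change/task-001-is-even/planner";

// Four ".." segments climb agent -> task -> topic -> fixtures -> project root.
const resolved = path.posix.join(verifyDir, "../../../../src/fixture-helpers");

console.log(resolved); // src/fixture-helpers
```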
- Correctness > convenience
- Schemas are versioned contracts: add fields as optional; avoid breaking changes.
- Determinism is non-negotiable: output must not depend on environment or ordering.
- Honesty: models cannot hallucinate structure, APIs, metadata, or files.
- Composability: all agents interoperate cleanly (architect → planner → coder → reviewer → swarm).
This suite is the baseline for multi-agent evaluation and integration.
To add a new scenario:
fixtures/<topic>/task-XYZ-name/
  architect/
  planner/
  coder/
  reviewer/
Each folder requires:
- prompt.md
- expected.json or expected.patch
- verify.ts
- (optional) repo/
Then run:
npm run verify
If all pass, your scenario is valid.
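A completeness check for one agent folder follows directly from the required-file list. The sketch assumes a hypothetical `missingFiles` helper that takes the folder's file names as input, and it infers from the layout shown earlier that the coder uses expected.patch while the other agents use expected.json.

```typescript
type Agent = "architect" | "planner" | "coder" | "reviewer";

// Report which required files are absent from one agent folder.
// repo/ is optional, so it is never reported.
function missingFiles(agent: Agent, present: string[]): string[] {
  const expectedFile = agent === "coder" ? "expected.patch" : "expected.json";
  const required = ["prompt.md", expectedFile, "verify.ts"];
  return required.filter((name) => !present.includes(name));
}

// A coder folder that mistakenly ships expected.json is missing expected.patch.
console.log(missingFiles("coder", ["prompt.md", "expected.json", "verify.ts"]));
```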
git clone <repo>
npm install
npm run verify # run all fixtures (bootstrap mode)
npm run verify -- --golden # run curated golden-path fixtures only
npm run verify -- --concurrency 8 # run all fixtures with higher concurrency
npm run verify -- --strict-real-agents # require real agent execution
npm run verify -- --update # refresh goldens
# add new tasks under fixtures/...
npm run verify # all tasks auto-discovered
Fixtures in this repo use golden outputs (expected.json / expected.patch) that represent the correct result for each scenario. Over time, these goldens can become outdated when we intentionally improve schemas, prompts, or agent contracts. When that happens, npm run verify fails across many fixtures, not because the fixtures are wrong, but because the spec evolved.
Instead of editing dozens or hundreds of files by hand, contributors use:
npm run verify -- --update
This command automatically regenerates each fixture’s expected.* file using the new schema and normalization rules. It updates only what legitimately changed and keeps everything consistent with the latest contract. After running it, npm run verify will pass again.
Think of it like Jest’s snapshot updates:
You write a fixture once, and snapshot mode keeps it healthy whenever the spec evolves.
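Snapshot refreshes stay reviewable only if serialization is stable. The normalization rules are not spelled out here, so the following is an assumption rather than the kit's behavior: a hypothetical `stableStringify` that sorts object keys before writing, so two semantically equal outputs produce byte-identical goldens.

```typescript
// Recursively sort object keys so serialization does not depend on insertion order.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => sortKeys(item)); // array order is kept
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) sorted[key] = sortKeys(source[key]);
    return sorted;
  }
  return value;
}

// Serialize with two-space indentation and a trailing newline.
function stableStringify(value: unknown): string {
  return JSON.stringify(sortKeys(value), null, 2) + "\n";
}
```

With a stable serializer, the diff produced by an update run contains only changes in content, never reshuffled keys.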
Documentation-only or comment-only patches are explicitly permitted when the architect clearly requests documentation improvements (e.g., TSDoc, README updates, inline comments). Such patches remain subject to all other rules: minimal, atomic, no forbidden paths, and no runtime behavior changes.
When a task requires modifying configuration, environment, workflow, or other normally-forbidden files, the architect MUST:
- Explicitly list every configuration or non-source file that is permitted to be modified for this task (e.g., .github/workflows/ci.yml, config/staging.json, migrations/001-add-users.sql).
- Reaffirm that all other configuration, environment, or non-source files remain forbidden. No sibling files or directories are implicitly allowed.
This explicit-file-whitelist requirement ensures the planner, coder, and reviewer operate with a deterministic and safe scope, preventing accidental or speculative changes outside the architect’s intent.
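The whitelist rule can be read as an exact-match override. A sketch, assuming a hypothetical `isPathAllowed` function and a caller-supplied predicate for what counts as normally forbidden; the kit's actual classification of such files is not defined here, and release.yml below is an invented sibling used only for the demo.

```typescript
// Decide whether a patch may touch `path`.
// `isNormallyForbidden` is a caller-supplied predicate (an assumption of this sketch);
// `allowlist` holds the exact files the architect listed for this task.
function isPathAllowed(
  path: string,
  allowlist: readonly string[],
  isNormallyForbidden: (path: string) => boolean,
): boolean {
  if (!isNormallyForbidden(path)) return true;
  // Exact match only: siblings and parent directories are never implied.
  return allowlist.includes(path);
}

// Toy predicate for the demo: treat workflow and config files as normally forbidden.
const forbidden = (p: string) => p.startsWith(".github/") || p.startsWith("config/");
const allow = [".github/workflows/ci.yml"];

console.log(isPathAllowed(".github/workflows/ci.yml", allow, forbidden)); // true
console.log(isPathAllowed(".github/workflows/release.yml", allow, forbidden)); // false
console.log(isPathAllowed("src/index.ts", allow, forbidden)); // true
```

Matching on full paths instead of prefixes is what keeps sibling files out of scope.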