diff --git a/README.md b/README.md index 19a7f10..2db0c7b 100644 --- a/README.md +++ b/README.md @@ -1,13 +1,13 @@ # **Fixture Kit — Unified Contract & Contributor Guide** -_(Architect • Planner • Coder • Reviewer)_ +*(Architect • Planner • Coder • Reviewer)* This repository contains the **official golden fixture kit** for all four agents in the Swarm autonomous software-engineering system: -- **architect** – high-level system design -- **planner** – deterministic task decomposition -- **coder** – minimal, unified-diff patch generation -- **reviewer** – structured code reviews with blocking/non-blocking comments +* **architect** – high-level system design +* **planner** – deterministic task decomposition +* **coder** – minimal, unified-diff patch generation +* **reviewer** – structured code reviews with blocking/non-blocking comments Fixtures in this repo define **truth**, ensure **schema discipline**, and guarantee **agent-to-agent compatibility** across the entire pipeline. @@ -17,20 +17,21 @@ Fixtures in this repo define **truth**, ensure **schema discipline**, and guaran This kit allows **any contributor**—internal or external—to: -- Write deterministic, spec-valid fixtures -- Verify that fixtures match the **locked agent schemas** -- Test completeness and correctness with `npm run verify` -- Refresh expected outputs safely via snapshot mode -- Add new tasks/topics without touching any runner logic +* Write deterministic, spec-valid fixtures +* Verify that fixtures match the **locked agent schemas** +* Test completeness and correctness with `npm run verify` +* Refresh expected outputs safely via snapshot mode +* Add new tasks/topics without touching any runner logic +* Run a fast **golden path** subset via `--golden` for CI and sanity checks Fixtures enforce: -- **Schema correctness** (ArchitectSpec, PlannerOutput, CoderOutput, ReviewerOutput) -- **Determinism** (same inputs → same outputs) -- **Forbidden-path hygiene** (no `dist/`, `.swarm/`, `node_modules/`, 
etc.) -- **Non-fabrication** (no invented APIs, files, or metadata) -- **Semantic correctness** (refactor-only constraints, atomic patches, task graphs) -- **Cross-agent interoperability** (architect → planner → coder → reviewer) +* **Schema correctness** (ArchitectSpec, PlannerOutput, CoderOutput, ReviewerOutput) +* **Determinism** (same inputs → same outputs) +* **Forbidden-path hygiene** (no `dist/`, `.swarm/`, `node_modules/`, etc.) +* **Non-fabrication** (no invented APIs, files, or metadata) +* **Semantic correctness** (refactor-only constraints, atomic patches, task graphs) +* **Cross-agent interoperability** (architect → planner → coder → reviewer) This is the **unified contract** for all agents. @@ -66,9 +67,9 @@ fixtures/// ### Rules -- Filenames are intentionally generic (`prompt.md`, `expected.json`, `expected.patch`, `verify.ts`). -- `repo/` is optional and contains only the minimal source context needed for the task. -- Agent folder names define the agent; no additional naming conventions are required. +* Filenames are intentionally generic (`prompt.md`, `expected.json`, `expected.patch`, `verify.ts`). +* `repo/` is optional and contains only the minimal source context needed for the task. +* Agent folder names define the agent; no additional naming conventions are required. --- @@ -76,10 +77,10 @@ fixtures/// This command: -- Discovers all fixtures automatically -- Loads each agent’s expected output -- Validates it against the correct Zod schema -- Runs the agent’s `verify.ts` using `actual === expected` (self-test) +* Discovers all fixtures automatically +* Loads each agent’s expected output +* Validates it against the correct Zod schema +* Runs the agent’s `verify.ts` using `actual === expected` (bootstrap self-test mode) ```bash npm run verify @@ -87,7 +88,8 @@ npm run verify You should see output like: -``` +```text +[run-verify] BOOTSTRAP MODE ACTIVE – using expected outputs as actuals. 
zero-change/task-001-is-even/architect OK zero-change/task-001-is-even/planner OK zero-change/task-001-is-even/coder OK @@ -96,6 +98,26 @@ zero-change/task-001-is-even/reviewer OK If anything violates the agent schema or scenario logic, it will fail with a clear reason. +## **Strict real-agent mode (optional)** + +Once real agents are wired into the runner, you can require actual execution instead of bootstrap mode: + +```bash +npm run verify -- --strict-real-agents +``` + +In this mode, the harness will *fail* unless `getActualOutput(...)` is implemented to call real agents. + +## **Concurrency (optional)** + +You can run fixtures in parallel batches: + +```bash +npm run verify -- --concurrency 8 +``` + +If omitted, the runner defaults to a concurrency of 4. + --- ## **Snapshot Mode (optional)** @@ -108,6 +130,50 @@ npm run verify -- --update This regenerates each `expected.json` / `expected.patch` as the new golden snapshot. +You can combine this with other flags, for example: + +```bash +npm run verify -- --update --golden +``` + +to refresh only the golden-path fixtures. + +--- + +## **Golden Path Mode (optional)** + +You can define a small, curated set of fixtures as a **golden path** for fast checks and CI stability. + +Golden fixtures are configured in: + +```text +golden-fixtures.config.json +``` + +Example: + +```json +{ + "fixtures": [ + "zero-change/task-001-is-even/planner", + "single-file/task-101-single-file-low-complexity/planner", + "ambiguity/task-501-unclear-requirements/planner" + ] +} +``` + +To run only these fixtures: + +```bash +npm run verify -- --golden +``` + +You can also combine golden mode with filters or concurrency, e.g.: + +```bash +npm run verify -- --golden --concurrency 8 +``` + --- # **4. Authoring New Fixtures** @@ -123,9 +189,9 @@ For each agent folder (architect/planner/coder/reviewer): 3. 
**Write `verify.ts`** A thin wrapper around shared helpers: - - schema validation (via Zod) - - semantic checks (e.g., “no new features”, “single low-complexity task”, “atomic patch”) - - forbidden-path safety + * schema validation (via Zod) + * semantic checks (e.g., “no new features”, “single low-complexity task”, “atomic patch”) + * forbidden-path safety 4. **Add a `repo/` folder** only if your scenario requires source context. @@ -153,19 +219,19 @@ Every `verify.ts` receives: and must enforce: -- **Schema discipline** +* **Schema discipline** Output must match the locked agent schema. -- **Determinism** +* **Determinism** No randomization, timestamps, or unstable ordering. -- **Forbidden-path hygiene** +* **Forbidden-path hygiene** No patches or plans touching `dist/`, `build/`, `.swarm/`, `.git/`, `node_modules/`, etc. -- **Non-fabrication** +* **Non-fabrication** No invented APIs, tests, behaviors, paths, or metadata. -- **Semantic correctness** +* **Semantic correctness** Behavior must follow the scenario’s contract (e.g. refactor-only, multi-hunk atomicity, backup rules). Return example: @@ -204,22 +270,22 @@ Every verify file lives four directories below project root, so this path is alw # **6. Philosophy of This Suite** -- **Correctness** > convenience +* **Correctness** > convenience -- **Schemas are versioned contracts** +* **Schemas are versioned contracts** (Add fields as optional; avoid breaking changes.) -- **Determinism is non-negotiable** +* **Determinism is non-negotiable** Output must not depend on environment or ordering. -- **Honesty** +* **Honesty** Models cannot hallucinate structure, APIs, metadata, or files. -- **Composability** +* **Composability** All agents interoperate cleanly: architect → planner → coder → reviewer → swarm -This suite is the _baseline for multi-agent evaluation and integration_. +This suite is the *baseline for multi-agent evaluation and integration*. 
--- @@ -237,10 +303,10 @@ fixtures//task-XYZ-name/ Each folder requires: -- `prompt.md` -- `expected.json` or `expected.patch` -- `verify.ts` -- (optional) `repo/` +* `prompt.md` +* `expected.json` or `expected.patch` +* `verify.ts` +* (optional) `repo/` Then run: @@ -254,18 +320,21 @@ If all pass, your scenario is valid. # **8. TL;DR for Contributors** -``` +```bash git clone npm install -npm run verify # run all fixtures -npm run verify -- --update # refresh goldens +npm run verify # run all fixtures (bootstrap mode) +npm run verify -- --golden # run curated golden-path fixtures only +npm run verify -- --concurrency 8 # run all fixtures with higher concurrency +npm run verify -- --strict-real-agents # require real agent execution +npm run verify -- --update # refresh goldens # add new tasks under fixtures/... -npm run verify # all tasks auto-discovered +npm run verify # all tasks auto-discovered ``` ## **Why `npm run verify -- --update` Exists (for Contributors)** -Fixtures in this repo use **golden outputs** (`expected.json` / `expected.patch`) that represent the _correct_ result for each scenario. Over time, these goldens can become **outdated** when we intentionally improve schemas, prompts, or agent contracts. When that happens, running `npm run verify` will fail across many fixtures—not because the fixtures are wrong, but because the **spec evolved**. +Fixtures in this repo use **golden outputs** (`expected.json` / `expected.patch`) that represent the *correct* result for each scenario. Over time, these goldens can become **outdated** when we intentionally improve schemas, prompts, or agent contracts. When that happens, running `npm run verify` will fail across many fixtures—not because the fixtures are wrong, but because the **spec evolved**. 
Instead of editing dozens or hundreds of files by hand, contributors use: @@ -277,25 +346,29 @@ This command automatically regenerates each fixture’s `expected.*` file using Think of it like Jest’s snapshot updates: -> _You write a fixture once, and snapshot mode keeps it healthy whenever the spec evolves._ +> *You write a fixture once, and snapshot mode keeps it healthy whenever the spec evolves.* # Exceptions + ## Documentation-Only Patch Rule ------------------------------ -Documentation-only or comment-only patches are explicitly permitted when the architect clearly requests documentation improvements (e.g., TSDoc, README updates, inline comments). Such patches remain subject to all other rules: minimal, atomic, no forbidden paths, and no runtime behavior changes. +--- + +Documentation-only or comment-only patches are explicitly permitted when the architect clearly requests documentation improvements (e.g., TSDoc, README updates, inline comments). Such patches remain subject to all other rules: minimal, atomic, no forbidden paths, and no runtime behavior changes. ## Configuration & Non-Source File Safety Rule -------------------------------------------- + +--- + When a task requires modifying configuration, environment, workflow, or other normally-forbidden files, the architect MUST: - 1. Explicitly list *every* configuration or non-source file that is permitted - to be modified for this task (e.g., .github/workflows/ci.yml, - config/staging.json, migrations/001-add-users.sql). +1. Explicitly list *every* configuration or non-source file that is permitted + to be modified for this task (e.g., .github/workflows/ci.yml, + config/staging.json, migrations/001-add-users.sql). - 2. Reaffirm that all other configuration, environment, or non-source files - remain forbidden. No sibling files or directories are implicitly allowed. +2. Reaffirm that all other configuration, environment, or non-source files + remain forbidden. 
No sibling files or directories are implicitly allowed. This explicit-file-whitelist requirement ensures the planner, coder, and reviewer operate with a deterministic and safe scope, preventing accidental or speculative -changes outside the architect’s intent. \ No newline at end of file +changes outside the architect’s intent. diff --git a/golden-fixtures.config.json b/golden-fixtures.config.json new file mode 100644 index 0000000..a5fbc08 --- /dev/null +++ b/golden-fixtures.config.json @@ -0,0 +1,21 @@ +{ + "fixtures": [ + "zero-change/task-001-is-even/planner", + "zero-change/task-010-no-op/planner", + "zero-change/task-000-impossible-requirements/planner", + + "single-file/task-101-single-file-low-complexity/planner", + + + "chains-and-small-dags/task-201-deep-acyclic-chain/planner", + "chains-and-small-dags/task-203-circular-dependency-trap/planner", + "chains-and-small-dags/task-204-max-tasks-and-complexity-caps/planner", + + "core-semantics-and-tests/task-300-basic-two-file-decomp/planner", + "core-semantics-and-tests/task-301-forbidden-paths-filter/planner", + "core-semantics-and-tests/task-302-mixed-types-and-complexities/planner", + + "ambiguity/task-501-unclear-requirements/planner", + "ambiguity/task-502-conflicting-constraints/planner" + ] +} diff --git a/instructions.md b/instructions.md index a8e3f8a..45a1bb6 100644 --- a/instructions.md +++ b/instructions.md @@ -43,29 +43,28 @@ fixtures/ * `` and `` come directly from the master JSON. * Adding a readable slug (`.../task-001-is-even/`) is recommended. * `repo/` is included only when a scenario requires source files. -* fixtures/zero-change/task-001-is-even/* is already completed and serves as a template for new tasks. -* Never try to update src/schemas/*. +* `fixtures/zero-change/task-001-is-even/*` is already completed and serves as a template for new tasks. +* Never try to update `src/schemas/*`. 
+--- ## Documentation-Only Patch Rule ------------------------------ + Documentation-only or comment-only patches are explicitly permitted when the architect clearly requests documentation improvements (e.g., TSDoc, README updates, inline comments). Such patches remain subject to all other rules: minimal, atomic, no forbidden paths, and no runtime behavior changes. +--- ## Configuration & Non-Source File Safety Rule -------------------------------------------- + When a task requires modifying configuration, environment, workflow, or other normally-forbidden files, the architect MUST: - 1. Explicitly list *every* configuration or non-source file that is permitted - to be modified for this task (e.g., .github/workflows/ci.yml, - config/staging.json, migrations/001-add-users.sql). +1. Explicitly list *every* configuration or non-source file that is permitted + (e.g., `.github/workflows/ci.yml`, `config/staging.json`, `migrations/001-add-users.sql`). - 2. Reaffirm that all other configuration, environment, or non-source files - remain forbidden. No sibling files or directories are implicitly allowed. +2. Reaffirm that **all other** configuration, environment, or non-source files remain forbidden. + No sibling files or directories are implicitly allowed. -This explicit-file-whitelist requirement ensures the planner, coder, and reviewer -operate with a deterministic and safe scope, preventing accidental or speculative -changes outside the architect’s intent. +This explicit-file-whitelist requirement ensures the planner, coder, and reviewer operate with a deterministic and safe scope, preventing accidental or speculative changes outside the architect’s intent. --- @@ -86,8 +85,14 @@ changes outside the architect’s intent. 
Swap the helper depending on the agent: ```ts -import type { VerifyCtx, VerifyResult } from "test-fixtures/fixture-helpers"; -import { verifyArchitect } from "test-fixtures/fixture-helpers"; // or verifyPlanner, verifyCoder, verifyReviewer +import { + verifyArchitect, + verifyPlanner, + verifyCoder, + verifyReviewer, + type VerifyCtx, + type VerifyResult +} from "../../../../src/fixture-helpers"; export function verify(ctx: VerifyCtx): VerifyResult { return verifyArchitect(ctx, (parsed, ctx) => { @@ -112,13 +117,11 @@ git checkout -b fixtures// # create the prompts, expected outputs, verify.ts, and repo/ if required -npm run verify # fails until all four agents are implemented -npm run verify -- --update # only when confident the goldens are correct - -git add . -git commit -m "feat(fixtures): add / full fixture suite" -git push -u origin HEAD -# open PR; CI runs "npm run ci" +npm run verify # runs in bootstrap mode (expected == actual) +npm run verify -- --update # only when confident the goldens are correct +npm run verify -- --strict-real-agents # if real agents are wired in +npm run verify -- --golden # run curated golden fixtures only +npm run verify -- --concurrency 8 # optional parallel execution ``` **Important:** @@ -159,9 +162,10 @@ Use `--update` only when intentionally regenerating goldens (e.g., after updatin * Must conform to `reviewerOutputSchema`. * Every comment must include: - ```json - "blocking": true | false - ``` +```json +"blocking": true | false +``` + * Comments must be grounded in actual patch lines. --- @@ -184,11 +188,6 @@ fixtures/zero-change/task-001-is-even/ reviewer/verify.ts ``` -* Architect defines a tiny utility. -* Planner emits 1 low-complexity fix task. -* Coder outputs minimal patch correcting logic. -* Reviewer approves with grounded comments. - Running: ```bash @@ -198,6 +197,7 @@ npm run verify produces: ``` +[run-verify] BOOTSTRAP MODE ACTIVE – using expected outputs as actuals. 
zero-change/task-001-is-even/architect OK zero-change/task-001-is-even/planner OK zero-change/task-001-is-even/coder OK @@ -214,13 +214,13 @@ A full run of: npm run verify ``` -should produce one `OK` line for every `(topic × task × agent)` combination. +should produce one `OK` line for every `(topic × task × agent)` combination — or fewer lines when using `--golden`. When all are green, the fixture suite fully covers the entire JSON roadmap with deterministic, schema-valid, scenario-correct goldens. --- -# 8. Fixture Verify File Import Rule +# **8. Fixture Verify File Import Rule** **All `verify.ts` files must import fixture helpers from the *source* tree, using this exact relative path:** @@ -235,5 +235,6 @@ import { } from "../../../../src/fixture-helpers"; ``` -**Do NOT import from `dist/`** and do NOT change the relative depth. -Every verify file lives four directories below project root, so this path is always correct. \ No newline at end of file +**Do NOT import from `dist/`** +and do NOT change the relative depth. +Every verify file lives four directories below project root, so this path is always correct. 
diff --git a/src/run-verify.ts b/src/run-verify.ts index 3eb8fdf..1e7c9b2 100644 --- a/src/run-verify.ts +++ b/src/run-verify.ts @@ -11,6 +11,9 @@ const ROOT = path.resolve(__dirname, ".."); const FIXTURES_ROOT = path.join(ROOT, "fixtures"); const AGENTS: AgentKind[] = ["architect", "planner", "coder", "reviewer"]; +const GOLDEN_CONFIG_FILENAME = "golden-fixtures.config.json"; +const DEFAULT_CONCURRENCY = 4; + type FixtureAgentDir = { topic: string; task: string; @@ -18,31 +21,123 @@ type FixtureAgentDir = { dir: string; }; -// Parse CLI arguments correctly and forever -const args = process.argv.slice(2); -const UPDATE_MODE = args.includes("--update"); -const HELP_MODE = args.includes("--help") || args.includes("-h"); +type CliOptions = { + update: boolean; + help: boolean; + golden: boolean; + strictRealAgents: boolean; + filter: string | null; + concurrency: number | null; +}; + +function parseArgs(rawArgs: string[]): CliOptions { + let update = false; + let help = false; + let golden = false; + let strictRealAgents = false; + const filters: string[] = []; + let concurrency: number | null = null; + + for (let i = 0; i < rawArgs.length; i++) { + const arg = rawArgs[i]; + + if (arg === "--update") { + update = true; + } else if (arg === "--help" || arg === "-h") { + help = true; + } else if (arg === "--golden") { + golden = true; + } else if (arg === "--strict-real-agents") { + strictRealAgents = true; + } else if (arg === "--concurrency") { + const next = rawArgs[i + 1]; + if (!next || next.startsWith("-")) { + console.error("Missing value for --concurrency. 
Example: --concurrency 4"); + process.exit(1); + } + const parsed = Number.parseInt(next, 10); + if (!Number.isFinite(parsed) || parsed <= 0) { + console.error(`Invalid concurrency value: ${next}`); + process.exit(1); + } + concurrency = parsed; + i++; // consume value + } else if (arg.startsWith("--concurrency=")) { + const value = arg.slice("--concurrency=".length); + const parsed = Number.parseInt(value, 10); + if (!Number.isFinite(parsed) || parsed <= 0) { + console.error(`Invalid concurrency value: ${value}`); + process.exit(1); + } + concurrency = parsed; + } else if (arg.startsWith("-")) { + console.error(`Unknown option: ${arg}`); + console.error("Run with --help to see available options."); + process.exit(1); + } else { + filters.push(arg); + } + } + + if (filters.length > 1) { + console.error( + `Too many positional arguments: ${filters.join(" ")}\n` + + "Use at most one filter substring, e.g.:\n" + + " node dist/run-verify.js planner\n" + + " node dist/run-verify.js task-2001" + ); + process.exit(1); + } -// First non-flag, non-update arg is treated as a simple substring filter -const filter = - args.find((arg) => arg !== "--update" && !arg.startsWith("-")) ?? null; + return { + update, + help, + golden, + strictRealAgents, + filter: filters[0] ?? null, + concurrency, + }; +} + +const parsed = parseArgs(process.argv.slice(2)); +const UPDATE_MODE = parsed.update; +const HELP_MODE = parsed.help; +const GOLDEN_MODE = parsed.golden; +const STRICT_REAL_AGENTS_MODE = parsed.strictRealAgents; +const filter = parsed.filter; +const CONCURRENCY = parsed.concurrency ?? 
DEFAULT_CONCURRENCY; function printHelp() { console.log( [ "Usage:", - " node dist/run-verify.js [--update] [filter]", + " node dist/run-verify.js [options] [filter]", "", "Options:", - " --update Snapshot/update expected outputs from actuals", - " --help, -h Show this help message", + " --update Snapshot/update expected outputs from actuals", + " --golden Run only curated golden-path fixtures", + " --strict-real-agents Require real agent execution (no bootstrap mode)", + " --concurrency N Run up to N fixtures in parallel (default: 4)", + " You may also use --concurrency=N.", + " --help, -h Show this help message", "", "Arguments:", - " filter Optional substring filter applied to", - " '<topic>/<task>/<agent>' labels.", + " filter Optional substring filter applied to", + " '<topic>/<task>/<agent>' labels.", + "", + "Golden fixtures config:", + ` Golden mode reads labels from ${GOLDEN_CONFIG_FILENAME} at the project root.`, + ' Expected shape:', + ' { "fixtures": ["topic/task/agent", "..."] }', + "", + "Execution modes:", + " Default: use expected values as actuals (bootstrap mode).", + " This makes the harness usable before agent wiring.", + " --strict-real-agents: require real agent execution; the harness will", + " fail unless getActualOutput(...) 
is implemented.", "", "Examples:", - " # Run all fixtures", + " # Run all fixtures in bootstrap mode (expected as actual)", " node dist/run-verify.js", "", " # Run only planner fixtures", @@ -53,6 +148,15 @@ function printHelp() { "", " # Update snapshots for a single task", " node dist/run-verify.js --update task-2001", + "", + " # Run only golden-path fixtures (using golden-fixtures.config.json)", + " node dist/run-verify.js --golden", + "", + " # Run golden-path planner fixtures with higher concurrency", + " node dist/run-verify.js --golden --concurrency 8 planner", + "", + " # Strict mode (will require agent wiring in getActualOutput)", + " node dist/run-verify.js --strict-real-agents", ].join("\n") ); } @@ -126,6 +230,93 @@ function getExpectedPaths(agentDir: string, agent: AgentKind) { }; } +type GoldenConfig = { + fixtures: string[]; +}; + +function loadGoldenFixturesConfig(allLabels: Set<string>): Set<string> { + const configPath = path.join(ROOT, GOLDEN_CONFIG_FILENAME); + + if (!fs.existsSync(configPath)) { + console.error( + `Golden mode requested, but ${GOLDEN_CONFIG_FILENAME} was not found at:\n` + + ` ${configPath}\n` + + 'Expected shape:\n' + + ' { "fixtures": ["topic/task/agent", "..."] }' + ); + process.exit(1); + } + + let parsed: GoldenConfig; + try { + parsed = loadJson(configPath) as GoldenConfig; + } catch (err) { + console.error( + `Failed to parse ${GOLDEN_CONFIG_FILENAME} as JSON:\n ${String(err)}` + ); + process.exit(1); + } + + if ( + !parsed || + !Array.isArray(parsed.fixtures) || + parsed.fixtures.length === 0 + ) { + console.error( + `${GOLDEN_CONFIG_FILENAME} must have a non-empty "fixtures" array.\n` + + 'Example:\n' + + ' { "fixtures": ["topic/task/agent"] }' + ); + process.exit(1); + } + + const trimmed = parsed.fixtures.map((s) => s.trim()).filter(Boolean); + if (trimmed.length === 0) { + console.error( + `${GOLDEN_CONFIG_FILENAME} only contains empty/whitespace fixture labels.` + ); + process.exit(1); + } + + const missing = 
trimmed.filter((label) => !allLabels.has(label)); + if (missing.length > 0) { + console.error( + `${GOLDEN_CONFIG_FILENAME} references fixtures that do not exist:\n` + + missing.map((m) => ` - ${m}`).join("\n") + + "\n\n" + + "Ensure these labels match real paths under fixtures/<topic>/<task>/<agent>." + ); + process.exit(1); + } + + return new Set(trimmed); +} + +type ActualRequest = { + topic: string; + task: string; + agent: AgentKind; + agentDir: string; + expected: unknown; +}; + +async function getActualOutput(req: ActualRequest): Promise<unknown> { + if (!STRICT_REAL_AGENTS_MODE) { + // Default bootstrap mode: use expected as actual. + // This keeps the harness usable before agent wiring. + return req.expected; + } + + // Strict mode: require real agent execution. + // eslint-disable-next-line no-throw-literal + throw new Error( + `STRICT REAL AGENTS MODE ENABLED: no actual agent execution wired for ` + + `${req.topic}/${req.task}/${req.agent}.\n` + + "Implement getActualOutput(...) to invoke your agents, or run without\n" + + "--strict-real-agents to use bootstrap mode (expected as actual)." + ); +} + async function runOneFixtureAgent( topic: string, task: string, @@ -144,11 +335,16 @@ async function runOneFixtureAgent( return; } - const expected = kind === "json" ? loadJson(expectedPath) : loadText(expectedPath); + const expected = + kind === "json" ? (loadJson(expectedPath) as unknown) : loadText(expectedPath); - // Golden-master mode: for now, feed expected back in as actual. - // In a real system, "actual" would be the real agent output. - const actual = expected; + const actual = await getActualOutput({ + topic, + task, + agent, + agentDir, + expected, + }); const ctx: VerifyCtx = { taskDir: path.dirname(agentDir), @@ -191,38 +387,76 @@ function pathToFileUrl(p: string) { return new URL(`file://${resolved}`); } +async function runWithConcurrency( + fixtures: FixtureAgentDir[], + concurrency: number +): Promise<void> { + const limit = + Number.isFinite(concurrency) && concurrency > 0 ? 
Math.floor(concurrency) : 1; + + for (let i = 0; i < fixtures.length; i += limit) { + const batch = fixtures.slice(i, i + limit); + await Promise.allSettled( + batch.map((f) => + runOneFixtureAgent(f.topic, f.task, f.agent, f.dir) + ) + ); + } +} + async function main() { if (HELP_MODE) { printHelp(); return; } + if (!STRICT_REAL_AGENTS_MODE) { + console.log( + "[run-verify] BOOTSTRAP MODE ACTIVE – using expected outputs as actuals.\n" + + "Pass --strict-real-agents once your agents are wired into getActualOutput(...)." + ); + } + const fixtures = discoverFixtureAgentDirs(); - if (fixtures.length === 0) { - console.warn("No fixtures found under fixtures/<topic>/<task>/<agent>"); - return; - } + const allLabels = new Set( + fixtures.map((f) => `${f.topic}/${f.task}/${f.agent}`) + ); + + const goldenSet = GOLDEN_MODE ? loadGoldenFixturesConfig(allLabels) : null; - let ranCount = 0; + const selected: FixtureAgentDir[] = []; for (const f of fixtures) { const label = `${f.topic}/${f.task}/${f.agent}`; - // Simple, future-proof substring filtering on full label + if (GOLDEN_MODE && goldenSet && !goldenSet.has(label)) { + continue; + } + if (filter && !label.includes(filter)) { continue; } - // eslint-disable-next-line no-await-in-loop - await runOneFixtureAgent(f.topic, f.task, f.agent, f.dir); - ranCount++; + selected.push(f); } - if (filter && ranCount === 0) { - console.error(`No fixtures matched filter: ${filter}`); + if (selected.length === 0) { + console.error( + "No fixtures matched the current selection criteria.\n" + + ` golden: ${GOLDEN_MODE ? "on" : "off"}\n` + + ` filter: ${filter ?? ""}\n` + + ` total discovered fixtures: ${fixtures.length}\n` + + (GOLDEN_MODE + ? ` golden config: ${GOLDEN_CONFIG_FILENAME}\n` + : "") + + "Adjust your golden-fixtures.config.json, filter, or fixture set and retry." + ); process.exitCode = 1; + return; } + + await runWithConcurrency(selected, CONCURRENCY); } main().catch((err) => {