Experimental comparison of Claude Code with Agent Teams (symmetric peers) against the single-agent paradigm, on feature-implementation tasks in LangGraph.
Key finding: a ceiling effect. All treatments score 104/104 on tiered acceptance tests and 14/16 on post-hoc robustness tests, with zero peer-to-peer communication. Agent Teams yields a 3.5x wall-clock speedup (28 min solo vs 8 min with 8 agents) but no quality improvement: parallelism, not collaboration.
Second in the ate-series. Predecessor: ate (bug-fixing in Ruff). Successor: ate-arch (architecture design — first significant result).
| Treatment | T1–T4 Score | T5 Robustness | Wall Clock | Peer Messages |
|---|---|---|---|---|
| 0a (solo) | 104/104 | 14/16 | 28 min | N/A |
| 1 (4-agent, 2 features each) | 104/104 | 14/16 | 12 min | 0 |
| 5 (8-agent, 1 feature each) | 104/104 | 14/16 | 8 min | 0 |
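The speedup claim follows directly from the wall-clock column; a quick sanity check of the table's arithmetic, using the rounded minutes as printed above:

```python
# Wall-clock minutes per treatment, copied from the table above.
wall_clock = {
    "0a (solo)": 28,
    "1 (4-agent, 2 features each)": 12,
    "5 (8-agent, 1 feature each)": 8,
}

baseline = wall_clock["0a (solo)"]
for treatment, minutes in wall_clock.items():
    # Speedup relative to the solo baseline, e.g. 28 / 8 = 3.5x.
    print(f"{treatment}: {baseline / minutes:.1f}x speedup")
```

Note these are rounded minutes, so the ratios (2.3x for 4 agents, 3.5x for 8) are approximate.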
See findings for full analysis and experiment-design.md for the protocol.
```shell
uv sync --group dev
uv run ate-features exec status
uv run ate-features score show
```

- Claude Code — agentic coding tool
- Agent Teams — multi-agent collaboration feature under study
- Subagents — the default hub-and-spoke delegation mechanism
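The "parallelism, not collaboration" finding amounts to fan-out over disjoint work: each agent gets its own feature and has no channel to any peer, so peer messages are zero by construction. A minimal sketch of that shape (the `implement` worker and feature names are hypothetical stand-ins, not part of this repo):

```python
from concurrent.futures import ThreadPoolExecutor

def implement(feature: str) -> str:
    # Stand-in for one agent implementing one feature in isolation;
    # no communication channel to other agents exists.
    return f"{feature}: done"

# Treatment 5's shape: 8 agents, 1 feature each.
features = [f"feature-{i}" for i in range(8)]

with ThreadPoolExecutor(max_workers=8) as pool:
    # pool.map preserves input order; workers never interact.
    results = list(pool.map(implement, features))

print(results)
```

Under this structure, adding agents shortens the critical path (the slowest single feature) but cannot change what any one agent produces, which is consistent with the flat quality scores above.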
```shell
make test             # Unit tests (238)
make test-acceptance  # Acceptance tests (104, requires LangGraph setup)
make lint             # Ruff linter
make typecheck        # mypy strict
```