A parallel agent runtime for your terminal. Up to 20 AI agents. Zero infrastructure. Built-in quality scoring. Works with any CLI coding agent.
🦬 Try the demo: no setup required!

```shell
git clone https://github.com/DUBSOpenHub/terminal-stampede.git
cd terminal-stampede && ./install.sh
```

Zero API calls. Just `tmux` and `bash`. See agents work in real time.
Run up to 20 AI coding agents simultaneously, each in its own tmux pane with its own context window and git branch. Works with any CLI agent that can take a prompt and write code. The sweet spot is 6–8 agents (the default is 3; configurable with `--count`).
- Zero-infrastructure local swarm
- 🖥️ tmux as execution surface
- Filesystem as atomic message bus
- Human-in-the-loop observability
- 🧱 Simplicity over complexity: no frameworks, no servers, no message brokers. The simpler the system, the more reliable the output.
- 🎯 Shadow scoring: quality defined before agents run, measured silently after
You've been doing AI coding one task at a time. Ask, wait, ask again, wait again. Terminal Stampede splits your terminal into multiple panes, drops an AI agent into each one, and lets them all charge through your codebase simultaneously. Each agent gets its own brain, its own branch, its own mission. You watch them work in real time through the gold ⚡ borders. Minutes later, everything's done.
Zero infrastructure. No Redis, no HTTP, no Docker, no cloud. Just files on disk and tmux.
Human in the loop, not after the fact. Every agent runs in a visible pane. Zoom in on any one, type into it, kill it, or just watch. Most multi-agent systems give you logs when it's over. This one puts you in the room while it's happening.
tmux is the runtime. Each pane is a full CLI agent session with its own context window. The filesystem is the message bus: task claiming is an atomic file rename, no locks, no coordination server. Point it at any repo.
Works with any CLI agent. Built with GitHub Copilot CLI, but the pattern is tool-agnostic: swap the agent command for Aider, Claude Code, or any CLI tool that can read a task and write code.
Read the full story, *"What If You Could Run 20 AI Agents in One Terminal?"*: how Havoc Hackathon, Shadow Score, Dark Factory, and Agent X-Ray led to this experiment.
- macOS or Linux
- `tmux` (`brew install tmux`)
- A CLI coding agent (e.g., GitHub Copilot CLI, Aider, Claude Code)
- `python3`, `jq`, `openssl`, `git`
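A quick way to check the prerequisites before installing (optional; this helper is not part of the installer):

```shell
# Print any required tool that is not on PATH; no output means all present.
missing=""
for tool in tmux python3 jq openssl git; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -z "$missing" ]; then echo "all prerequisites found"; else echo "missing:$missing"; fi
```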
```shell
git clone https://github.com/DUBSOpenHub/terminal-stampede.git
cd terminal-stampede
chmod +x install.sh && ./install.sh
```

Six files land in their working locations:
| File | Location | Purpose |
|---|---|---|
| Orchestrator skill | `~/.copilot/skills/stampede/SKILL.md` | Parses commands, generates tasks, monitors, synthesizes |
| Worker agent | `~/.copilot/agents/stampede-worker.agent.md` | Claims tasks, does the work, writes results |
| Merger agent | `~/.copilot/agents/stampede-merger.agent.md` | Auto-merges all branches, resolves conflicts, shadow-scores |
| Launcher | `~/bin/stampede.sh` | Creates tmux session, spawns panes, tracks PIDs |
| Monitor | `~/bin/stampede-monitor.sh` | Live progress, stuck detection, runtime stats |
| Merger script | `~/bin/stampede-merge.sh` | Discovers branches, sorts by size, launches merger |
> **Note:** The skill and agent files install to `~/.copilot/` paths for GitHub Copilot CLI. If you use a different CLI agent (Aider, Claude Code, etc.), you only need the shell scripts in `~/bin/` (see Option A below).
**Option A: From the command line (works with any CLI agent)**
Create task files yourself, then launch:
```shell
# 1. Create a run directory (inside your repo)
cd ~/my-project
RUN_ID="run-$(date +%Y%m%d-%H%M%S)"
mkdir -p .stampede/$RUN_ID/{queue,claimed,results,logs}

# 2. Add task files (one JSON per task)
cat > .stampede/$RUN_ID/queue/task-001.json << 'EOF'
{
  "task_id": "task-001",
  "description": "Add input validation to the auth module",
  "scope": ["src/auth.py"],
  "branch": "stampede/task-001"
}
EOF
# ... repeat for each task

# 3. Launch the fleet
stampede.sh --run-id $RUN_ID --count 8 --repo ~/my-project --model claude-haiku-4.5
```

A Terminal window opens. Eight panes tile across the screen. Gold ⚡ borders show the model and task for each agent. A monitor pane tracks progress in real time. You watch them work.
By default, workers launch with GitHub Copilot CLI. To use a different CLI agent, pass `--agent-cmd`:

```shell
# Claude Code
stampede.sh --run-id $RUN_ID --count 8 --repo ~/my-project --agent-cmd 'claude -p "{prompt}"'

# Aider
stampede.sh --run-id $RUN_ID --count 8 --repo ~/my-project --agent-cmd 'aider --message "{prompt}"'
```

**Option B: From a Copilot CLI session (if using GitHub Copilot CLI)**
Open a Copilot CLI session and tell the stampede skill what to do:
```
stampede 8 agents on ~/my-project → add error handling, write tests, improve docs
```
The orchestrator reads your codebase, generates task files, launches the fleet, and monitors progress. You watch.
To test Terminal Stampede, we pointed it at this repo. 8 agents ran simultaneously on the terminal-stampede codebase โ adding error handling, creating docs, improving the agent prompts, updating the changelog, and more. Nobody touched anything. They just ran.
| Metric | Result |
|---|---|
| Tasks | 8 |
| Agents | 8 (claude-haiku-4.5) |
| Wall clock | ~6 minutes |
| Success rate | 8/8 |
| Coordination failures | 0 |
| Task | Changes |
|---|---|
| Defensive error handling for stampede.sh | +218 -33 |
| CONTRIBUTING.md (from scratch) | +219 |
| Agent hard-exit rules | +218 -33 |
| Orchestrator failure recovery docs | +132 -1 |
| CHANGELOG update from git history | +100 |
| copilot-instructions.md improvements | +85 -3 |
| Blog accuracy review | +30 -30 |
| Install.sh: uninstall, --check, versioning | +100 |
8 branches. ~800 lines of real changes. The simplest possible architecture (files on disk, atomic renames, no coordination server) was also the most reliable. Nothing broke. Nothing conflicted. The agents didn't even know each other existed.
You're a developer. Monday morning. Your codebase needs error handling added to 4 modules, test coverage expanded, docs updated, and the CLI cleaned up. That's 8 tasks.
Today, you work through them one at a time. Ask your AI agent for the first task. Wait. Ask for the second. Wait. Context-switch. Lose momentum. Some tasks take a minute, some take ten, but you're stuck in a queue of your own making.
Terminal Stampede runs them all at once. One command, up to 20 panes, each agent working in parallel on its own git branch. Instead of feeding tasks one by one, you define the batch and let them run. Your development time scales with the longest single task, not the sum of all of them.
| | Sequential | Parallel (Stampede) |
|---|---|---|
| Workflow | One task at a time | All tasks at once |
| Context windows | One shared session | Up to 20 independent sessions |
| Git branches | 1 (sequential) | Up to 20 (parallel, isolated) |
| Your involvement | Babysit each task | Start it and walk away |
Most multi-agent frameworks (LangGraph, CrewAI, AutoGen) run agents as function calls inside one process. They share one brain. When Agent A is thinking, Agent B waits.
Terminal Stampede does something different. Each agent is a fully independent CLI session running in its own tmux pane with its own context window. It can read code, edit files, run tests, see failures, and fix them. No other agent is competing for its attention.
Each agent = one tmux pane = one independent CLI session = one git branch. They share nothing โ no memory, no context, no files in progress. Twenty agents means twenty completely isolated AI coding sessions running side by side.
Branches are named `stampede/task-001`, `stampede/task-002`, etc. After a run, the merger combines them into `stampede/merged-{run_id}`. Task branches stay around for inspection until you clean up with `--teardown`.
The "message queue" is just files on disk. The "orchestrator" is just a script. The "agent runtime" is just your terminal. Point it at any repo.
Think of a deli counter. Tasks are tickets on the wall. Agents grab one at a time.
```
Agent A: mv queue/task-001.json claimed/task-001.json   → succeeds
Agent B: mv queue/task-001.json claimed/task-001.json   → file gone, tries next
```
No locks. No database. Just a filesystem rename, which POSIX guarantees is atomic within a single filesystem.
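The race above can be reproduced in a few lines of bash (a minimal sketch; the run directory here is a throwaway temp dir, not a real stampede run):

```shell
# Two claim attempts on the same task file: only the first mv can win,
# because rename is atomic and the source vanishes once it succeeds.
dir=$(mktemp -d)
mkdir -p "$dir"/queue "$dir"/claimed
echo '{}' > "$dir/queue/task-001.json"

a=claimed; mv "$dir/queue/task-001.json" "$dir/claimed/task-001.json" 2>/dev/null || a=lost
b=claimed; mv "$dir/queue/task-001.json" "$dir/claimed/task-001.json" 2>/dev/null || b=lost
echo "A=$a B=$b"   # A=claimed B=lost
```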
1. Claim a task (atomic `mv`)
2. Create git branch: `stampede/task-001`
3. Read the code, make improvements, run tests
4. Write result file (atomic: `.tmp` then `mv`)
5. Claim next task or exit
The monitor pane reports live progress:

```
[███████████████░░░░░] 75% (6/8) | alive=8 dead=0
```
If an agent dies mid-task, the orchestrator detects it via PID check, re-queues the task, and another agent picks it up.
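A sketch of that recovery check (the directory layout mirrors the queue/claimed split above; the PID bookkeeping here is an assumption, not the orchestrator's actual format):

```shell
# Orchestrator-side recovery sketch: if the claiming worker's PID is no
# longer alive, move its task back to the queue for another agent.
dir=$(mktemp -d)
mkdir -p "$dir"/queue "$dir"/claimed
echo '{"task_id":"task-001"}' > "$dir/claimed/task-001.json"

pid=99999999   # deliberately nonexistent PID for the demo
if ! kill -0 "$pid" 2>/dev/null; then
  mv "$dir/claimed/task-001.json" "$dir/queue/task-001.json"   # re-queue
fi
```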
When all results are in, the orchestrator checks whether any two agents modified the same file:

```
⚠️ CONFLICT: lib/state.py modified by task-001 and task-003
✅ No conflicts on remaining 6 branches → ready to merge
```
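That overlap check can be sketched with `jq` (assuming each result file lists its changed files under a `files_changed` key; the field name and sample data are assumptions):

```shell
# A file appearing in more than one result's files_changed list is a conflict.
dir=$(mktemp -d)
echo '{"task_id":"task-001","files_changed":["lib/state.py","src/a.py"]}' > "$dir/task-001.json"
echo '{"task_id":"task-003","files_changed":["lib/state.py"]}' > "$dir/task-003.json"

# sort + uniq -d prints each path claimed by two or more tasks exactly once.
conflicts=$(jq -r '.files_changed[]' "$dir"/*.json | sort | uniq -d)
echo "CONFLICT: $conflicts"
```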
"Did you define what good looks like before AI ran, or after?" Most people using AI coding tools have no definition of quality โ they eyeball the output and hope for the best. Stampede bakes evaluation into the runtime itself. The scoring criteria are defined before agents run. Measurement happens silently during and after. The agents never know they're being scored.
After all agents finish, the merger agent combines every branch into one. It merges sequentially (smallest changes first to build a clean base), resolves conflicts using AI that reads both task descriptions to understand intent, and skips anything irreconcilable.
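The smallest-first ordering can be sketched against any repo. This demo builds a throwaway repository with two hypothetical task branches and sorts them by total lines changed (requires a git recent enough for `init -b`):

```shell
# Build a demo repo with two stampede branches of different diff sizes.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q -b main
git config user.email demo@example.com
git config user.name demo
echo base > app.txt; git add app.txt; git commit -qm base

git checkout -qb stampede/task-001            # bigger change: 3 lines
printf 'one\ntwo\nthree\n' >> app.txt; git commit -qam task-001
git checkout -q main
git checkout -qb stampede/task-002            # smaller change: 1 line
echo one >> app.txt; git commit -qam task-002
git checkout -q main

# Sort branches by total lines changed vs main, ascending = merge order.
order=$(for b in $(git for-each-ref --format='%(refname:short)' 'refs/heads/stampede/*'); do
  n=$(git diff --numstat main..."$b" | awk '{s += $1 + $2} END {print s + 0}')
  echo "$n $b"
done | sort -n | awk '{print $2}')
echo "$order"
```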
While merging, the merger silently shadow-scores each agent's work across 3 layers:
| Layer | When | What It Measures |
|---|---|---|
| Runtime | During stampede | Time to complete, stuck events, files changed |
| Merge | During merge | Conflict friendliness (clean merge vs. conflicts caused) |
| Quality | After all merges | Completeness, scope adherence, code quality, test impact |
Scores are weighted: Completeness (30%) matters most; Conflict Friendliness (10%) matters least, since it's partly outside the agent's control.
```
🦬 Shadow Scorecard (weighted)
───────────────────────────────────────────────────────────────────────────
Agent     Model                Comp   Scope  Qual   Conflt  Test   Total  +/-
                               (30%)  (25%)  (20%)  (10%)   (15%)  /50
───────────────────────────────────────────────────────────────────────────
task-001  claude-sonnet-4.5    10     10     8      10      5      44.2   ⚡+2
task-002  gpt-5.1              10     10     8      10      5      44.2
task-003  claude-sonnet-4.5    10     10     8      10      5      44.2   -1
───────────────────────────────────────────────────────────────────────────
```
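The weighted totals in the scorecard can be reproduced by hand: each category is scored out of 10, multiplied by its weight, and the weighted sum is scaled to a /50 total. Using the example row values:

```shell
# Comp 10×30% + Scope 10×25% + Qual 8×20% + Conflt 10×10% + Test 5×15%
# gives 8.85 out of 10; ×5 scales it to the /50 total shown as 44.2.
total=$(awk 'BEGIN { print (10*0.30 + 10*0.25 + 8*0.20 + 10*0.10 + 5*0.15) * 5 }')
echo "$total"
```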
Scores persist across runs to `~/.stampede/model-stats.json`, building a leaderboard that shows which AI models consistently produce the best work over time.
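A sketch of how such cross-run accumulation could work with `jq` (the stats schema here, a per-model run count and score sum, is an assumption, not the actual `model-stats.json` format):

```shell
# Toy running-stats accumulator: one entry per model, count plus score sum.
stats='{}'
record() {  # usage: record <model> <score>
  stats=$(echo "$stats" | jq --arg m "$1" --argjson s "$2" \
    '.[$m] = ((.[$m] // {runs: 0, sum: 0}) | {runs: (.runs + 1), sum: (.sum + $s)})')
}
record claude-haiku-4.5 41
record claude-haiku-4.5 43
leaderboard=$(echo "$stats" | jq -r 'to_entries[] | "\(.key) avg \(.value.sum / .value.runs)/50"')
echo "$leaderboard"
```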
Every vendor publishes benchmarks. Every benchmark uses synthetic tests. None of them tell you which model writes the best code on your repo, with your patterns, in your language.
The stampede leaderboard answers that question empirically. Every run shadow-scores each model's work. Scores accumulate across runs. Over time, you get a ranking built from real work on your real codebase โ not from HumanEval, not from vendor marketing, not from someone else's synthetic tests. From your code, your tasks, your results.
```
Model Leaderboard (12 runs)
─────────────────────────────────────────────────────
claude-sonnet-4.5    avg 44.2/50   (18 branches)
gpt-5.1-codex        avg 42.8/50   (14 branches)
claude-haiku-4.5     avg 41.1/50   (22 branches)
gpt-5.1              avg 39.7/50   (16 branches)
gemini-3-pro         avg 38.4/50   (10 branches)

Model stats updated (12 total runs)
```
```
stampede.sh --run-id <id> --count <n> --repo <path> [--model <model>] [--agent-cmd <cmd>]
stampede.sh --teardown --run-id <id>

Options:
  --run-id      Run identifier (format: run-YYYYMMDD-HHMMSS)
  --count       Number of agents (1-20, sweet spot: 6-8)
  --repo        Path to any git repository
  --model       AI model (default: claude-haiku-4.5)
  --agent-cmd   Custom CLI agent command (default: GitHub Copilot CLI).
                Use {prompt} and {model} as placeholders.
  --teardown    Kill agents, clean up
  --no-attach   Don't auto-open Terminal window
```
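A sketch of how a `{prompt}` placeholder in an `--agent-cmd` template can be expanded (illustrative only; the launcher's actual substitution logic may differ):

```shell
# Substitute the task prompt into a user-supplied command template.
template='claude -p "{prompt}"'
prompt='Add input validation to the auth module'
cmd=$(printf '%s' "$template" | sed "s/{prompt}/$prompt/")
echo "$cmd"   # claude -p "Add input validation to the auth module"
```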
| Key | What it does |
|---|---|
| `tmux attach -t stampede-{run_id}` | Attach to the fleet |
| `Ctrl-B z` | Zoom one pane full screen |
| `Ctrl-B z` (again) | Zoom back out to the grid |
| `Ctrl-B` arrow | Move between panes |
| `Ctrl-B d` | Detach (agents keep running) |
💬 Zoom into any pane and talk to the agent mid-task. Every pane is a live session: watch, redirect, or course-correct while the stampede runs.
🖥️ Best on ultrawide. 20 agents on a 49" ultrawide gives each one the space of a normal terminal. One monitor, 20 AI brains, all visible at once.
```
┌──────────────────────────────────────────────────┐
│ Orchestrator (SKILL.md)                          │
│ Parses intent → generates tasks → launches       │
│ agents → polls results → synthesizes             │
└───────────┬──────────────────────────────────────┘
            │
            ▼
┌──────────────────────────────────────────────────┐
│ Launcher (stampede.sh)                           │
│ tmux session → N panes → PID tracking            │
└───────┬───────┬───────┬───────┬──────────────────┘
        ▼       ▼       ▼       ▼
   ┌──────┬──────┬──────┬──────┐
   │  🦬  │  🦬  │  🦬  │  🦬  │  Each agent: own terminal,
   │      │      │      │      │  own context window, own branch
   └──┬───┴──┬───┴──┬───┴──┬───┘
      │      │      │      │
      ▼      ▼      ▼      ▼
   ┌───────────────────────────────┐
   │ repo/.stampede/{run_id}/      │
   │ queue/  claimed/  results/    │
   └───────────────┬───────────────┘
                   │ all done
                   ▼
   ┌───────────────────────────────┐
   │ Merger (stampede-merger)      │
   │ Auto-merge → resolve conflicts│
   │ → shadow score → leaderboard  │
   └───────────────────────────────┘
```
| Decision | Why |
|---|---|
| Filesystem as message queue | Simpler than anything else. `ls queue/` is your debugger |
| Agents for tasks, skill for orchestrator | Skills load globally, agents load per-session. Clean role isolation. Skill/agent format is Copilot CLI; shell scripts work with any CLI agent |
| Branch per task | No two agents touch main. Conflicts caught at synthesis |
| Auto-merger with AI conflict resolution | Reads both task descriptions to resolve conflicts semantically, not just syntactically |
| Weighted shadow scoring | Completeness (30%) matters most; conflict friendliness (10%) is partly luck |
| Cross-run model leaderboard | Shows which AI models consistently produce the best work over time |
| 500-word result cap | Verbose summaries would blow the orchestrator's context |
| `--max-autopilot-continues 30` | Prevents runaway agents from burning unlimited quota (Copilot CLI flag; other CLIs have their own limits) |
| Lightweight models for grunt work | Save the powerful model for synthesis, use fast ones for parallel tasks |
Built during a Havoc Hackathon, where AI models competed to design this framework across elimination rounds with sealed judging. The winning architecture was synthesized from Claude Opus 4.6 (Fast) and GPT-5.3-Codex, then battle-tested with live stampedes on real codebases.
Read the full story: *I Split One Terminal Into 20 AI Brains. Here's What Happened.* →
MIT: use it, fork it, stampede with it. 🦬
Created by DUBSOpenHub. Works with any CLI coding agent.
Let's build! ✨

