Local-first, open-source Agent Arena for prototyping single-agent systems (SAS), multi-agent systems (MAS), evaluator-first experimentation, and paper-aligned scaling analysis.
Brainqub3 is an AI agent consultancy focused on production-grade workflow automation. The goal is to ship measurable workflow improvements in weeks, not quarters, integrated into real systems with authentication, logging, guardrails, and ownership handoff.
- Website: https://brainqub3.com
- Core delivery model: integrated software outcomes (not workshops, decks, or long transformation programs)
- Typical engagement cadence:
  - 1-2 sprints: Prototype to Production
  - 2-4 sprints: Workflow Integration Build
  - 1-2 days/week: Fractional Engineer retainer
- What Brainqub3 delivers:
- Workflow automation (triage, routing, intake, enrichment, summarisation)
- Copilots and agents embedded in existing tools (ticketing, CRM, Slack, email, dashboards)
- Production hardening (evals, guardrails, logging, monitoring, rollback planning)
- Delivery loop: scope -> integrate -> guardrails -> measure -> ship -> handoff
- Repository maintenance: this repository is maintained by Brainqub3 founder, John Adeojo.
- Company: Brainqub3 is a consulting brand of DATA-CENTRIC SOLUTIONS LTD (England and Wales, Company No. 14829432), London, UK.
Every local run (human or agent-driven) must satisfy:
- Python 3.11+
- `uv` installed and on `PATH`
- Project dependencies synced into `.venv` via `uv`
- Official Python package `claude-agent-sdk` installed in that environment
- `ANTHROPIC_API_KEY` present in `.env` or the shell environment
Use `uv run ...` for all project commands.
For first-clone setup in Claude Code, run:
/lab-setup
If your Claude Code UI namespaces project commands, use /project:lab-setup.
For live checks in namespaced UIs, use /project:lab-setup-live.
Modes:
- `/lab-setup`: deterministic local bootstrap + offline doctor checks.
- `/lab-setup-live`: full live doctor checks (requires `ANTHROPIC_API_KEY`).
Claude may not be able to read `.env` due to repo permissions, so before `/lab-setup-live`, ensure `.env` is already present with `ANTHROPIC_API_KEY` set (or export the key in your shell environment).
These commands use the repo preflight scripts and work across Windows, Linux, and macOS.
Run preflight before SAS/MAS runs.
Unix/macOS:
```shell
bash scripts/preflight.sh --bootstrap-uv
```

Windows PowerShell:

```powershell
./scripts/preflight.ps1 --bootstrap-uv
```

Preflight does all of the following:
- Installs `uv` if missing (when `--bootstrap-uv` is used)
- Syncs dependencies (`uv sync --locked --all-extras` if `uv.lock` exists, else `uv sync --all-extras`)
- Verifies the official `claude-agent-sdk` is installed in the `uv` environment
- Runs `brainqub3 doctor` (or `--offline` checks)
- Fails fast on any missing requirement
Check the Python version. Windows:

```powershell
py -3.11 --version
```

Unix/macOS:

```shell
python3.11 --version
```

Install `uv` if missing. Windows:

```powershell
py -3.11 -m pip install --user uv
```

Unix/macOS:

```shell
python3.11 -m pip install --user uv
```

From repo root:
```shell
uv --version
uv sync --all-extras
```

If `uv.lock` exists and you want strict reproducibility:

```shell
uv sync --locked --all-extras
```

Create `.env` from the template and set `ANTHROPIC_API_KEY`:
```shell
cp .env.example .env
```

Windows PowerShell alternative:

```powershell
Copy-Item .env.example .env
```

Verify the SDK install and run the doctor checks:

```shell
uv run python -c "import importlib.metadata as m; print(m.version('claude-agent-sdk'))"
uv run brainqub3 doctor
```

If doctor fails, do not run SAS/MAS until it passes.
Live runs will fail if proxy variables point to a dead proxy. If needed, clear them in your current shell:
Windows PowerShell:
```powershell
$env:HTTP_PROXY=''; $env:HTTPS_PROXY=''; $env:ALL_PROXY=''
$env:GIT_HTTP_PROXY=''; $env:GIT_HTTPS_PROXY=''
```

Unix/macOS:

```shell
unset HTTP_PROXY HTTPS_PROXY ALL_PROXY GIT_HTTP_PROXY GIT_HTTPS_PROXY
```

Run evaluator tests first:
```shell
uv run pytest brainqub3/tasks/examples/hello_world/tests -q
```

Live SAS:

```shell
uv run brainqub3 run sas --task hello_world --model claude-sonnet-4-5 --instances 3 --require-live
```

Live MAS:

```shell
uv run brainqub3 run mas --task hello_world --arch hybrid --model claude-sonnet-4-5 --n-agents 3 --instances 3 --require-live
```

By default, `run sas` and `run mas` auto-load the HTML dashboard at http://127.0.0.1:8765.
Use `--no-dashboard` to disable auto-loading for non-interactive runs.
Dashboard:
```shell
uv run brainqub3 dashboard
```

By default this starts a lightweight local web server at http://127.0.0.1:8765.
You can override runtime options:

```shell
uv run brainqub3 dashboard --host 0.0.0.0 --port 9000 --refresh-ms 3000
```

If you cloned the repo and want dashboard data from committed runs:

```shell
uv run brainqub3 run reindex
uv run brainqub3 dashboard
```

- Default behavior is live-first: a missing SDK or key causes a fail-fast unless mock is explicitly allowed.
- Use `--require-live` for validation runs; it hard-fails if the backend is not live.
- Use `--allow-mock` only for explicit offline dry-runs.
- Do not combine `--require-live` with `--allow-mock`.
- Run artefacts store `runtime.backend_mode` and `runtime.backend_reason`.
- SDK transport buffer defaults to 10 MB in Arena runs; override via `BRAINQUB3_SDK_MAX_BUFFER_SIZE_BYTES` when needed.
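As a minimal sketch of how a run script could pick up that override (the env var name comes from the list above; the default and the 20 MB example value are illustrative):

```python
import os

# 10 MB default, as stated above; the env var overrides it when set.
DEFAULT_MAX_BUFFER = 10 * 1024 * 1024

max_buffer = int(os.environ.get("BRAINQUB3_SDK_MAX_BUFFER_SIZE_BYTES",
                                DEFAULT_MAX_BUFFER))
print(max_buffer)  # 10485760 unless the env var is set (e.g. 20971520 for 20 MB)
```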
- Completed runs are finalized with `data/runs/<run_id>/run_manifest.json` (content hashes).
- Finalized runs are immutable: recomputing metrics in-place is blocked.
- Use soft delete to tombstone a run in the local index while keeping artefacts.
- Use hard delete to remove both artefacts and index rows.
- Empirical prediction outputs are written to `data/predictions/` to avoid mutating finalized runs.
- Rebuild the SQLite index from canonical run artefacts with `run reindex`.
```shell
uv run brainqub3 task init <task_name>
uv run brainqub3 eval test <task_name>
uv run brainqub3 doctor
uv run brainqub3 run sas --task <task> --model <model> --instances <N> --require-live [--no-dashboard]
uv run brainqub3 run sas --task <task> --model <model> --instances <N> --allowed-tools Read,Write,Edit,Bash,Glob,Grep,WebSearch,WebFetch --require-live [--no-dashboard]
uv run brainqub3 run mas --task <task> --arch <independent|centralised|decentralised|hybrid> --model <model> --n-agents <n> --instances <N> --allowed-tools Read,Write,Edit,Bash,Glob,Grep,WebSearch,WebFetch --require-live [--no-dashboard]
uv run brainqub3 run elasticity --task <task> --arch <independent|centralised|decentralised|hybrid> --model <model> --n-agents-grid 3,4 --tool-count-grid 6,8 --batch-id <batch>
uv run brainqub3 run delete --run-id <id> [--reason "<text>"]
uv run brainqub3 run delete --run-id <id> --hard
uv run brainqub3 run reindex [--verify-manifests]
uv run brainqub3 metrics compute --run-id <id>
uv run brainqub3 model predict --run-id <id>
uv run brainqub3 scenario run --scenario data/scenarios/<name>.yaml [--batch-id <batch>]
uv run brainqub3 dashboard
```
- Store ad-hoc analysis scripts and scratch outputs in `temp/`.
- `temp/` is git-kept via `temp/.gitkeep`; all other files under `temp/` are ignored.
- Do not create new scratch files in the repository root.
- The scaling model learns elasticities from controlled empirical run pairs per MAS architecture and per coordination metric (`overhead_pct`, `message_density_c`, `redundancy_R`, `efficiency_Ec`, `error_amp_Ae`).
- `eta_n` is estimated from pairs where `tool_count_T` is fixed and `n_agents` changes: `eta_n = ln(x_j / x_i) / ln(n_j / n_i)`.
- `eta_T` is estimated from pairs where `n_agents` is fixed and `tool_count_T` changes: `eta_T = ln(x_j / x_i) / ln(T_j / T_i)`.
- The model uses the median of valid pairwise slopes as the elasticity estimate for each axis.
- Each base coordination metric is then scaled with a non-dimensional factor: `x_hat = clamp(x_base * (n_agents / n0)^eta_n * (tool_count_T / T0)^eta_T)`.
- Scaled coordination metrics are fed into the Table 4 prediction model to produce `P_hat`; dashboard curves plot SAS-relative deltas (`delta_vs_sas = P_architecture - P_SAS`).
- In the dashboard Scaling Laws tab, select an explicit elasticity calibration batch for scaling analysis (not a SAS-vs-MAS comparison-only batch).
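The estimation and scaling steps above can be sketched in a few lines of Python. The baselines `n0 = 3` and `T0 = 6` and the clamp bounds below are illustrative assumptions, not the Arena's configured values:

```python
import math
from statistics import median

def pairwise_elasticity(pairs):
    """Median log-log slope over controlled run pairs.

    Each pair is ((x_i, k_i), (x_j, k_j)), where x is a coordination
    metric and k is the varied knob (n_agents or tool_count_T) while
    the other knob is held fixed across the pair.
    """
    slopes = [
        math.log(x_j / x_i) / math.log(k_j / k_i)
        for (x_i, k_i), (x_j, k_j) in pairs
        if x_i > 0 and x_j > 0 and k_i != k_j
    ]
    return median(slopes) if slopes else None

def scale_metric(x_base, n_agents, tool_count, eta_n, eta_t,
                 n0=3, t0=6, lo=0.0, hi=None):
    """x_hat = clamp(x_base * (n/n0)^eta_n * (T/T0)^eta_T)."""
    x_hat = x_base * (n_agents / n0) ** eta_n * (tool_count / t0) ** eta_t
    x_hat = max(lo, x_hat)
    return min(hi, x_hat) if hi is not None else x_hat

# overhead_pct rises 12.0 -> 16.0 as n_agents goes 3 -> 4 (tool count fixed):
eta_n = pairwise_elasticity([((12.0, 3), (16.0, 4))])
print(eta_n)                               # 1.0: overhead grows linearly in n
print(scale_metric(12.0, 6, 6, eta_n, 0))  # 24.0 predicted at n_agents=6
```

The real model applies this per architecture and per metric over many pairs; the sketch only shows the arithmetic of the slope estimate and the non-dimensional scaling factor.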
Paper reference:
- Towards a Science of Scaling Agent Systems. arXiv preprint arXiv:2512.08296.
- PDF: https://arxiv.org/pdf/2512.08296
Intelligence index source:
- Model `intelligence_index` values in `brainqub3/config/models.yaml` are sourced from the Artificial Analysis Intelligence Index (recorded as a local snapshot in `intelligence_source`): https://artificialanalysis.ai/evaluations/artificial-analysis-intelligence-index
Scaling guardrails:
- Hypothetical MAS scaling analysis now fails fast when required elasticity support is missing (for example `source=default_zero` / zero controlled pairs).
- Calibrate first with shared-batch elasticity sweeps: `uv run brainqub3 run elasticity --task <task> --arch <arch> --batch-id <batch> --n-agents-grid 3,4 --tool-count-grid 6,8`.
- Elasticity tool policy uses core-vs-full by default: core is `Read, Write, Edit, Bash, Glob, Grep`; full adds `WebSearch` and `WebFetch`.
- Live runs now enforce `allowed_tools` at runtime through `claude-agent-sdk` permission callbacks (not metadata-only). Tool attempts are recorded in telemetry `tool_called` events.
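As a concrete reading of the core-vs-full policy above (tool names are from the list; the variable names are only local illustrations):

```python
# Core vs full tool sets for elasticity runs, per the policy above.
CORE_TOOLS = ["Read", "Write", "Edit", "Bash", "Glob", "Grep"]
FULL_TOOLS = CORE_TOOLS + ["WebSearch", "WebFetch"]

# Comma-joined form suitable for the CLI's --allowed-tools flag:
print(",".join(CORE_TOOLS))  # Read,Write,Edit,Bash,Glob,Grep
```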