brainqub3/agent-labs

Brainqub3 Agent Labs

Local-first, open-source Agent Arena for prototyping single-agent systems (SAS), multi-agent systems (MAS), evaluator-first experimentation, and paper-aligned scaling analysis.

About Brainqub3

Brainqub3 is an AI agent consultancy focused on production-grade workflow automation. The goal is to ship measurable workflow improvements in weeks, not quarters, integrated into real systems with authentication, logging, guardrails, and ownership handoff.

  • Website: https://brainqub3.com
  • Core delivery model: integrated software outcomes (not workshops, decks, or long transformation programs)
  • Typical engagement cadence:
    • 1-2 sprints: Prototype to Production
    • 2-4 sprints: Workflow Integration Build
    • 1-2 days/week: Fractional Engineer retainer
  • What Brainqub3 delivers:
    • Workflow automation (triage, routing, intake, enrichment, summarisation)
    • Copilots and agents embedded in existing tools (ticketing, CRM, Slack, email, dashboards)
    • Production hardening (evals, guardrails, logging, monitoring, rollback planning)
  • Delivery loop: scope -> integrate -> guardrails -> measure -> ship -> handoff
  • Repository maintenance: this repository is maintained by Brainqub3 founder John Adeojo.
  • Company: Brainqub3 is a consulting brand of DATA-CENTRIC SOLUTIONS LTD (England and Wales, Company No. 14829432), London, UK.

Environment Contract (Required)

Every local run (human or agent-driven) must satisfy:

  1. Python 3.11+
  2. uv installed and on PATH
  3. Project dependencies synced into .venv via uv
  4. Official Python package claude-agent-sdk installed in that environment
  5. ANTHROPIC_API_KEY present in .env or shell environment

Use uv run ... for all project commands.
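
The contract above can be checked offline before any run. The sketch below mirrors those five requirements in plain Python; the actual checks `brainqub3 doctor` performs may differ, and the function name here is illustrative:

```python
# Offline sketch of the environment contract: Python version, uv on PATH,
# claude-agent-sdk installed, ANTHROPIC_API_KEY set. Illustrative only.
import importlib.metadata as metadata
import os
import shutil
import sys


def check_environment() -> list[str]:
    """Return a list of contract violations; empty list means all checks pass."""
    problems: list[str] = []
    if sys.version_info < (3, 11):
        problems.append(f"Python 3.11+ required, found {sys.version.split()[0]}")
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH")
    try:
        metadata.version("claude-agent-sdk")
    except metadata.PackageNotFoundError:
        problems.append("claude-agent-sdk not installed in this environment")
    if not os.environ.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY not set (.env or shell)")
    return problems
```

An empty return means the contract is satisfied; anything else should stop the run, matching the fail-fast behaviour of the preflight scripts.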

Claude Code Auto Setup (Deterministic)

For first-clone setup in Claude Code, run:

/lab-setup

If your Claude Code UI namespaces project commands, use /project:lab-setup.

For live checks in namespaced UIs, use /project:lab-setup-live.

Modes:

  • /lab-setup: deterministic local bootstrap + offline doctor checks.
  • /lab-setup-live: full live doctor checks (requires ANTHROPIC_API_KEY).

Claude may not be able to read .env due to repository permissions, so before running /lab-setup-live, ensure .env is already present with ANTHROPIC_API_KEY (or export the key in your shell environment).

These commands use the repo preflight scripts and work across Windows, Linux, and macOS.

Fast Path (Recommended)

Run preflight before SAS/MAS runs.

Unix/macOS:

bash scripts/preflight.sh --bootstrap-uv

Windows PowerShell:

./scripts/preflight.ps1 --bootstrap-uv

Preflight does all of the following:

  1. Installs uv if missing (when --bootstrap-uv is used)
  2. Syncs dependencies (uv sync --locked --all-extras if uv.lock exists, else uv sync --all-extras)
  3. Verifies official claude-agent-sdk is installed in the uv environment
  4. Runs brainqub3 doctor (or --offline checks)
  5. Fails fast on any missing requirement

Manual Setup (Step-by-Step)

1) Install Python 3.11

Windows:

py -3.11 --version

Unix/macOS:

python3.11 --version

2) Install uv

Windows:

py -3.11 -m pip install --user uv

Unix/macOS:

python3.11 -m pip install --user uv

3) Sync dependencies

From repo root:

uv --version
uv sync --all-extras

If uv.lock exists and you want strict reproducibility:

uv sync --locked --all-extras

4) Configure API key

Create .env from template and set ANTHROPIC_API_KEY:

cp .env.example .env

Windows PowerShell alternative:

Copy-Item .env.example .env

5) Verify official Agent SDK + environment

uv run python -c "import importlib.metadata as m; print(m.version('claude-agent-sdk'))"
uv run brainqub3 doctor

If doctor fails, do not run SAS/MAS until it passes.

Proxy Note (Common Failure Mode)

Live runs will fail if proxy variables point to a dead proxy. If needed, clear them in your current shell:

Windows PowerShell:

$env:HTTP_PROXY=''; $env:HTTPS_PROXY=''; $env:ALL_PROXY=''
$env:GIT_HTTP_PROXY=''; $env:GIT_HTTPS_PROXY=''

Unix/macOS:

unset HTTP_PROXY HTTPS_PROXY ALL_PROXY GIT_HTTP_PROXY GIT_HTTPS_PROXY
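
The same cleanup can be done in-process before launching a live run from Python; the helper below simply mirrors the shell commands above (variable names taken from this section):

```python
# In-process sketch: remove stale proxy variables so a live run does not
# route through a dead proxy. Mirrors the shell commands in this README.
import os

PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY",
              "GIT_HTTP_PROXY", "GIT_HTTPS_PROXY")


def clear_proxies() -> list[str]:
    """Delete proxy variables (upper- and lower-case) and return what was cleared."""
    cleared: list[str] = []
    for name in PROXY_VARS:
        for candidate in (name, name.lower()):
            if candidate in os.environ:
                del os.environ[candidate]
                cleared.append(candidate)
    return cleared
```

Note this only affects the current process and its children, just as the shell commands only affect the current shell.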

Minimal Validation Runbook

Run evaluator tests first:

uv run pytest brainqub3/tasks/examples/hello_world/tests -q

Live SAS:

uv run brainqub3 run sas --task hello_world --model claude-sonnet-4-5 --instances 3 --require-live

Live MAS:

uv run brainqub3 run mas --task hello_world --arch hybrid --model claude-sonnet-4-5 --n-agents 3 --instances 3 --require-live

By default, run sas and run mas auto-load the HTML dashboard at http://127.0.0.1:8765. Use --no-dashboard to disable auto-loading for non-interactive runs.

Dashboard:

uv run brainqub3 dashboard

By default this starts a lightweight local web server at http://127.0.0.1:8765. You can override runtime options:

uv run brainqub3 dashboard --host 0.0.0.0 --port 9000 --refresh-ms 3000

If you cloned the repo and want dashboard data from committed runs:

uv run brainqub3 run reindex
uv run brainqub3 dashboard

Backend Modes

  • Default behavior is live-first: a missing SDK or key causes a fail-fast error unless mock mode is explicitly allowed.
  • Use --require-live for validation runs; it hard-fails if backend is not live.
  • Use --allow-mock only for explicit offline dry-runs.
  • Do not combine --require-live with --allow-mock.
  • Run artefacts store runtime.backend_mode and runtime.backend_reason.
  • SDK transport buffer defaults to 10MB in Arena runs; override via BRAINQUB3_SDK_MAX_BUFFER_SIZE_BYTES when needed.
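
The live-first rules above resolve to a small decision table. The sketch below is a hypothetical reconstruction of that logic (the function name and reason strings are illustrative, not the repo's API), showing why --require-live and --allow-mock cannot be combined:

```python
# Hypothetical sketch of live-first backend resolution as described above.
# Returns (backend_mode, backend_reason), matching the artefact fields
# runtime.backend_mode / runtime.backend_reason in spirit only.
def resolve_backend_mode(sdk_ok: bool, key_ok: bool,
                         require_live: bool = False,
                         allow_mock: bool = False) -> tuple[str, str]:
    if require_live and allow_mock:
        raise ValueError("--require-live and --allow-mock are mutually exclusive")
    if sdk_ok and key_ok:
        return "live", "sdk_and_key_present"
    # backend cannot go live from here
    if require_live:
        raise RuntimeError("live backend required but SDK/key is missing")
    if allow_mock:
        return "mock", "explicit_allow_mock"
    raise RuntimeError("missing SDK/key; pass --allow-mock for offline dry-runs")
```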

Run Immutability

  • Completed runs are finalized with data/runs/<run_id>/run_manifest.json (content hashes).
  • Finalized runs are immutable: recomputing metrics in-place is blocked.
  • Use soft delete to tombstone a run in the local index while keeping artefacts.
  • Use hard delete to remove both artefacts and index rows.
  • Empirical prediction outputs are written to data/predictions/ to avoid mutating finalized runs.
  • Rebuild the SQLite index from canonical run artefacts with run reindex.
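
A manifest of content hashes is what makes immutability checkable. The sketch below assumes a simple per-file SHA-256 scheme; the repo's actual run_manifest.json layout may differ:

```python
# Sketch of manifest-style content hashing (assumed scheme: SHA-256 per file,
# keyed by path relative to the run directory). Illustrative only.
import hashlib
import json
from pathlib import Path


def build_manifest(run_dir: str) -> dict[str, str]:
    """Hash every file in a run directory except the manifest itself."""
    root = Path(run_dir)
    hashes: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name != "run_manifest.json":
            hashes[str(path.relative_to(root))] = hashlib.sha256(
                path.read_bytes()).hexdigest()
    return hashes


def verify_manifest(run_dir: str) -> bool:
    """True if the stored manifest still matches the on-disk artefacts."""
    root = Path(run_dir)
    manifest = json.loads((root / "run_manifest.json").read_text())
    return manifest == build_manifest(run_dir)
```

With this scheme, any in-place mutation of a finalized run changes a hash and fails verification, which is the property run reindex --verify-manifests relies on.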

Core Commands

  • uv run brainqub3 task init <task_name>
  • uv run brainqub3 eval test <task_name>
  • uv run brainqub3 doctor
  • uv run brainqub3 run sas --task <task> --model <model> --instances <N> --require-live [--no-dashboard]
  • uv run brainqub3 run sas --task <task> --model <model> --instances <N> --allowed-tools Read,Write,Edit,Bash,Glob,Grep,WebSearch,WebFetch --require-live [--no-dashboard]
  • uv run brainqub3 run mas --task <task> --arch <independent|centralised|decentralised|hybrid> --model <model> --n-agents <n> --instances <N> --allowed-tools Read,Write,Edit,Bash,Glob,Grep,WebSearch,WebFetch --require-live [--no-dashboard]
  • uv run brainqub3 run elasticity --task <task> --arch <independent|centralised|decentralised|hybrid> --model <model> --n-agents-grid 3,4 --tool-count-grid 6,8 --batch-id <batch>
  • uv run brainqub3 run delete --run-id <id> [--reason "<text>"]
  • uv run brainqub3 run delete --run-id <id> --hard
  • uv run brainqub3 run reindex [--verify-manifests]
  • uv run brainqub3 metrics compute --run-id <id>
  • uv run brainqub3 model predict --run-id <id>
  • uv run brainqub3 scenario run --scenario data/scenarios/<name>.yaml [--batch-id <batch>]
  • uv run brainqub3 dashboard

Temp Files Policy

  • Store ad-hoc analysis scripts and scratch outputs in temp/.
  • temp/ is git-kept via temp/.gitkeep; all other files under temp/ are ignored.
  • Do not create new scratch files in the repository root.

Scaling Model (Simple)

  • The scaling model learns elasticities from controlled empirical run pairs per MAS architecture and per coordination metric (overhead_pct, message_density_c, redundancy_R, efficiency_Ec, error_amp_Ae).
  • eta_n is estimated from pairs where tool_count_T is fixed and n_agents changes: eta_n = ln(x_j / x_i) / ln(n_j / n_i).
  • eta_T is estimated from pairs where n_agents is fixed and tool_count_T changes: eta_T = ln(x_j / x_i) / ln(T_j / T_i).
  • The model uses the median of valid pairwise slopes as the elasticity estimate for each axis.
  • Each base coordination metric is then scaled with a non-dimensional factor: x_hat = clamp(x_base * (n_agents / n0)^eta_n * (tool_count_T / T0)^eta_T).
  • Scaled coordination metrics are fed into the Table 4 prediction model to produce P_hat; dashboard curves plot SAS-relative deltas (delta_vs_sas = P_architecture - P_SAS).
  • In the dashboard Scaling Laws tab, select the explicit elasticity calibration batch for scaling analysis (not a SAS-vs-MAS comparison-only batch).
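
The estimation steps above can be written out directly. The sketch below implements the pairwise log-slope formula and the median aggregation exactly as stated, plus the non-dimensional scaling with a clamp; function names and the clamp bounds are illustrative, not the repo's API:

```python
# Worked sketch of the elasticity estimate: eta is the median of pairwise
# log-log slopes over controlled run pairs where the other axis is fixed.
import math
from statistics import median


def estimate_eta(pairs: list[tuple[float, float, float, float]]) -> float:
    """pairs: (x_i, x_j, s_i, s_j) where s is the varying scale axis
    (n_agents or tool_count_T) and x is the coordination metric value."""
    slopes = []
    for x_i, x_j, s_i, s_j in pairs:
        # a pair is valid only if all quantities are positive and s changed
        if min(x_i, x_j, s_i, s_j) > 0 and s_i != s_j:
            slopes.append(math.log(x_j / x_i) / math.log(s_j / s_i))
    if not slopes:
        raise ValueError("no valid controlled pairs")
    return median(slopes)


def scale_metric(x_base: float, n: float, n0: float, eta_n: float,
                 T: float, T0: float, eta_T: float,
                 lo: float = 0.0, hi: float = float("inf")) -> float:
    """x_hat = clamp(x_base * (n/n0)^eta_n * (T/T0)^eta_T); bounds are assumed."""
    x_hat = x_base * (n / n0) ** eta_n * (T / T0) ** eta_T
    return min(max(x_hat, lo), hi)
```

For example, a pair where the metric doubles when n_agents doubles (tool count fixed) yields eta_n = ln(2)/ln(2) = 1, i.e. linear scaling on that axis.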

Paper reference:

Intelligence index source:

Scaling guardrails:

  • Hypothetical MAS scaling analysis now fails fast when required elasticity support is missing (for example source=default_zero / zero controlled pairs).
  • Calibrate first with shared-batch elasticity sweeps: uv run brainqub3 run elasticity --task <task> --arch <arch> --batch-id <batch> --n-agents-grid 3,4 --tool-count-grid 6,8.
  • Elasticity tool policy uses core-vs-full by default: core is Read, Write, Edit, Bash, Glob, Grep; full adds WebSearch and WebFetch.
  • Live runs now enforce allowed_tools at runtime through claude-agent-sdk permission callbacks (not metadata-only). Tool attempts are recorded in telemetry tool_called events.
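
The core-vs-full tool policy and the allow/deny decision can be summarised in a few lines. This is only a sketch of the policy shape; real enforcement goes through claude-agent-sdk permission callbacks, which this does not reproduce, and the telemetry record layout here is an assumption:

```python
# Illustrative core-vs-full tool policy from this README, with a stand-in
# allow/deny check that records every attempt as a tool_called-style event.
CORE_TOOLS = {"Read", "Write", "Edit", "Bash", "Glob", "Grep"}
FULL_TOOLS = CORE_TOOLS | {"WebSearch", "WebFetch"}


def permit_tool(tool_name: str, allowed_tools: set[str],
                telemetry: list[dict]) -> bool:
    """Decide whether a tool call is permitted, logging the attempt either way."""
    decision = tool_name in allowed_tools
    telemetry.append({"event": "tool_called",
                      "tool": tool_name,
                      "allowed": decision})
    return decision
```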
