An engine for decision-grade AI deliberation.
Delibera is an open-source framework that makes multi-agent reasoning structured, governed, and auditable. Unlike chat-based AI systems, Delibera treats deliberation as a process — not a conversation. The engine controls tree expansion, pruning, and convergence while agents contribute content. Every run can be replayed without re-invoking LLMs.
| Concept | Description |
|---|---|
| Engine | Central orchestrator that controls tree structure and termination. Agents propose; the engine decides. |
| Protocol | Declarative YAML spec defining expansion rules, branch pipelines, pruning, and convergence criteria. |
| Epistemics | Explicit tracking of claims, evidence, and objections. Fact claims require evidence; inferences are validated per-claim against evidence. |
| Retrieval | Multi-source evidence gathering: keyword search, semantic embeddings, web search (Gemini grounding), and hybrid fusion. |
| Verification | Evidence verification layer that validates web search results by fetching URLs, LLM fact-checking, or cross-referencing. |
| Gates | Structured human-in-the-loop checkpoints for scope clarification, tradeoffs, and final approval. |
| Replay | Full runs can be reconstructed from trace logs without calling agents or tools. |
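The epistemics model in the table above can be sketched in a few lines. These are hypothetical types for illustration only; Delibera's real classes live in its `epistemics/` package and will differ:

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    kind: str  # "fact" or "inference"
    evidence_ids: list[str] = field(default_factory=list)
    objection_ids: list[str] = field(default_factory=list)

def validate(claim: Claim) -> bool:
    """Fact claims must cite at least one piece of evidence."""
    if claim.kind == "fact":
        return bool(claim.evidence_ids)
    # Inferences are validated per-claim against evidence by a separate step.
    return True

unsupported = Claim("uv is 10x faster than pip", kind="fact")
print(validate(unsupported))  # → False
```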
```bash
# Install from PyPI
pip install delibera

# Or clone and install with uv
git clone https://github.com/forge-labs-dev/delibera.git
cd delibera
uv sync
```

```bash
# Basic run with interactive gates (uses stub agents)
delibera run --question "Should we adopt uv for dependency management?"

# Run with auto-approved gates (for CI/scripts)
delibera run --question "Should we adopt uv?" --auto-approve-gates

# Run with a custom protocol
delibera run --question "Your question" --protocol protocols/tree_v1.yaml
```

To use LLMs for all agent roles (planner, proposer, researcher, red-teamer, refiner, validator):
```bash
# Set your Gemini API key
export GEMINI_API_KEY="your-api-key"

# Run with all LLM agents (auto-enables web evidence retrieval)
delibera run --question "Should we adopt uv?" --use-all-llm --no-gates

# Run individual LLM agents
delibera run --question "Your question" --use-llm-proposer --use-llm-researcher --auto-approve-gates

# Parallel branch execution for faster deliberation
delibera run --question "Your question" --use-all-llm --max-parallel-branches 3 --no-gates

# Specify model and parameters
delibera run --question "Your question" \
  --use-all-llm \
  --llm-model gemini-2.5-flash \
  --llm-temperature 0.3 \
  --no-gates
```

Note: LLM mode requires the google-generativeai package. Install with:

```bash
pip install delibera[llm]
```

When `--use-all-llm` (or `--use-llm-researcher`) is set, web retrieval is auto-enabled: the LLM researcher generates search queries, and Gemini Google Search grounding retrieves real evidence.
Replay and inspection work identically whether the run used LLM or stubs — replay never re-invokes the LLM.
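Replay can be pictured as folding a log of events back into run state, with no agent or tool calls. This is a toy sketch with an invented event schema; Delibera's real events are defined in its `trace/` package:

```python
# Hypothetical trace events for illustration only.
trace = [
    {"event": "node_created", "node_id": "opt-1"},
    {"event": "claim_added", "node_id": "opt-1", "claim": "uv resolves faster"},
    {"event": "node_pruned", "node_id": "opt-1"},
]

def replay(events: list[dict]) -> dict[str, dict]:
    """Rebuild run state purely from logged events -- no LLM re-invocation."""
    nodes: dict[str, dict] = {}
    for ev in events:
        if ev["event"] == "node_created":
            nodes[ev["node_id"]] = {"claims": [], "pruned": False}
        elif ev["event"] == "claim_added":
            nodes[ev["node_id"]]["claims"].append(ev["claim"])
        elif ev["event"] == "node_pruned":
            nodes[ev["node_id"]]["pruned"] = True
    return nodes

state = replay(trace)
print(state["opt-1"]["pruned"])  # → True
```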
Delibera supports multiple evidence retrieval methods:
```bash
# Keyword search over local evidence files (default)
delibera run --question "Your question" --retrieval-method keyword --evidence-dir ./evidence

# Semantic search using Gemini embeddings
delibera run --question "Your question" --retrieval-method embedding --evidence-dir ./evidence

# Web search via Gemini Google Search grounding
delibera run --question "Your question" --retrieval-method web --use-llm-proposer

# Hybrid: combine local + web with Reciprocal Rank Fusion
delibera run --question "Your question" --retrieval-method hybrid --evidence-dir ./evidence

# Enable verification of web results
delibera run --question "Your question" --retrieval-method web --verify --use-llm-proposer
```

```bash
# Print a human-readable summary
delibera inspect --run-id <run_id>

# Generate a Markdown report
delibera report --run-id <run_id> --out report.md
```

```bash
# Validate trace and artifact consistency
delibera replay --run-id <run_id>
```

```bash
# Run an evaluation suite
delibera eval --suite suites/basic.yaml

# Save results to JSON
delibera eval --suite suites/basic.yaml --save-results results.json
```

```yaml
name: simple_protocol
protocol_version: v1
max_depth: 1
gates_enabled: true
expand_rules:
  - id: expand_options
    at_step_id: plan
    child_kind: option
    max_children: 3
    depth: 1
branch_pipeline:
  - id: propose
    kind: work
    step_name: PROPOSE
    role: proposer
  - id: research
    kind: work
    step_name: RESEARCH
    role: researcher
  - id: validate
    kind: validate
    step_name: CLAIM_CHECK
prune:
  rule: epistemic_then_score
  keep_k: 2
reduce:
  rule: merge_artifacts
convergence:
  max_rounds: 0
```

```
+----------------------------------------------------+
|                     CLI / API                      |
+---------------------------+------------------------+
|      Protocol Layer       |       User Gates       |
+---------------------------+------------------------+
|         Deliberation Engine (Orchestrator)         |
+----------------------------------------------------+
|  Epistemics  |  Tools & Policy  |     Tracing      |
+----------------------------------------------------+
|            Agents (Stubs or LLM-backed)            |
+----------------------------------------------------+
|    LLM Providers    |   Retrieval + Verification   |
+----------------------------------------------------+
```
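The hybrid retrieval method fuses local and web rankings with Reciprocal Rank Fusion. The standard RRF formula it names can be sketched as follows (a generic implementation, not Delibera's internal code):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists into one.

    Each document scores sum(1 / (k + rank)) across the lists that
    contain it; k = 60 is the conventional RRF constant.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A document ranked well by both local and web retrieval rises to the top.
local = ["ev1", "ev2", "ev3"]
web = ["ev2", "ev4", "ev1"]
print(reciprocal_rank_fusion([local, web]))  # → ['ev2', 'ev1', 'ev4', 'ev3']
```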
```
src/delibera/
  engine/      Orchestrator, operators, tree, run state
  protocol/    Declarative specs, YAML loader, interpreter
  agents/      Stubs + LLM-backed agents (planner, proposer, researcher, redteam, refiner, validator)
  epistemics/  Claims, evidence, objections, ledgers, validation
  retrieval/   Keyword, embedding, web, hybrid retrievers + verification
  llm/         LLM client protocol, Gemini provider, prompts, redaction
  scoring/     Weighted metrics, score computation, pruning support
  tools/       Tool registry, router, policy engine, built-ins (calculator, docs)
  gates/       Gate models, predicates, handlers, response application
  trace/       Events, writer, reader, replay, validation
  inspect/     Run summarization, Markdown + text report rendering
  eval/        Evaluation suite runner, loader, metrics, comparison
  cli.py       Click-based CLI entry point
```
- Not an agent framework — Delibera is not LangChain, CrewAI, or AutoGPT. It's a deliberation engine with strict governance.
- Not a workflow orchestrator — Delibera is not Airflow or Prefect. It's specifically for reasoning processes that require epistemic tracking.
- Not autonomous — Delibera does not "decide for you". It produces structured decision artifacts for humans to review.
- Not a chatbot — Outputs are artifacts, not conversations.
See the docs/ directory for detailed documentation:
- Vision — Why Delibera exists
- Architecture — System structure and invariants
- Formalism — Formal model and terminology
- Protocols — Protocol specification
- Epistemics — Claims, evidence, and validation
- Tooling and Policy — Tool access governance
- User Gates — Human-in-the-loop checkpoints
- Tracing and Replay — Audit and replay
```bash
# Install dev dependencies
uv sync

# Run tests (547 tests)
uv run pytest

# Type checking (strict mode)
uv run mypy src/

# Linting
uv run ruff check src/ tests/
```

See CONTRIBUTING.md for contributor guidelines.
- Core deliberation engine with 15-phase orchestration loop
- Declarative YAML protocol system with expansion, pruning, convergence
- Multi-level tree expansion — recursive depth-2+ trees with sub-plans per option
- Parallel branch execution — `ThreadPoolExecutor` with configurable `--max-parallel-branches`
- Dynamic protocols — conditional expansion based on metrics, dominance-threshold early termination
- Epistemic layer: claims, evidence, objections with per-claim validation and support linking
- Multi-source evidence retrieval (keyword, embedding, web, hybrid with RRF)
- Auto-enabled web retrieval when LLM researcher is active
- Evidence verification for web results (fetch, LLM, cross-reference)
- LLM integration with Gemini for all agent roles (planner, proposer, researcher, red-teamer, refiner, validator)
- LLM claim validator with semantic evidence matching (replaces keyword heuristic)
- Balanced scoring weights — evidence can counterbalance objections for realistic scores
- Human-in-the-loop gates (scope, tradeoff, final sign-off)
- Tool system with policy engine, routing, and built-in tools
- Complete tracing and deterministic replay (thread-safe for parallel execution)
- Inspection and Markdown report generation
- Evaluation harness with suite runner and metrics
- CLI with extensive configuration options
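The balanced scoring and `keep_k` pruning listed above can be illustrated with a toy score (invented weights and numbers; the real metrics live in the `scoring/` package):

```python
def branch_score(n_evidence: int, n_objections: int,
                 w_evidence: float = 1.0, w_objection: float = 1.0) -> float:
    # Evidence counts in a branch's favor; objections count against it,
    # so strong evidence can counterbalance objections.
    return w_evidence * n_evidence - w_objection * n_objections

def prune(branches: dict[str, tuple[int, int]], keep_k: int = 2) -> list[str]:
    """Keep the top keep_k branches by score, as in the protocol's prune rule."""
    ranked = sorted(branches, key=lambda b: branch_score(*branches[b]), reverse=True)
    return ranked[:keep_k]

# (evidence count, objection count) per branch
branches = {"opt-a": (3, 1), "opt-b": (1, 2), "opt-c": (2, 0)}
print(prune(branches, keep_k=2))  # → ['opt-a', 'opt-c']
```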
All agent roles have deterministic stub implementations for testing without API keys. LLM agents fall back to stubs on failure.
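The fallback behavior can be sketched as a simple wrapper (illustrative only; the proposer functions below are invented stand-ins):

```python
def with_stub_fallback(llm_call, stub_call):
    """Return a callable that tries the LLM agent and falls back to the stub."""
    def run(*args, **kwargs):
        try:
            return llm_call(*args, **kwargs)
        except Exception:
            # Any LLM failure (quota, network, parsing) degrades to the
            # deterministic stub instead of aborting the run.
            return stub_call(*args, **kwargs)
    return run

def flaky_llm_proposer(question: str) -> str:
    raise RuntimeError("API quota exceeded")

def stub_proposer(question: str) -> str:
    return f"[stub] options for: {question}"

propose = with_stub_fallback(flaky_llm_proposer, stub_proposer)
print(propose("Should we adopt uv?"))  # → [stub] options for: Should we adopt uv?
```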
See ROADMAP.md for detailed next steps.
See LICENSE file for details.