
# Delibera

An engine for decision-grade AI deliberation.

Delibera is an open-source framework that makes multi-agent reasoning structured, governed, and auditable. Unlike chat-based AI systems, Delibera treats deliberation as a process — not a conversation. The engine controls tree expansion, pruning, and convergence while agents contribute content. Every run can be replayed without re-invoking LLMs.

## Key Concepts

| Concept | Description |
|---|---|
| Engine | Central orchestrator that controls tree structure and termination. Agents propose; the engine decides. |
| Protocol | Declarative YAML spec defining expansion rules, branch pipelines, pruning, and convergence criteria. |
| Epistemics | Explicit tracking of claims, evidence, and objections. Fact claims require evidence; inferences are validated per-claim against their evidence. |
| Retrieval | Multi-source evidence gathering: keyword search, semantic embeddings, web search (Gemini grounding), and hybrid fusion. |
| Verification | Evidence verification layer that validates web search results by fetching URLs, LLM fact-checking, or cross-referencing. |
| Gates | Structured human-in-the-loop checkpoints for scope clarification, tradeoffs, and final approval. |
| Replay | Full runs can be reconstructed from trace logs without calling agents or tools. |
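The epistemic layer described above (claims, evidence, objections) can be pictured with minimal data models. The dataclasses below are illustrative only; Delibera's real models live in `src/delibera/epistemics/` and differ in detail:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source: str   # e.g. a URL or a local evidence file path
    excerpt: str

@dataclass
class Claim:
    text: str
    kind: str                                         # "fact" or "inference"
    evidence: list[Evidence] = field(default_factory=list)
    objections: list[str] = field(default_factory=list)

    def is_supported(self) -> bool:
        # Fact claims require at least one piece of evidence;
        # inferences are validated per-claim against their evidence.
        if self.kind == "fact":
            return len(self.evidence) > 0
        return True

claim = Claim(
    text="uv resolves dependencies faster than pip",
    kind="fact",
    evidence=[Evidence("https://docs.astral.sh/uv/", "benchmark excerpt")],
)
print(claim.is_supported())  # True
```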

## Quick Start

### Installation

```bash
# Install from PyPI
pip install delibera

# Or clone and install with uv
git clone https://github.com/forge-labs-dev/delibera.git
cd delibera
uv sync
```

### Run a Deliberation

```bash
# Basic run with interactive gates (uses stub agents)
delibera run --question "Should we adopt uv for dependency management?"

# Run with auto-approved gates (for CI/scripts)
delibera run --question "Should we adopt uv?" --auto-approve-gates

# Run with a custom protocol
delibera run --question "Your question" --protocol protocols/tree_v1.yaml
```

### Run with LLM-Backed Agents

To use LLMs for all agent roles (planner, proposer, researcher, red-teamer, refiner, validator):

```bash
# Set your Gemini API key
export GEMINI_API_KEY="your-api-key"

# Run with all LLM agents (auto-enables web evidence retrieval)
delibera run --question "Should we adopt uv?" --use-all-llm --no-gates

# Run individual LLM agents
delibera run --question "Your question" --use-llm-proposer --use-llm-researcher --auto-approve-gates

# Parallel branch execution for faster deliberation
delibera run --question "Your question" --use-all-llm --max-parallel-branches 3 --no-gates

# Specify model and parameters
delibera run --question "Your question" \
  --use-all-llm \
  --llm-model gemini-2.5-flash \
  --llm-temperature 0.3 \
  --no-gates
```

Note: LLM mode requires the `google-generativeai` package. Install it with:

```bash
pip install "delibera[llm]"
```

When `--use-all-llm` (or `--use-llm-researcher`) is set, web retrieval is auto-enabled: the LLM researcher generates search queries and Gemini Google Search grounding retrieves real evidence.

Replay and inspection work identically whether the run used LLM or stubs — replay never re-invokes the LLM.

### Run with Evidence Retrieval

Delibera supports multiple evidence retrieval methods:

```bash
# Keyword search over local evidence files (default)
delibera run --question "Your question" --retrieval-method keyword --evidence-dir ./evidence

# Semantic search using Gemini embeddings
delibera run --question "Your question" --retrieval-method embedding --evidence-dir ./evidence

# Web search via Gemini Google Search grounding
delibera run --question "Your question" --retrieval-method web --use-llm-proposer

# Hybrid: combine local + web with Reciprocal Rank Fusion
delibera run --question "Your question" --retrieval-method hybrid --evidence-dir ./evidence

# Enable verification of web results
delibera run --question "Your question" --retrieval-method web --verify --use-llm-proposer
```
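Hybrid mode merges the local and web result lists with Reciprocal Rank Fusion (RRF). The fusion step itself is the standard algorithm; here is a minimal standalone sketch (not Delibera's internal code), where each document scores the sum of `1 / (k + rank)` across the lists it appears in:

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]   # local keyword ranking
web_hits     = ["doc_b", "doc_d", "doc_a"]   # web search ranking
print(rrf_fuse([keyword_hits, web_hits]))
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear high in both lists (like `doc_b`) dominate, while a single mid-list appearance (like `doc_d`) still contributes.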

### Inspect a Run

```bash
# Print a human-readable summary
delibera inspect --run-id <run_id>

# Generate a Markdown report
delibera report --run-id <run_id> --out report.md
```

### Replay a Run

```bash
# Validate trace and artifact consistency
delibera replay --run-id <run_id>
```

### Run Evaluation Suites

```bash
# Run an evaluation suite
delibera eval --suite suites/basic.yaml

# Save results to JSON
delibera eval --suite suites/basic.yaml --save-results results.json
```

## Example Protocol

```yaml
name: simple_protocol
protocol_version: v1
max_depth: 1
gates_enabled: true

expand_rules:
  - id: expand_options
    at_step_id: plan
    child_kind: option
    max_children: 3
    depth: 1

branch_pipeline:
  - id: propose
    kind: work
    step_name: PROPOSE
    role: proposer

  - id: research
    kind: work
    step_name: RESEARCH
    role: researcher

  - id: validate
    kind: validate
    step_name: CLAIM_CHECK

prune:
  rule: epistemic_then_score
  keep_k: 2

reduce:
  rule: merge_artifacts

convergence:
  max_rounds: 0
```
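A protocol file like the one above can be sanity-checked before a run. Below is a minimal sketch using PyYAML; the `REQUIRED_KEYS` set is an assumption for illustration, and Delibera's own loader in `src/delibera/protocol/` performs fuller validation:

```python
import yaml  # PyYAML

# Assumed minimal key set for illustration only
REQUIRED_KEYS = {"name", "protocol_version", "branch_pipeline"}

def load_protocol(text: str) -> dict:
    """Parse a protocol YAML string and check required top-level keys."""
    spec = yaml.safe_load(text)
    missing = REQUIRED_KEYS - spec.keys()
    if missing:
        raise ValueError(f"protocol missing keys: {sorted(missing)}")
    return spec

minimal = """
name: simple_protocol
protocol_version: v1
branch_pipeline:
  - id: propose
    kind: work
"""
spec = load_protocol(minimal)
print(spec["name"], len(spec["branch_pipeline"]))  # simple_protocol 1
```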

## Architecture

```text
+----------------------------------------------------+
|                     CLI / API                      |
+--------------------------+-------------------------+
|      Protocol Layer      |       User Gates        |
+--------------------------+-------------------------+
|         Deliberation Engine (Orchestrator)         |
+----------------------------------------------------+
|   Epistemics   |   Tools & Policy   |   Tracing    |
+----------------------------------------------------+
|          Agents (Stubs or LLM-backed)              |
+----------------------------------------------------+
|    LLM Providers    |    Retrieval + Verification  |
+----------------------------------------------------+
```

## Module Map

```text
src/delibera/
  engine/        Orchestrator, operators, tree, run state
  protocol/      Declarative specs, YAML loader, interpreter
  agents/        Stubs + LLM-backed agents (planner, proposer, researcher, redteam, refiner, validator)
  epistemics/    Claims, evidence, objections, ledgers, validation
  retrieval/     Keyword, embedding, web, hybrid retrievers + verification
  llm/           LLM client protocol, Gemini provider, prompts, redaction
  scoring/       Weighted metrics, score computation, pruning support
  tools/         Tool registry, router, policy engine, built-ins (calculator, docs)
  gates/         Gate models, predicates, handlers, response application
  trace/         Events, writer, reader, replay, validation
  inspect/       Run summarization, Markdown + text report rendering
  eval/          Evaluation suite runner, loader, metrics, comparison
  cli.py         Click-based CLI entry point
```

## What Delibera Is Not

- **Not an agent framework** — Delibera is not LangChain, CrewAI, or AutoGPT. It's a deliberation engine with strict governance.
- **Not a workflow orchestrator** — Delibera is not Airflow or Prefect. It's built specifically for reasoning processes that require epistemic tracking.
- **Not autonomous** — Delibera does not "decide for you". It produces structured decision artifacts for humans to review.
- **Not a chatbot** — outputs are artifacts, not conversations.

## Documentation

See the `docs/` directory for detailed documentation.

## Development

```bash
# Install dev dependencies
uv sync

# Run tests (547 tests)
uv run pytest

# Type checking (strict mode)
uv run mypy src/

# Linting
uv run ruff check src/ tests/
```

See CONTRIBUTING.md for contributor guidelines.

## Current Status (v0.2.0)

### What's implemented

- Core deliberation engine with 15-phase orchestration loop
- Declarative YAML protocol system with expansion, pruning, convergence
- Multi-level tree expansion — recursive depth-2+ trees with sub-plans per option
- Parallel branch execution — `ThreadPoolExecutor` with configurable `--max-parallel-branches`
- Dynamic protocols — conditional expansion based on metrics, dominance-threshold early termination
- Epistemic layer: claims, evidence, objections with per-claim validation and support linking
- Multi-source evidence retrieval (keyword, embedding, web, hybrid with RRF)
- Auto-enabled web retrieval when the LLM researcher is active
- Evidence verification for web results (fetch, LLM, cross-reference)
- LLM integration with Gemini for all agent roles (planner, proposer, researcher, red-teamer, refiner, validator)
- LLM claim validator with semantic evidence matching (replaces the keyword heuristic)
- Balanced scoring weights — evidence can counterbalance objections for realistic scores
- Human-in-the-loop gates (scope, tradeoff, final sign-off)
- Tool system with policy engine, routing, and built-in tools
- Complete tracing and deterministic replay (thread-safe for parallel execution)
- Inspection and Markdown report generation
- Evaluation harness with suite runner and metrics
- CLI with extensive configuration options
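The parallel branch execution listed above can be pictured with a small sketch: a bounded `ThreadPoolExecutor`, sized like `--max-parallel-branches`, maps a branch pipeline over the candidate options. This is illustrative only; `run_branch` is a stand-in, not Delibera's API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_branch(option: str) -> str:
    # Stand-in for the per-branch propose -> research -> validate pipeline.
    return f"{option}: validated"

options = ["adopt uv", "keep pip", "use poetry"]

# max_workers plays the role of --max-parallel-branches 3
with ThreadPoolExecutor(max_workers=3) as pool:
    # pool.map preserves input order regardless of completion order
    results = list(pool.map(run_branch, options))

print(results)
# ['adopt uv: validated', 'keep pip: validated', 'use poetry: validated']
```

Keeping the pool bounded caps concurrent LLM calls, which matters for API rate limits.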

### Stub fallbacks

All agent roles have deterministic stub implementations for testing without API keys. LLM agents fall back to stubs on failure.
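That fallback behavior follows a common wrapper pattern, sketched below. The function names here are hypothetical; Delibera's actual agent interface will differ:

```python
from typing import Callable

def with_stub_fallback(llm_call: Callable[[str], str],
                       stub_call: Callable[[str], str]) -> Callable[[str], str]:
    """Return an agent that tries the LLM first, then falls back to the stub."""
    def agent(prompt: str) -> str:
        try:
            return llm_call(prompt)
        except Exception:
            # Deterministic stub keeps the run going without an API key
            return stub_call(prompt)
    return agent

def flaky_llm(prompt: str) -> str:
    raise RuntimeError("API key missing")

def stub_proposer(prompt: str) -> str:
    return f"[stub] {prompt}"

proposer = with_stub_fallback(flaky_llm, stub_proposer)
print(proposer("Should we adopt uv?"))  # [stub] Should we adopt uv?
```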

## Roadmap

See ROADMAP.md for detailed next steps.

## License

See LICENSE file for details.
