Minimal agent runtime built with DSPy modules and a thin Python loop.
- Plan/Act/Finalize expressed as DSPy `Signature`s, with OpenAI-native tool-calling when available.
- Thin runtime (`agent.py`) handles looping, tool routing, and trace persistence.
- CLI and FastAPI server, plus a tiny eval harness.
- Python 3.10+
- Create a virtualenv and install (using `uv`, or see the pip alternative below):
uv venv && source .venv/bin/activate
uv pip install -e .
cp .env.example .env # set OPENAI_API_KEY or configure Ollama
# Ask a question (append --utc to nudge UTC use when time is relevant)
micro-agent ask --question "What's 2*(3+5)?" --utc
# Run the API server
uvicorn micro_agent.server:app --reload --port 8000
# Run quick evals (repeats the small dataset)
python evals/run_evals.py --n 50
Pip alternative:
python -m venv .venv && source .venv/bin/activate
pip install -e .
- `.env` is loaded automatically (via `python-dotenv`).
- Set one of the following provider configs:
  - OpenAI (default): `OPENAI_API_KEY`, `OPENAI_MODEL` (default `gpt-4o-mini`)
  - Ollama: `LLM_PROVIDER=ollama`, `OLLAMA_MODEL` (e.g. `llama3.2:1b`), `OLLAMA_HOST` (default `http://localhost:11434`)
- Optional tuning: `TEMPERATURE` (default `0.2`), `MAX_TOKENS` (default `1024`)
- Tool plugins: `TOOLS_MODULES="your_pkg.tools,other_pkg.tools"` to load extra tools (see Tools below)
- Traces location: `TRACES_DIR` (default `traces/`)
- Compiled demos (OpenAI planner): `COMPILED_DEMOS_PATH` (default `opt/plan_demos.json`)
Examples:
# OpenAI
export OPENAI_API_KEY=...
export OPENAI_MODEL=gpt-4o-mini
# Ollama
export LLM_PROVIDER=ollama
export OLLAMA_MODEL=llama3.2:1b
export OLLAMA_HOST=http://localhost:11434
micro-agent ask --question <text> [--utc] [--max-steps N]
- `--utc` appends a hint to prefer UTC when time is used.
- Saves a JSONL trace under `traces/<id>.jsonl` and prints the path.
micro-agent replay --path traces/<id>.jsonl [--index -1]
- Pretty-prints a saved record from the JSONL file.
Examples:
micro-agent ask --question "Add 12345 and 67890, then show the current date (UTC)." --utc
micro-agent ask --question "Compute (7**2 + 14)/5 and explain briefly." --max-steps 4
micro-agent replay --path traces/<id>.jsonl --index -1
- Start: `uvicorn micro_agent.server:app --reload --port 8000`
- Endpoint: `POST /ask`
- Request JSON: `{ "question": "...", "max_steps": 6, "use_tool_calls": bool? }`
- Response JSON: `{ "answer": str, "trace_id": str, "trace_path": str, "steps": [...], "usage": {...}, "cost_usd": number }`
- Health: `GET /healthz` (ok), `GET /health` (provider/model), `GET /version` (package version)
Example:
curl -s http://localhost:8000/ask \
-H 'content-type: application/json' \
-d '{"question":"What\'s 2*(3+5)?","max_steps":6}' | jq .
OpenAPI:
- FastAPI publishes `/openapi.json` and interactive docs at `/docs`.
- Schemas reflect the `AskRequest` and `AskResponse` models in `micro_agent/server.py`.
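For orientation, the models roughly correspond to the following shape (an illustrative sketch matching the request/response JSON above; the authoritative definitions are in `micro_agent/server.py`):

```python
# Approximate shape of the request/response models (see micro_agent/server.py for the real ones).
from pydantic import BaseModel

class AskRequest(BaseModel):
    question: str
    max_steps: int = 6
    use_tool_calls: bool | None = None  # optional override of native tool-calling

class AskResponse(BaseModel):
    answer: str
    trace_id: str
    trace_path: str
    steps: list[dict]
    usage: dict
    cost_usd: float
```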
- Ask, capture the `trace_id`, then fetch the full trace by id:
RESP=$(curl -s http://localhost:8000/ask \
-H 'content-type: application/json' \
-d '{"question":"Add 12345 and 67890, then UTC time.","max_steps":6}')
echo "$RESP" | jq .
TID=$(echo "$RESP" | jq -r .trace_id)
curl -s http://localhost:8000/trace/$TID | jq .
- Replay the saved JSONL locally using the CLI (the default index -1 selects the last record):
micro-agent replay --path traces/$TID.jsonl --index -1
- Controlled via `MICRO_AGENT_LOG` (debug|info|warning|error). Default: `INFO`.
- Applies to both CLI and server.
- Built-ins live in `micro_agent/tools.py`:
  - `calculator`: safe expression evaluator. Supports `+ - * / ** % // ( )` and `!` via rewrite to `fact(n)`.
  - `now`: current timestamp; `{timezone: "utc"|"local"}` (default local).
- Each tool is defined as:
Tool(
"name",
"description",
{"type":"object","properties":{...},"required":[...]},
handler_function,
)
- Plugins: set `TOOLS_MODULES` to a comma-separated list of importable modules. Each module should expose either a `TOOLS: dict[str, Tool]` or a `get_tools() -> dict[str, Tool]` (see the sketch below).
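A minimal plugin module might look like this (a sketch: the module path, the tool itself, and the handler's dict-in/dict-out convention are illustrative assumptions):

```python
# your_pkg/tools.py — hypothetical plugin module loaded via TOOLS_MODULES="your_pkg.tools"
from micro_agent.tools import Tool

def _word_count(args: dict) -> dict:
    # Assumed handler convention: JSON-like args in, JSON-serializable observation out.
    text = args.get("text", "")
    return {"words": len(text.split())}

word_count = Tool(
    "word_count",
    "Count the number of words in a piece of text.",
    {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
    _word_count,
)

def get_tools() -> dict[str, Tool]:
    # The loader also accepts a module-level TOOLS dict instead of this function.
    return {"word_count": word_count}
```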
Runtime validation
- Tool args are validated against the JSON Schema before execution; invalid args add a `⛔️validation_error` step and the agent requests a correction in the next loop. See `micro_agent/tools.py` (`run_tool`) and `micro_agent/agent.py` (validation error handling).
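Conceptually, the pre-execution check looks like this (a sketch assuming the `jsonschema` package and `schema`/`handler` attributes on `Tool`; the real code is `run_tool` in `micro_agent/tools.py`):

```python
# Illustrative pre-execution validation; the real logic is run_tool in micro_agent/tools.py.
from jsonschema import ValidationError, validate

def run_tool_checked(tool, args: dict) -> dict:
    try:
        validate(instance=args, schema=tool.schema)  # attribute name is assumed
    except ValidationError as err:
        # Surfaced to the agent as a ⛔️validation_error step, prompting a corrected call.
        return {"validation_error": err.message}
    return tool.handler(args)  # attribute name is assumed
```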
Calculator limits
- Factorial capped at 12; exponent size bounded; AST node count limited; large magnitudes rejected to prevent runaway compute. Only a small set of arithmetic nodes is allowed.
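For intuition, an AST-whitelisted evaluator with these kinds of caps might look like the sketch below. The factorial cap of 12 matches the description above; the other specific limits here are illustrative, and the real node set and bounds live in `micro_agent/tools.py`.

```python
# Simplified sketch of an AST-whitelisted calculator (illustrative caps).
import ast
import math
import operator

ALLOWED_BINOPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow,
    ast.Mod: operator.mod, ast.FloorDiv: operator.floordiv,
}

def safe_eval(expression: str, max_nodes: int = 50) -> float:
    tree = ast.parse(expression, mode="eval")
    if sum(1 for _ in ast.walk(tree)) > max_nodes:  # cap AST size
        raise ValueError("expression too large")

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in ALLOWED_BINOPS:
            left, right = _eval(node.left), _eval(node.right)
            if isinstance(node.op, ast.Pow) and abs(right) > 16:  # bound exponents (illustrative cap)
                raise ValueError("exponent too large")
            return ALLOWED_BINOPS[type(node.op)](left, right)
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "fact" and len(node.args) == 1):
            n = int(_eval(node.args[0]))
            if not 0 <= n <= 12:  # factorial cap, as described above
                raise ValueError("factorial out of range")
            return math.factorial(n)
        raise ValueError(f"disallowed node: {type(node).__name__}")

    return _eval(tree.body)
```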
- OpenAI: uses DSPy `PlanWithTools` with `JSONAdapter` to enable native function calls. The model may return `tool_calls` or a `final` answer; tool calls are executed via our registry.
- Others (e.g., Ollama): uses a robust prompt with few-shot JSON decision demos. Decisions are parsed as strict JSON; on failure we try `json_repair` (if installed) and Python-literal parsing.
- Policy enforcement: if the question implies math, the agent requires a `calculator` step before finalizing; likewise for time/date with the `now` tool. Violations are recorded in the trace as `⛔️policy_violation` steps and planning continues.
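The fallback order for non-OpenAI providers is roughly this (a sketch; the real implementation is `parse_decision_text` in `micro_agent/runtime.py`):

```python
# Sketch of the parse order: strict JSON -> json_repair (if installed) -> Python literal.
import ast
import json

def parse_decision(text: str) -> dict:
    try:
        return json.loads(text)                 # 1) strict JSON
    except json.JSONDecodeError:
        pass
    try:
        import json_repair                      # 2) optional repair of slightly broken JSON
        return json_repair.loads(text)
    except Exception:
        pass
    return ast.literal_eval(text)               # 3) Python-literal parsing (single quotes, etc.)
```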
Code references (discoverability)
- Replay subcommand: `micro_agent/cli.py` (subparser `replay`, printing JSONL)
- Policy enforcement markers: `micro_agent/agent.py` (look for `⛔️policy_violation` and `⛔️validation_error`)
- Provider fallback and configuration: `micro_agent/config.py` (`configure_lm` tries Ollama → OpenAI → registry fallbacks)
- JSON repair in decision parsing: `micro_agent/runtime.py` (`parse_decision_text` uses `json_repair` if available)
- Each run appends a record to `traces/<id>.jsonl` with fields: `id`, `ts`, `question`, `steps`, `answer`.
- Steps are `{tool, args, observation}` in order of execution.
- Replay: `micro-agent replay --path traces/<id>.jsonl --index -1`.
- Fetch by id (HTTP): `GET /trace/{id}` (CORS enabled).
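A single record looks roughly like this; only the top-level field names are documented above, so the values and observation key names here are illustrative:

```python
# One JSONL record per run; values and observation key names below are illustrative.
record = {
    "id": "a1b2c3d4",
    "ts": "2024-01-01T12:00:00Z",
    "question": "Add 12345 and 67890, then show the current date (UTC).",
    "steps": [
        {"tool": "calculator", "args": {"expression": "12345 + 67890"},
         "observation": {"result": 80235}},
        {"tool": "now", "args": {"timezone": "utc"},
         "observation": {"iso": "2024-01-01T12:00:00Z"}},
    ],
    "answer": "12345 + 67890 = 80235; the current UTC date is 2024-01-01.",
}
```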
- Dataset: `evals/tasks.yaml` (small, mixed math/time tasks). Rubric: `evals/rubrics.yaml`.
- Run: `python evals/run_evals.py --n 50`.
- Metrics printed: `success_rate`, `avg_latency_sec`, `avg_lm_calls`, `avg_tool_calls`, `avg_steps`, `avg_cost_usd`, `n`.
- Scoring supports both `expect_contains` (answer substring) and `expect_key` (key present in any tool observation). Weights come from `rubrics.yaml` (`contains_weight`, `key_weight`).
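The weighted check is roughly the following (a sketch; the actual scorer lives in `evals/run_evals.py` and reads weights from `evals/rubrics.yaml`):

```python
# Illustrative scorer combining expect_contains and expect_key with rubric weights.
def score_task(task: dict, answer: str, observations: list[dict],
               contains_weight: float = 0.5, key_weight: float = 0.5) -> float:
    score, total = 0.0, 0.0
    if "expect_contains" in task:
        total += contains_weight
        if task["expect_contains"].lower() in answer.lower():
            score += contains_weight
    if "expect_key" in task:
        total += key_weight
        if any(task["expect_key"] in obs for obs in observations):
            score += key_weight
    return score / total if total else 0.0
```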
- Model: `gpt-4o-mini`, N=30
- Before (no demos): success_rate 1.00; avg_latency_sec ~0.188; avg_lm_calls 3.33; avg_tool_calls 1.17; avg_steps 3.17
- After (compiled demos loaded): success_rate 1.00; avg_latency_sec ~0.188; avg_lm_calls 3.33; avg_tool_calls 1.17; avg_steps 3.17

Notes: For this small dataset, demos neither help nor hurt. For larger flows, compile demos from your real tasks.
- The agent aggregates token counts and cost. If provider usage isn’t exposed, it estimates tokens from prompts/outputs and computes cost using the configured or built-in prices below.
- Set env prices for OpenAI models (USD per 1K tokens):
export OPENAI_INPUT_PRICE_PER_1K=0.005 # example
export OPENAI_OUTPUT_PRICE_PER_1K=0.015 # example
Defaults: for OpenAI models, built‑in prices are used if env isn’t set (best‑effort):
- gpt-4o-mini: $0.00015 in / $0.0006 out per 1K tokens
- gpt-4o (and 4.1): $0.005 in / $0.015 out per 1K tokens
You can override via the env vars above. Evals print `avg_cost_usd`.
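The cost arithmetic reduces to token counts times the per-1K price; for example, with the gpt-4o-mini defaults above:

```python
# USD cost from token counts and per-1K prices.
def cost_usd(prompt_tokens: int, completion_tokens: int,
             input_price_per_1k: float, output_price_per_1k: float) -> float:
    return (prompt_tokens / 1000.0) * input_price_per_1k \
         + (completion_tokens / 1000.0) * output_price_per_1k

# 2,000 prompt tokens + 500 completion tokens on gpt-4o-mini:
# 2.0 * 0.00015 + 0.5 * 0.0006 = 0.0006 USD
print(cost_usd(2000, 500, 0.00015, 0.0006))
```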
- Compile optimized few-shot demos for the OpenAI `PlanWithTools` planner and save to JSON:
micro-agent optimize --n 12 --tasks evals/tasks.yaml --save opt/plan_demos.json
- Apply compiled demos automatically by placing them at the default path or setting:
export COMPILED_DEMOS_PATH=opt/plan_demos.json
- Optional: print a DSPy teleprompting template (for notebooks):
micro-agent optimize --n 12 --template
The agent loads these demos on OpenAI providers and attaches them to the `PlanWithTools` predictor to improve tool selection and output consistency.
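Mechanically, attaching demos to a predictor looks roughly like this (a sketch: the field names inside `plan_demos.json` and the exact loading code are assumptions; the stand-in string signature below is not the real `PlanWithTools`):

```python
# Illustrative loading of compiled demos onto a DSPy predictor.
import json
import dspy

with open("opt/plan_demos.json") as f:
    raw = json.load(f)  # assumed: a list of example dicts produced by `micro-agent optimize`

planner = dspy.Predict("question, history -> decision")   # stand-in for PlanWithTools
planner.demos = [dspy.Example(**d) for d in raw]          # few-shot demos used at call time
```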
- `micro_agent/config.py`: configures the DSPy LM. Tries Ollama first if requested, else OpenAI; supports `dspy.Ollama`, `dspy.OpenAI`, and registry fallbacks like `dspy.LM("openai/<model>")`.
- `micro_agent/signatures.py`: DSPy `Signature`s for plan/act/finalize and OpenAI tool-calls.
- `micro_agent/agent.py`: the runtime loop (~100+ LOC). Builds a JSON decision prompt, executes tools, enforces policy, and finalizes.
- `micro_agent/runtime.py`: trace format, persistence, and robust JSON decision parsing utilities.
- `micro_agent/cli.py`: CLI entry (`micro-agent`).
- `micro_agent/server.py`: FastAPI app exposing `POST /ask`.
- `evals/`: tiny harness to sample tasks, capture metrics, and save traces.
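To make the "signatures plus a thin loop" idea concrete, a stripped-down version might look like this (a sketch only: the field names, JSON decision format, and `handler` attribute are simplifications of what actually lives in `micro_agent/signatures.py` and `micro_agent/agent.py`):

```python
# Stripped-down sketch of a planning Signature and the thin loop around it (illustrative).
import json
import dspy

class PlanDecision(dspy.Signature):
    """Decide the next tool call, or give a final answer."""
    question: str = dspy.InputField()
    history: str = dspy.InputField(desc="prior tool calls and observations")
    decision: str = dspy.OutputField(
        desc='JSON: {"tool": "...", "args": {...}} or {"final": "..."}'
    )

def run(question: str, tools: dict, max_steps: int = 6) -> str:
    plan = dspy.Predict(PlanDecision)
    history: list[dict] = []
    for _ in range(max_steps):
        raw = plan(question=question, history=json.dumps(history)).decision
        decision = json.loads(raw)  # the real code adds repair/fallback parsing
        if "final" in decision:
            return decision["final"]
        observation = tools[decision["tool"]].handler(decision["args"])
        history.append({"tool": decision["tool"], "args": decision["args"],
                        "observation": observation})
    return "Stopped: max steps reached."
```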
- Make targets: `make init`, `make run`, `make serve`, `make evals`, `make test`.
- Tests: `pytest -q` (note: tests are minimal and do not cover all paths).
- Build: `make docker-build`
- Run (OpenAI): `OPENAI_API_KEY=... make docker-run` (maps `:8000`)
- Run (Ollama on host): `make docker-run-ollama` (uses `host.docker.internal:11434`)
- Env (OpenAI): `OPENAI_API_KEY`, `OPENAI_MODEL=gpt-4o-mini`
- Env (Ollama): `LLM_PROVIDER=ollama`, `OLLAMA_HOST=http://host.docker.internal:11434`, `OLLAMA_MODEL=llama3.1:8b`
- Service: `POST http://localhost:8000/ask` and `GET /trace/{id}`
- The DSPy dependency is `dspy-ai>=2.5.0`. Some adapters (e.g., `JSONAdapter`, `dspy.Ollama`) may vary across versions; the code tries multiple backends and falls back to generic registry forms when needed.
- If `json_repair` is installed, it is used opportunistically to salvage slightly malformed JSON decisions. Optional install: `pip install -e .[repair]`
- Usage/cost capture is best-effort: exact numbers depend on provider support; otherwise the agent estimates from text.
- The finalization step often composes from tool results for reliability; you can swap in a DSPy `Finalize` predictor if preferred.
- Add persistence to a DB instead of JSONL by replacing `dump_trace` (see the sketch below).
- Add human-in-the-loop, budgets, retries, or branching per your needs.
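For example, a SQLite-backed replacement for `dump_trace` could be as small as this (the signature of `dump_trace` is assumed here; adapt it to the one in `micro_agent/runtime.py`):

```python
# Hypothetical SQLite-backed dump_trace; match the real signature in micro_agent/runtime.py.
import json
import sqlite3

def dump_trace(record: dict, db_path: str = "traces.db") -> str:
    con = sqlite3.connect(db_path)
    try:
        con.execute("CREATE TABLE IF NOT EXISTS traces (id TEXT PRIMARY KEY, record TEXT)")
        con.execute("INSERT OR REPLACE INTO traces VALUES (?, ?)",
                    (record["id"], json.dumps(record)))
        con.commit()
    finally:
        con.close()
    return db_path
```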
Prove: an “agent” can be expressed as DSPy modules plus a thin runtime loop.