diff --git a/docs/community/engagement-plan.md b/docs/community/engagement-plan.md new file mode 100644 index 0000000..59e2ccf --- /dev/null +++ b/docs/community/engagement-plan.md @@ -0,0 +1,138 @@ +# LangChain Community Engagement Plan + +Goal: Establish AgentGuard as the go-to tool for AI agent cost control by engaging where developers already ask for help. 2-3 posts per week for the first month. + +## Channels to Monitor + +| Channel | URL | Frequency | +|---------|-----|-----------| +| LangChain GitHub Issues | github.com/langchain-ai/langchain/issues | Daily | +| LangChain GitHub Discussions | github.com/langchain-ai/langchain/discussions | Daily | +| LangGraph GitHub Issues | github.com/langchain-ai/langgraph/issues | 3x/week | +| r/LangChain | reddit.com/r/LangChain | 3x/week | +| r/LocalLLaMA | reddit.com/r/LocalLLaMA | 2x/week | +| LangChain Discord #general | discord.gg/langchain | Daily | +| LangChain Discord #help | discord.gg/langchain | Daily | +| CrewAI GitHub Discussions | github.com/joaomdmoura/crewAI/discussions | 2x/week | +| Hacker News (AI threads) | news.ycombinator.com | 2x/week | +| Stack Overflow [langchain] tag | stackoverflow.com/questions/tagged/langchain | 2x/week | + +## 10 Pre-Researched GitHub Issues to Engage + +These are recurring problem categories. Search for new instances weekly. + +### 1. "Agent stuck in infinite loop" +- **Search:** `is:issue is:open label:bug "infinite loop" OR "stuck" OR "repeating"` +- **Repos:** langchain, langgraph, crewai +- **Response angle:** LoopGuard detects repeated tool calls and raises LoopDetected + +### 2. "Unexpected high token usage / cost" +- **Search:** `is:issue "token usage" OR "cost" OR "expensive" OR "budget"` +- **Repos:** langchain, langgraph +- **Response angle:** BudgetGuard enforces hard dollar limits at runtime + +### 3. 
"Agent makes too many API calls" +- **Search:** `is:issue "too many calls" OR "rate limit" OR "429" OR "max iterations"` +- **Repos:** langchain, langgraph +- **Response angle:** RateLimitGuard + BudgetGuard(max_calls=N) + +### 4. "How to track costs per agent run" +- **Search:** `is:issue OR is:discussion "track cost" OR "cost per run" OR "cost tracking"` +- **Repos:** langchain, crewai +- **Response angle:** Tracer + patch_openai auto-tracks cost per call, estimate_cost() for estimates + +### 5. "Agent timeout / takes too long" +- **Search:** `is:issue "timeout" OR "takes too long" OR "hanging" OR "never finishes"` +- **Repos:** langchain, langgraph, crewai +- **Response angle:** TimeoutGuard enforces wall-clock limits + +### 6. "How to add callbacks / custom logging" +- **Search:** `is:issue OR is:discussion "callback" OR "custom logging" OR "trace"` +- **Repos:** langchain +- **Response angle:** AgentGuardCallbackHandler plugs into existing callback system + +### 7. "LangGraph node cost tracking" +- **Search:** `is:issue "cost" OR "budget" OR "tracking"` in langgraph +- **Response angle:** guarded_node decorator wraps any node with budget + tracing + +### 8. "CrewAI agent cost management" +- **Search:** `is:issue OR is:discussion "cost" OR "budget" OR "expensive"` in crewai +- **Response angle:** AgentGuardCrewHandler + step_callback integration + +### 9. "CI testing for agent behavior" +- **Search:** `is:issue "ci" OR "testing" OR "evaluation" OR "regression"` +- **Repos:** langchain, langgraph +- **Response angle:** EvalSuite + CI cost gates GitHub Action + +### 10. "Observability / monitoring for agents" +- **Search:** `is:issue OR is:discussion "observability" OR "monitoring" OR "dashboard"` +- **Repos:** langchain, crewai +- **Response angle:** Full tracing with zero dependencies, optional hosted dashboard + +## 5 Template Responses + +### Template 1: Cost overrun / budget question + +> I ran into the same problem. 
I built [AgentGuard](https://github.com/bmdhodl/agent47) specifically for this — it lets you set a hard dollar limit on agent runs. When the budget is hit, it raises an exception and stops the agent immediately. +> +> Quick example: `BudgetGuard(max_cost_usd=5.00)` — that's it. Works with LangChain, LangGraph, CrewAI, or raw OpenAI/Anthropic. Zero dependencies. +> +> Happy to help if you have questions about integrating it. + +### Template 2: Infinite loop / stuck agent + +> This is a common pattern — the agent calls the same tool with the same args because it can't interpret the result. I've been working on [AgentGuard](https://github.com/bmdhodl/agent47) which has a `LoopGuard` that detects exactly this. It watches a sliding window of tool calls and raises `LoopDetected` when it sees repeats. +> +> There's also `FuzzyLoopGuard` for when the args change slightly but it's still effectively looping. + +### Template 3: Cost tracking / observability + +> For cost tracking, I've been using [AgentGuard](https://github.com/bmdhodl/agent47). It has built-in pricing for OpenAI, Anthropic, Google, Mistral models and auto-tracks cost when you patch the SDK client. Output goes to JSONL files or the hosted dashboard. +> +> The LangChain integration is a callback handler: `AgentGuardCallbackHandler(budget_guard=BudgetGuard(max_cost_usd=5.00))` — auto-extracts token usage from LLM responses. + +### Template 4: CI / testing question + +> We added cost gates to our CI pipeline using [AgentGuard](https://github.com/bmdhodl/agent47). It records traces during test runs, then asserts properties like max cost, no loops, and completion time. There's a GitHub Action that fails the build if any assertion breaks. 
+> +> The EvalSuite API is chainable: `EvalSuite("traces.jsonl").assert_no_loops().assert_budget_under(tokens=50000).run()` + +### Template 5: LangGraph specific + +> For LangGraph cost tracking, [AgentGuard](https://github.com/bmdhodl/agent47) has a `guarded_node` decorator that wraps any node with budget and loop guards. The budget is shared across all nodes, so a $5 limit applies to the entire graph execution. +> +> You can also add a standalone `guard_node` between steps for explicit budget checks. + +## Engagement Rules + +1. **Be helpful first.** Only mention AgentGuard when it genuinely solves the problem. Never force it. +2. **No code blocks in comments.** Keep responses short (2-4 sentences), casual, and human. Link to docs for details. +3. **Answer the actual question.** If AgentGuard doesn't solve their specific problem, help anyway. Goodwill compounds. +4. **Never disparage competitors.** State facts about what AgentGuard does. Don't FUD LangSmith, Langfuse, or Portkey. +5. **Disclose when relevant.** If asked directly, say you're the maintainer. Don't hide it. +6. **One comment per thread.** Never reply to yourself or bump. If someone responds, engage naturally. +7. **Track engagement.** Log each post in the tracker below. 
+ +## Weekly Tracker + +| Week | Date | Channel | Thread | Response | Engagement | +|------|------|---------|--------|----------|------------| +| 1 | | | | | | +| 1 | | | | | | +| 1 | | | | | | + +## Metrics (Monthly) + +- GitHub stars gained +- PyPI downloads delta +- Dashboard signups +- Inbound GitHub issues from community +- Threads where AgentGuard was mentioned by others (not us) + +## Month 1 Targets + +- 12 community posts (3/week) +- 5 new GitHub stars +- 50 new PyPI downloads +- 2 dashboard signups +- 1 organic mention by someone else diff --git a/docs/cost-guardrails.md b/docs/cost-guardrails.md new file mode 100644 index 0000000..6538329 --- /dev/null +++ b/docs/cost-guardrails.md @@ -0,0 +1,244 @@ +# Cost Guardrails Guide + +Stop runaway AI agent costs before they happen. This guide covers everything you need to enforce dollar budgets on agent runs. + +## Why Cost Guardrails? + +AI agents are expensive and unpredictable. A single agent run can make 3 or 300 LLM calls depending on the task. Without guardrails: + +- A stuck agent burns your entire OpenAI budget in minutes +- Cost overruns on autonomous tasks average 340% ([source](https://arxiv.org/abs/2401.15811)) +- You only find out when the invoice arrives + +AgentGuard's `BudgetGuard` enforces hard dollar limits at runtime — the agent stops the moment it exceeds your budget. + +## Quickstart + +```bash +pip install agentguard47 +``` + +```python +from agentguard import BudgetGuard, BudgetExceeded + +budget = BudgetGuard(max_cost_usd=5.00) + +# After each LLM call: +budget.consume(tokens=1500, calls=1, cost_usd=0.045) + +# When cumulative cost exceeds $5.00 → BudgetExceeded raised +``` + +That's it. Three lines to add a hard budget to any agent. + +## Configuration + +### BudgetGuard Parameters + +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `max_tokens` | `int` | `None` | Maximum total tokens. 
`None` = unlimited | +| `max_calls` | `int` | `None` | Maximum total API calls. `None` = unlimited | +| `max_cost_usd` | `float` | `None` | Maximum total cost in USD. `None` = unlimited | +| `warn_at_pct` | `float` | `None` | Fraction (0.0-1.0) to trigger warning. `None` = no warning | +| `on_warning` | `callable` | `None` | Callback invoked with message when `warn_at_pct` is crossed | + +### Examples + +```python +# Dollar limit only +BudgetGuard(max_cost_usd=10.00) + +# Dollar + call limit with warning +BudgetGuard(max_cost_usd=5.00, max_calls=100, warn_at_pct=0.8) + +# Token limit only +BudgetGuard(max_tokens=50_000) + +# Full config with warning callback +BudgetGuard( + max_cost_usd=5.00, + max_calls=200, + max_tokens=100_000, + warn_at_pct=0.8, + on_warning=lambda msg: print(f"WARNING: {msg}"), +) +``` + +### consume() Method + +```python +budget.consume(tokens=0, calls=0, cost_usd=0.0) +``` + +Call after each LLM API call. Pass any combination of tokens, calls, and cost. Raises `BudgetExceeded` if any configured limit is exceeded. 
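To make the semantics concrete, here is a minimal illustrative sketch of the kind of check `consume()` performs. This is not AgentGuard's actual implementation; the counter and exception names simply mirror the API described above.

```python
class BudgetExceeded(Exception):
    """Raised when any configured limit is crossed (toy version)."""

class MiniBudget:
    """Toy budget guard: cumulative counters plus hard-limit checks."""

    def __init__(self, max_cost_usd=None, max_calls=None):
        self.max_cost_usd = max_cost_usd
        self.max_calls = max_calls
        self.cost_used = 0.0
        self.calls_used = 0

    def consume(self, calls=0, cost_usd=0.0):
        # Update counters first, then enforce every configured limit.
        self.calls_used += calls
        self.cost_used += cost_usd
        if self.max_cost_usd is not None and self.cost_used > self.max_cost_usd:
            raise BudgetExceeded(
                f"cost_usd {self.cost_used:.2f} exceeds limit {self.max_cost_usd:.2f}"
            )
        if self.max_calls is not None and self.calls_used > self.max_calls:
            raise BudgetExceeded(
                f"calls {self.calls_used} exceeds limit {self.max_calls}"
            )

budget = MiniBudget(max_cost_usd=5.00)
try:
    for _ in range(200):
        budget.consume(calls=1, cost_usd=0.045)  # simulate $0.045 per LLM call
except BudgetExceeded as e:
    print(e)  # fires on the exact call that pushes cost past $5.00
```

The key property is that the exception fires on the exact call that crosses the limit, so no further spend is possible.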
+ +### Checking State + +```python +state = budget.state +print(f"Tokens: {state.tokens_used}") +print(f"Calls: {state.calls_used}") +print(f"Cost: ${state.cost_used:.4f}") +``` + +## How Costs Are Calculated + +### Built-in Pricing + +AgentGuard includes hardcoded pricing for major models (last updated 2026-02-01): + +| Provider | Models | +|----------|--------| +| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o1-mini, o3-mini | +| Anthropic | claude-3.5-sonnet, claude-3.5-haiku, claude-3-opus, claude-sonnet-4.5, claude-haiku-4.5, claude-opus-4.6 | +| Google | gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash | +| Mistral | mistral-large, mistral-small | +| Meta | llama-3.1-70b | + +### Manual Cost Estimation + +```python +from agentguard import estimate_cost + +cost = estimate_cost("gpt-4o", input_tokens=1000, output_tokens=500) +# → $0.0075 +``` + +### Custom Model Pricing + +Add pricing for custom or fine-tuned models: + +```python +from agentguard.cost import update_prices + +# (input_price_per_1k, output_price_per_1k) +update_prices({("openai", "my-fine-tuned-model"): (0.003, 0.006)}) +``` + +## Auto-Tracking with OpenAI / Anthropic + +Skip manual `consume()` calls — patch the SDK to auto-track costs: + +```python +from agentguard import Tracer, BudgetGuard, patch_openai, patch_anthropic + +tracer = Tracer( + service="my-agent", + guards=[BudgetGuard(max_cost_usd=5.00, warn_at_pct=0.8)], +) + +patch_openai(tracer) # auto-tracks all ChatCompletion calls +patch_anthropic(tracer) # auto-tracks all Messages calls + +# Use OpenAI/Anthropic normally — costs tracked automatically +``` + +When done, clean up: + +```python +from agentguard import unpatch_openai, unpatch_anthropic + +unpatch_openai() +unpatch_anthropic() +``` + +## LangChain Integration + +```bash +pip install agentguard47[langchain] +``` + +```python +from agentguard import Tracer, BudgetGuard +from agentguard.integrations.langchain import AgentGuardCallbackHandler + +tracer = 
Tracer(service="my-agent") +handler = AgentGuardCallbackHandler( + tracer=tracer, + budget_guard=BudgetGuard(max_cost_usd=5.00), +) + +# Pass to any LangChain component +llm = ChatOpenAI(callbacks=[handler]) +``` + +The callback handler auto-extracts token usage from LLM responses and feeds it into BudgetGuard. + +## LangGraph Integration + +```bash +pip install agentguard47[langgraph] +``` + +```python +from agentguard import Tracer, BudgetGuard +from agentguard.integrations.langgraph import guarded_node + +tracer = Tracer(service="my-graph-agent") +budget = BudgetGuard(max_cost_usd=5.00) + +@guarded_node(tracer=tracer, budget_guard=budget) +def research_node(state): + return {"messages": state["messages"] + [result]} +``` + +## Dashboard Integration + +Send traces to the hosted dashboard for centralized monitoring: + +```python +from agentguard import Tracer, BudgetGuard, HttpSink + +tracer = Tracer( + sink=HttpSink( + url="https://app.agentguard47.com/api/ingest", + api_key="ag_...", + ), + guards=[BudgetGuard(max_cost_usd=50.00)], +) +``` + +The dashboard provides: +- Real-time cost tracking across all agents +- Budget alerts via email and webhook +- Remote kill switch to stop agents mid-run +- Cost breakdown by agent, model, and time period + +## CI Cost Gates + +Fail CI if agent costs exceed a threshold: + +```yaml +- uses: bmdhodl/agent47/.github/actions/agentguard-eval@main + with: + trace-file: traces.jsonl + assertions: "no_errors,max_cost:5.00" +``` + +Full workflow: [docs/ci/cost-gate-workflow.yml](ci/cost-gate-workflow.yml) + +## FAQ + +**Q: Does BudgetGuard work without a dashboard?** +Yes. BudgetGuard is local — it runs in your process with zero network calls. The dashboard is optional. + +**Q: How accurate are the cost estimates?** +Token-level accurate for supported models. AgentGuard uses published per-token pricing. For models not in the built-in list, use `update_prices()` to add custom pricing. 
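To see what token-level accuracy means, the arithmetic can be sketched directly. The prices below are the gpt-4o rates implied by the `estimate_cost()` example above ($2.50 per million input tokens, $10.00 per million output tokens); treat them as illustrative, since published pricing changes.

```python
# Per-1K-token pricing: (input, output). Illustrative values for gpt-4o.
PRICES_PER_1K = {"gpt-4o": (0.0025, 0.010)}

def sketch_estimate_cost(model, input_tokens, output_tokens):
    """Toy re-derivation of estimate_cost(): linear in both token counts."""
    in_price, out_price = PRICES_PER_1K[model]
    return (input_tokens / 1000) * in_price + (output_tokens / 1000) * out_price

cost = sketch_estimate_cost("gpt-4o", input_tokens=1000, output_tokens=500)
print(f"${cost:.4f}")  # $0.0075, matching the estimate_cost() example above
```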
+ +**Q: What happens when BudgetExceeded is raised?** +It's a normal Python exception. Your agent loop's try/except catches it and you decide what to do — log it, retry with a cheaper model, return a partial result, etc. + +**Q: Is it thread-safe?** +Yes. BudgetGuard uses a lock internally. Safe to share across threads. + +**Q: Can I reset the budget mid-run?** +Create a new `BudgetGuard` instance. There's no reset method by design — budgets should be immutable per run. + +## API Reference + +- [`BudgetGuard`](https://github.com/bmdhodl/agent47#guards) — budget enforcement +- [`estimate_cost()`](https://github.com/bmdhodl/agent47#cost-tracking) — per-call cost estimation +- [`patch_openai()`](https://github.com/bmdhodl/agent47#openai--anthropic-auto-instrumentation) — auto-instrumentation +- [`AgentGuardCallbackHandler`](https://github.com/bmdhodl/agent47#langchain) — LangChain integration +- [`guarded_node`](https://github.com/bmdhodl/agent47#langgraph) — LangGraph integration diff --git a/site/blog/ai-agent-cost-overruns.html b/site/blog/ai-agent-cost-overruns.html new file mode 100644 index 0000000..d84edb0 --- /dev/null +++ b/site/blog/ai-agent-cost-overruns.html @@ -0,0 +1,338 @@ + + + + + + AI Agent Cost Overruns: Why They Happen and How to Prevent Them | AgentGuard + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+

AI Agent Cost Overruns: Why They Happen and How to Prevent Them

+
By AgentGuard Team · February 20, 2026 · 7 min read
+ +

AI agents are moving from demos to production. With that shift comes a problem nobody talks about until it hits their invoice: cost overruns averaging 340% on autonomous tasks.

+ +

A research agent tasked with a $2 job returns a $9 bill. A code review agent loops 47 times on a single file. A customer support agent escalates to GPT-4 for every message, burning through $200 in an afternoon. These are not edge cases. They are the default behavior of autonomous agents without runtime budget enforcement.

+ +

The Scale of the Problem

+ +

LLM API costs are deceptively linear in documentation and exponential in practice. Here is why:

+ + + +
+

Real scenario: A LangChain ReAct agent with access to a web search tool was asked to "find the best restaurant in Austin." It called the search tool 83 times, each time refining its query, spending $47 on a task that should have cost $0.50.

+
+ +

Three Failure Modes That Drain Budgets

+ +
+
+ 1. Infinite Loops +

Agent calls the same tool with the same arguments, gets the same result, and tries again. Common with ReAct agents that misinterpret tool output as an error. Each loop iteration costs $0.03-0.30 depending on context size.

+
+
+ 2. Escalating Retries +

Agent encounters an error and retries with progressively longer prompts. "Add more context" is the default recovery strategy for most LLMs. Each retry is more expensive than the last because the context window grows.

+
+
+ 3. Model Cascading +

Agent decides its current model is not capable enough and routes to a more expensive one. GPT-3.5 to GPT-4, Claude Haiku to Opus. A single cascade can 10x the cost of a step, and the agent may cascade on every step.

+
+
+ +
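To see how fast the escalating-retry mode compounds, here is a back-of-envelope sketch. The prompt sizes and the $0.03-per-1K-token rate are hypothetical, chosen only to illustrate the doubling effect:

```python
# Hypothetical numbers: each retry doubles the prompt by re-appending context.
PRICE_PER_1K_TOKENS = 0.03  # assumed gpt-4-class input rate
tokens = 2_000              # initial prompt size (hypothetical)

total_cost = 0.0
for attempt in range(1, 6):
    cost = tokens / 1000 * PRICE_PER_1K_TOKENS
    total_cost += cost
    print(f"attempt {attempt}: {tokens:>6} tokens -> ${cost:.2f}")
    tokens *= 2  # "add more context" recovery strategy

print(f"five attempts: ${total_cost:.2f}")  # $1.86, 31x the first attempt
```

Five retries with doubling context cost 31 times the first attempt, which is why retry loops show up so dramatically on invoices.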

Why Existing Tools Do Not Fix This

+ +

The AI observability market is full of tools that track costs. LangSmith, Langfuse, Portkey -- they all show you beautiful dashboards of what your agents spent. The problem is timing. These tools are post-hoc. They tell you what happened after the damage is done.

+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
CapabilityMonitoring ToolsRuntime Enforcement
Track cost per callYesYes
Dashboard visualizationYesOptional
Stop agent at dollar limitNoYes
Detect infinite loopsNoYes
Warn before budget hitNoYes
Enforce in CI/CDNoYes
+
+ +

Monitoring tells you the house burned down. Enforcement prevents the fire. You need both, but enforcement is the one that saves money.

+ +

The Solution: Runtime Budget Enforcement

+ +

Runtime enforcement means the guard runs inside your agent process. The budget is checked on every LLM call, before the result is handed back to the agent. When the limit is hit, the guard raises an exception and the agent stops immediately.

+ +

Here is what it looks like with AgentGuard:

+ +
from agentguard import Tracer, BudgetGuard, LoopGuard, patch_openai
+
+# Two guards: budget cap + loop detection
+budget = BudgetGuard(max_cost_usd=5.00, warn_at_pct=0.8)
+loops = LoopGuard(max_repeats=3, window=6)
+
+tracer = Tracer(
+    service="research-agent",
+    guards=[budget, loops],  # auto-check on every event
+)
+patch_openai(tracer, budget_guard=budget)
+
+# Agent runs normally. Guards enforce limits automatically.
+# - BudgetExceeded raised at $5.00
+# - LoopDetected raised if same tool called 3x in 6 events
+ +

Three things happen automatically:

+ +
    +
  1. Every OpenAI call is intercepted. Token usage and cost are extracted from the response and fed to the BudgetGuard.
  2. +
  3. Every tool call is checked for loops. The LoopGuard tracks the last N events and detects repeated patterns.
  4. +
  5. Exceptions propagate up. BudgetExceeded and LoopDetected are standard Python exceptions. They stop the agent cleanly, no matter what framework you use.
  6. +
+ +
+

Key insight: Guards that raise exceptions are fundamentally different from alerts. An alert requires a human to notice and act. An exception requires no human -- it stops the agent immediately, even at 3 AM.

+
+ +
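The loop check in step 2 is easy to picture with a toy version. This sketch is illustrative only, not AgentGuard's LoopGuard:

```python
from collections import Counter, deque

class LoopDetected(Exception):
    """Raised when the same call repeats too often (toy version)."""

class MiniLoopGuard:
    """Toy sliding-window detector for repeated (tool, args) calls."""

    def __init__(self, max_repeats=3, window=6):
        self.max_repeats = max_repeats
        self.events = deque(maxlen=window)  # only the last `window` events count

    def record(self, tool, args):
        self.events.append((tool, args))
        counts = Counter(self.events)
        if counts[(tool, args)] >= self.max_repeats:
            raise LoopDetected(
                f"{tool}({args!r}) repeated {counts[(tool, args)]}x in window"
            )

guard = MiniLoopGuard(max_repeats=3, window=6)
guard.record("search", "best restaurant in austin")
guard.record("search", "best restaurant austin")      # different args: fine
guard.record("search", "best restaurant in austin")   # second repeat: fine
# A third identical call inside the window would raise LoopDetected.
```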

The Cost of Inaction

+ +

Every day you run agents without budget enforcement is a day you are betting on best-case behavior from a system designed to be unpredictable. The math is simple:

+ + + +

Adding a budget guard takes three lines of code and costs nothing. The SDK is free, MIT-licensed, and has zero dependencies.

+ +
+

Stop overspending on AI agents

+

Three lines of Python. Zero dependencies. Hard budget limits that actually stop the agent.

+
pip install agentguard47
+ +
+
+ + +
+ + \ No newline at end of file diff --git a/site/blog/budget-limits-ai-agents.html b/site/blog/budget-limits-ai-agents.html new file mode 100644 index 0000000..fbde633 --- /dev/null +++ b/site/blog/budget-limits-ai-agents.html @@ -0,0 +1,308 @@ + + + + + + How to Set Budget Limits on AI Agents | AgentGuard + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+

How to Set Budget Limits on AI Agents

+
By AgentGuard Team · February 20, 2026 · 6 min read
+ +

AI agents are powerful, but they are also expensive and unpredictable. An autonomous agent with access to GPT-4 can burn through $50 in minutes if it gets stuck in a reasoning loop or decides to call an API 200 times to "be thorough." According to internal benchmarks, cost overruns on autonomous agent tasks average 340% above the expected budget.

+ +

The problem is not that agents are expensive. The problem is that nothing stops them once they start spending. Most observability tools track costs after the fact. By the time you see the dashboard, the damage is done.

+ +

This guide shows three patterns for setting hard budget limits on AI agents using Python. Each pattern builds on the last, from manual caps to fully automatic cost tracking.

+ +

Pattern 1: Hard Dollar Cap

+ +

The simplest pattern. Create a BudgetGuard with a dollar limit and call consume() after each LLM call. When the limit is hit, it raises BudgetExceeded and your agent stops immediately.

+ +
from agentguard import BudgetGuard, BudgetExceeded
+
+# Create a guard with a $5 limit
+guard = BudgetGuard(max_cost_usd=5.00)
+
+def call_llm(prompt):
+    response = openai_client.chat.completions.create(
+        model="gpt-4",
+        messages=[{"role": "user", "content": prompt}]
+    )
+    usage = response.usage
+    # gpt-4: roughly $0.03/1K prompt tokens, $0.06/1K completion tokens
+    cost = usage.prompt_tokens * 0.00003 + usage.completion_tokens * 0.00006
+    guard.consume(cost_usd=cost)  # raises BudgetExceeded at $5
+    return response
+
+try:
+    for step in range(100):
+        result = call_llm("Next step in research...")
+except BudgetExceeded as e:
+    print(f"Agent stopped: {e}")
+    # BudgetExceeded: cost_usd 5.02 exceeds limit 5.00
+ +

This is a hard stop. The exception propagates up and kills the agent loop. No graceful degradation, no warnings -- just a circuit breaker. For many use cases, this is exactly what you want.

+ +

You can also set limits on tokens and call counts:

+ +
guard = BudgetGuard(
+    max_cost_usd=5.00,   # dollar cap
+    max_tokens=500000,    # token cap
+    max_calls=50,         # call count cap
+)
+guard.consume(tokens=1500, calls=1, cost_usd=0.045)
+ +

Any limit that gets hit first triggers BudgetExceeded. This is useful when you want to guard against both cost and runaway call volume.

+ +

Pattern 2: Warning at 80%

+ +

Sometimes you want a heads-up before the hard stop. The warn_at_pct parameter fires a callback when usage crosses a threshold, giving your agent a chance to wrap up gracefully.

+ +
from agentguard import BudgetGuard
+
+def on_budget_warning(msg):
+    print(f"WARNING: {msg}")
+    # Could also: switch to cheaper model, save state, notify Slack
+
+guard = BudgetGuard(
+    max_cost_usd=5.00,
+    warn_at_pct=0.8,         # warn at 80% ($4.00)
+    on_warning=on_budget_warning,
+)
+
+# After consuming $4.01:
+# WARNING: cost_usd at 80.2% of limit 5.00 (used 4.01)
+
+# After consuming $5.01:
+# raises BudgetExceeded
+ +

The warning fires once. After that, the agent keeps running until it hits the hard cap. This gives you a two-stage system: warn at 80%, kill at 100%.

+ +
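The two-stage behavior boils down to a one-shot flag plus a hard check. Here is an illustrative sketch (not the library's code; the names simply mirror the API above):

```python
class BudgetExceeded(Exception):
    """Raised at the hard cap (toy version)."""

class TwoStageBudget:
    """Toy two-stage guard: warn once at a threshold, raise at the cap."""

    def __init__(self, max_cost_usd, warn_at_pct, on_warning):
        self.max_cost_usd = max_cost_usd
        self.warn_at = max_cost_usd * warn_at_pct
        self.on_warning = on_warning
        self.cost_used = 0.0
        self.warned = False  # ensures the warning fires exactly once

    def consume(self, cost_usd):
        self.cost_used += cost_usd
        if not self.warned and self.cost_used >= self.warn_at:
            self.warned = True
            self.on_warning(
                f"cost_usd at {self.cost_used / self.max_cost_usd:.0%} of limit"
            )
        if self.cost_used > self.max_cost_usd:
            raise BudgetExceeded(
                f"cost_usd {self.cost_used:.2f} exceeds limit {self.max_cost_usd:.2f}"
            )

warnings = []
guard = TwoStageBudget(5.00, 0.8, warnings.append)
guard.consume(4.01)   # crosses 80%: callback fires once
guard.consume(0.50)   # still under the cap: no second warning
assert len(warnings) == 1
```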
+

Tip: Use the warning callback to switch to a cheaper model (GPT-4o-mini instead of GPT-4) when you are approaching the budget. This lets the agent finish its current task without blowing the cap.

+
+ +

Pattern 3: Auto-Tracking with patch_openai

+ +

Manual consume() calls work but require discipline. Miss one call and your budget tracking is wrong. The patch_openai function eliminates this by automatically intercepting every OpenAI API call, extracting token usage from the response, and feeding it to the BudgetGuard.

+ +
from agentguard import Tracer, BudgetGuard, patch_openai
+
+# Set up budget guard and tracer
+budget = BudgetGuard(max_cost_usd=5.00, warn_at_pct=0.8)
+tracer = Tracer(service="my-agent", guards=[budget])
+
+# Patch OpenAI — every call is now auto-tracked
+patch_openai(tracer, budget_guard=budget)
+
+# Use OpenAI normally. No manual consume() calls needed.
+import openai
+client = openai.OpenAI()
+
+for step in range(100):
+    response = client.chat.completions.create(
+        model="gpt-4",
+        messages=[{"role": "user", "content": "Research step..."}]
+    )
+    # BudgetGuard.consume() is called automatically
+    # with real token counts and cost from the response
+ +

No manual bookkeeping. Every chat.completions.create call is intercepted, the token usage is extracted from the response, the cost is estimated using built-in per-model pricing, and consume() is called automatically. When the budget is hit, BudgetExceeded raises just like before.

+ +
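Under the hood, this style of patching is just wrapping the SDK's create method. Here is a self-contained sketch with a fake client; the real patch_openai deals with the actual OpenAI client types, so treat this as an illustration of the mechanism only:

```python
class FakeCompletions:
    """Stand-in for an SDK's chat.completions endpoint."""
    def create(self, **kwargs):
        return {"usage": {"total_tokens": 1500}, "choices": ["..."]}

def patch_client(completions, on_usage):
    """Wrap .create so every call reports its usage before returning."""
    original = completions.create
    def wrapped(**kwargs):
        response = original(**kwargs)
        on_usage(response["usage"]["total_tokens"])  # feed the guard here
        return response
    completions.create = wrapped
    return original  # keep a handle so the patch can be undone later

seen = []
client = FakeCompletions()
patch_client(client, seen.append)
client.create(model="gpt-4", messages=[])
client.create(model="gpt-4", messages=[])
print(seen)  # [1500, 1500]: usage captured with no manual bookkeeping
```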

The same pattern works for Anthropic:

+ +
from agentguard import patch_anthropic
+
+patch_anthropic(tracer, budget_guard=budget)
+
+# Now Anthropic calls are auto-tracked too
+ +

Which Pattern Should You Use?

+ + + +

All three patterns can be combined. Use patch_openai for automatic tracking and add a warn_at_pct callback for early warnings. The hard cap is always enforced regardless of which pattern you choose.

+ +

What Happens When the Budget Is Hit?

+ +

BudgetExceeded is a regular Python exception. It propagates up the call stack and can be caught with a standard try/except. This means it integrates naturally with any agent framework:

+ + + +

No special handling needed. The agent stops, and you get the last valid state.

+ +
+

Start enforcing budgets in 60 seconds

+

Zero dependencies. MIT licensed. Works with any Python agent.

+
pip install agentguard47
+ +
+
+ + +
+ + \ No newline at end of file diff --git a/site/blog/langchain-cost-tracking.html b/site/blog/langchain-cost-tracking.html new file mode 100644 index 0000000..73955a9 --- /dev/null +++ b/site/blog/langchain-cost-tracking.html @@ -0,0 +1,408 @@ + + + + + + LangChain Cost Tracking: Complete Guide | AgentGuard + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+

LangChain Cost Tracking: Complete Guide

+
By AgentGuard Team · February 20, 2026 · 8 min read
+ +

LangChain is the most popular framework for building AI agents. But it has a blind spot: callbacks can count tokens, yet there is no built-in way to enforce a cost limit. You can build a ReAct agent in 10 lines of code, but you have no idea what it will cost until the bill arrives.

+ +

This guide covers everything you need to add cost tracking and budget enforcement to LangChain and LangGraph agents using AgentGuard. From basic token counting to CI cost gates, you will have full visibility and control over agent spend.

+ +

Why LangChain Cost Tracking Matters

+ +

LangChain agents are autonomous. They decide how many LLM calls to make, which tools to invoke, and when to stop. This is powerful for building useful agents, but it means costs are inherently unpredictable:

+ + + +

Without cost tracking, you are flying blind. Without budget enforcement, you have no safety net.

+ +

Step 1: Install AgentGuard with LangChain Support

+ +
pip install agentguard47[langchain]
+ +

This installs the core SDK (zero dependencies) plus the optional langchain-core integration. If you are using LangGraph, that is included automatically since LangGraph depends on langchain-core.

+ +

Step 2: Set Up the Callback Handler

+ +

AgentGuard integrates with LangChain through its callback system. The AgentGuardCallbackHandler hooks into every LLM call, chain run, and tool invocation to automatically track costs and check guards.

+ +
from agentguard import Tracer, BudgetGuard, LoopGuard
+from agentguard.integrations.langchain import AgentGuardCallbackHandler
+
+# Set up guards
+budget = BudgetGuard(
+    max_cost_usd=5.00,
+    warn_at_pct=0.8,
+    on_warning=lambda msg: print(f"Budget warning: {msg}"),
+)
+loops = LoopGuard(max_repeats=3, window=6)
+
+# Create tracer with JSONL output
+tracer = Tracer(service="langchain-agent")
+
+# Create the callback handler
+handler = AgentGuardCallbackHandler(
+    tracer=tracer,
+    budget_guard=budget,
+    loop_guard=loops,
+)
+
+# Use with any LangChain agent or chain
+result = agent.invoke(
+    {"input": "Research the latest AI papers"},
+    config={"callbacks": [handler]},
+)
+ +

That is it. Every LLM call the agent makes now flows through AgentGuard. Token usage is extracted automatically from the LLM response metadata. If the agent exceeds $5 or loops 3 times, it gets a BudgetExceeded or LoopDetected exception.

+ +

Step 3: Auto-Extract Token Usage

+ +

The callback handler automatically extracts token counts from LangChain's LLM response objects. This works with any LLM provider that LangChain supports -- OpenAI, Anthropic, Google, Cohere, and others.

+ +
+
+ What gets tracked +

Prompt tokens, completion tokens, total tokens, model name, and estimated cost in USD for every LLM call.

+
+
+ How cost is calculated +

AgentGuard's built-in estimate_cost() uses per-model pricing tables for GPT-4, GPT-4o, Claude, Gemini, and more. Updated regularly.

+
+
+ What gets guarded +

BudgetGuard checks cumulative cost after each call. LoopGuard checks for repeated tool invocations. Both raise exceptions on violation.

+
+
+ What gets traced +

Every chain start/end, LLM call, and tool invocation is emitted as a structured JSONL event with timing, cost, and span hierarchy.

+
+
+ +

Step 4: LangGraph with guarded_node

+ +

If you are using LangGraph, AgentGuard provides a guarded_node decorator that wraps individual graph nodes with tracing and guard checks. This gives you per-node cost tracking across your entire graph.

+ +
from agentguard import Tracer, BudgetGuard, LoopGuard
+from agentguard.integrations.langgraph import guarded_node
+from langgraph.graph import StateGraph
+
+budget = BudgetGuard(max_cost_usd=10.00)
+loops = LoopGuard(max_repeats=3)
+tracer = Tracer(service="langgraph-agent")
+
+# Decorate each node — tracing + guards applied automatically
+@guarded_node(tracer=tracer, budget_guard=budget, loop_guard=loops)
+def research_node(state):
+    # Your node logic here
+    result = llm.invoke(state["query"])
+    return {"research": result.content}
+
+@guarded_node(tracer=tracer, budget_guard=budget, loop_guard=loops)
+def summarize_node(state):
+    result = llm.invoke(f"Summarize: {state['research']}")
+    return {"summary": result.content}
+
+# Build graph normally
+graph = StateGraph(dict)
+graph.add_node("research", research_node)
+graph.add_node("summarize", summarize_node)
+graph.add_edge("research", "summarize")
+graph.set_entry_point("research")
+app = graph.compile()
+ +

Each node execution is traced as a separate span. The BudgetGuard tracks cumulative cost across all nodes, so a $10 limit applies to the entire graph run, not per node. If one node exhausts the budget, subsequent nodes are never reached.

+ +
+

Tip: For existing graphs where you cannot use decorators, use the guard_node function instead: graph.add_node("research", guard_node(research_fn, tracer=tracer, budget_guard=budget))

+
+ +

Step 5: View Cost Reports

+ +

AgentGuard writes structured JSONL trace files by default. Use the CLI to generate human-readable cost reports from these traces:

+ +
# Generate a cost report from trace data
+agentguard report traces.jsonl
+
+# Output:
+# ┌──────────────────────┬────────┬──────────┬──────────┐
+# │ Span                 │ Calls  │ Tokens   │ Cost     │
+# ├──────────────────────┼────────┼──────────┼──────────┤
+# │ research_node        │ 3      │ 12,450   │ $0.37    │
+# │ summarize_node       │ 1      │ 2,100    │ $0.06    │
+# │ TOTAL                │ 4      │ 14,550   │ $0.43    │
+# └──────────────────────┴────────┴──────────┴──────────┘
+ +

This gives you per-node and per-agent cost breakdowns. You can see exactly which part of your pipeline is expensive and optimize accordingly.

+ +
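The report is a plain aggregation over the JSONL events. Here is a sketch of the idea, assuming each line carries span, tokens, and cost_usd fields (the real trace schema may differ):

```python
import io
import json
from collections import defaultdict

# Hypothetical trace lines; the real AgentGuard schema may differ.
trace_file = io.StringIO(
    '{"span": "research_node", "tokens": 12450, "cost_usd": 0.37}\n'
    '{"span": "summarize_node", "tokens": 2100, "cost_usd": 0.06}\n'
)

# Roll up calls, tokens, and cost per span.
totals = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost": 0.0})
for line in trace_file:
    event = json.loads(line)
    agg = totals[event["span"]]
    agg["calls"] += 1
    agg["tokens"] += event["tokens"]
    agg["cost"] += event["cost_usd"]

for span, agg in totals.items():
    print(f"{span:<20} {agg['calls']:>5} {agg['tokens']:>8,} ${agg['cost']:.2f}")
```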

Step 6: CI Cost Gates with GitHub Actions

+ +

The final piece is preventing cost regressions in CI. AgentGuard includes a GitHub Action that runs your agent test suite and fails the build if costs exceed a threshold.

# .github/workflows/cost-gate.yml
+name: Cost Gate
+on: [pull_request]
+jobs:
+  cost-check:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - run: pip install agentguard47
+      - run: python -m pytest tests/ -v
+      - name: Check costs
+        run: |
+          agentguard eval traces.jsonl \
+            --assert-max-cost 2.00 \
+            --assert-max-calls 50

If any test run generates traces exceeding $2.00 total cost or 50 LLM calls, the CI build fails. This catches cost regressions before they reach production: a new prompt template that accidentally doubles token usage, a tool that triggers extra LLM calls, or a loop that was not caught in local testing.
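The gate itself reduces to two threshold checks. This stdlib-only sketch mirrors the workflow's thresholds; `cost_gate` is a hypothetical helper, not the agentguard CLI's internals:

```python
# Stdlib-only sketch of the gate check; cost_gate is a hypothetical helper
# mirroring the workflow thresholds, not the agentguard CLI internals.
def cost_gate(total_cost_usd, total_calls, max_cost=2.00, max_calls=50):
    failures = []
    if total_cost_usd > max_cost:
        failures.append(f"cost ${total_cost_usd:.2f} > ${max_cost:.2f}")
    if total_calls > max_calls:
        failures.append(f"calls {total_calls} > {max_calls}")
    return failures  # an empty list means the build passes

print(cost_gate(1.40, 32))  # []
print(cost_gate(2.75, 60))  # ['cost $2.75 > $2.00', 'calls 60 > 50']
```

In CI the equivalent check would exit non-zero on any failure, which is what fails the build.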


Putting It All Together


Here is a complete example that combines everything: LangChain callback handler, budget enforcement, loop detection, and JSONL trace output.

from agentguard import (
+    Tracer, BudgetGuard, LoopGuard,
+    JsonlFileSink, BudgetExceeded, LoopDetected,
+)
+from agentguard.integrations.langchain import AgentGuardCallbackHandler
+from langchain_openai import ChatOpenAI
+from langchain.agents import create_react_agent, AgentExecutor
+
+# Guards
+budget = BudgetGuard(
+    max_cost_usd=5.00,
+    warn_at_pct=0.8,
+    on_warning=lambda msg: print(f"[WARN] {msg}"),
+)
+loops = LoopGuard(max_repeats=3, window=6)
+
+# Tracer with file output
+sink = JsonlFileSink("traces.jsonl")
+tracer = Tracer(sink=sink, service="my-agent", guards=[budget, loops])
+
+# LangChain setup
+handler = AgentGuardCallbackHandler(
+    tracer=tracer,
+    budget_guard=budget,
+    loop_guard=loops,
+)
+llm = ChatOpenAI(model="gpt-4", callbacks=[handler])
+
+# Build the agent and executor (tools and prompt come from your own setup)
+agent = create_react_agent(llm, tools, prompt)
+agent_executor = AgentExecutor(agent=agent, tools=tools)
+
+# Run agent with protection
+try:
+    result = agent_executor.invoke(
+        {"input": "Analyze Q4 sales data"},
+        config={"callbacks": [handler]},
+    )
+    print(f"Result: {result}")
+except BudgetExceeded as e:
+    print(f"Agent stopped — budget exceeded: {e}")
+except LoopDetected as e:
+    print(f"Agent stopped — loop detected: {e}")
+
+# View the report
+# $ agentguard report traces.jsonl

What You Get


All of this with zero hard dependencies. The core SDK uses Python stdlib only. The LangChain integration requires langchain-core, which you already have if you are using LangChain.

+

Add cost tracking to your LangChain agent

+

One callback handler. Zero hard dependencies. Budget enforcement that actually stops the agent.

+
pip install agentguard47[langchain]
+
\ No newline at end of file
diff --git a/site/compare.html b/site/compare.html
new file mode 100644
index 0000000..d4da42d
--- /dev/null
+++ b/site/compare.html
@@ -0,0 +1,362 @@
<title>AgentGuard vs LangSmith vs Langfuse vs Portkey — AI Agent Cost Tracking Comparison</title>
+ + +

AgentGuard vs LangSmith vs Langfuse vs Portkey

+

The only AI agent observability tool with runtime budget enforcement. Compare features, pricing, and integrations side-by-side.

+ View on GitHub + Try Free Dashboard +
+
+ Runtime intervention +

AgentGuard kills agents mid-run when they exceed spend limits. Others only report after the damage is done.

+
+
+ Zero dependencies +

Pure Python stdlib. One package, nothing to audit. No supply chain risk, no dependency conflicts.

+
+
+ Free and open source +

MIT-licensed SDK. No per-trace pricing for local use. Dashboard optional.

+
+

Feature comparison

+
<tr><th>Feature</th><th>AgentGuard</th><th>LangSmith</th><th>Langfuse</th><th>Portkey</th></tr>
<tr><td>Hard budget enforcement</td><td>Yes — raises exception at limit</td><td>No</td><td>No</td><td>No</td></tr>
<tr><td>Kill agent mid-run</td><td>Yes — BudgetExceeded stops execution</td><td>No</td><td>No</td><td>No</td></tr>
<tr><td>Loop detection</td><td>Yes — exact + fuzzy + A-B-A-B</td><td>No</td><td>No</td><td>No</td></tr>
<tr><td>Cost tracking</td><td>Yes</td><td>Yes</td><td>Yes</td><td>Yes</td></tr>
<tr><td>Tracing / spans</td><td>Yes</td><td>Yes</td><td>Yes</td><td>Yes</td></tr>
<tr><td>Timeout guard</td><td>Yes — wall-clock enforcement</td><td>No</td><td>No</td><td>Partial — gateway-level only</td></tr>
<tr><td>Rate limit guard</td><td>Yes — per-minute throttling</td><td>No</td><td>No</td><td>Yes — gateway-level</td></tr>
<tr><td>Runtime dependencies</td><td>Zero</td><td>5+</td><td>3+</td><td>3+</td></tr>
<tr><td>Open source SDK</td><td>MIT</td><td>Proprietary</td><td>MIT</td><td>Partial</td></tr>
<tr><td>Self-hosted option</td><td>Yes — SDK works fully offline</td><td>No</td><td>Yes</td><td>No</td></tr>
<tr><td>CI cost gates</td><td>Yes — GitHub Action included</td><td>No</td><td>No</td><td>No</td></tr>
<tr><td>LangChain integration</td><td>Yes</td><td>Yes — native</td><td>Yes</td><td>Yes</td></tr>
<tr><td>LangGraph integration</td><td>Yes — guarded_node decorator</td><td>Yes — native</td><td>Partial</td><td>No</td></tr>
<tr><td>CrewAI integration</td><td>Yes</td><td>No</td><td>Partial</td><td>No</td></tr>
<tr><td>OpenAI/Anthropic auto-patch</td><td>Yes — one-line patching</td><td>Via wrapper</td><td>Yes</td><td>Yes — gateway proxy</td></tr>
+

Pricing comparison

+
<tr><th>Plan</th><th>AgentGuard</th><th>LangSmith</th><th>Langfuse</th><th>Portkey</th></tr>
<tr><td>Free tier</td><td>Unlimited local use<br>+ 10K dashboard events/mo</td><td>5K traces/mo</td><td>50K observations/mo</td><td>10K requests/mo</td></tr>
<tr><td>Paid plans</td><td>$39/mo (Pro)<br>$79/mo (Team)</td><td>$39/mo + $2.50/1K traces</td><td>$59/mo (Pro)</td><td>$49/mo</td></tr>
<tr><td>Per-trace pricing</td><td>No — flat rate</td><td>Yes — $2.50/1K traces</td><td>Yes — overage charges</td><td>No</td></tr>
<tr><td>SDK cost</td><td>Free forever (MIT)</td><td>Free (proprietary)</td><td>Free (MIT)</td><td>Free</td></tr>
+

Why AgentGuard for budget enforcement?

+
+
+

LangSmith tracks costs. AgentGuard enforces them.

+

LangSmith shows you what an agent spent after it finishes. AgentGuard raises BudgetExceeded at the dollar limit you set and stops the agent immediately. The difference between a dashboard alert and a circuit breaker.

+
+
+

Langfuse is open source. So is AgentGuard.

+

Both are MIT-licensed. Langfuse focuses on tracing and prompt management. AgentGuard adds runtime guards — budget limits, loop detection, and timeout enforcement that stop agents before they cause damage.

+
+
+

Portkey is a gateway. AgentGuard is a library.

+

Portkey proxies all LLM traffic through their servers. AgentGuard runs in your process with zero network calls. No latency overhead, no data leaving your infrastructure, no single point of failure.

+
+
+

Zero dependencies means zero risk.

+

Every dependency you add expands your supply chain attack surface. AgentGuard uses Python stdlib only. One package to install, one package to audit. No transitive vulnerabilities, no version conflicts.

+
+

Add budget enforcement in 3 lines

+

No signup required. No API keys. Works offline.

+
from agentguard import Tracer, BudgetGuard, patch_openai
+
+tracer = Tracer(guards=[BudgetGuard(max_cost_usd=5.00, warn_at_pct=0.8)])
+patch_openai(tracer)  # auto-tracks every OpenAI call
+
+# Use OpenAI normally — agent stops at $5
+ View on GitHub + pip install agentguard47 +
diff --git a/site/index.html b/site/index.html
index 9a01cc7..4c2e1bf 100644
--- a/site/index.html
+++ b/site/index.html
@@ -214,7 +214,7 @@

Your agents are running. Do you know what they're spending?

- 502
+ 516
  Tests passing
@@ -407,6 +407,7 @@

Get updates

© 2026 BMD PAT LLC · MIT-licensed SDK · Zero dependencies

diff --git a/site/sitemap.xml b/site/sitemap.xml
index 67dab7b..2f7ea2f 100644
--- a/site/sitemap.xml
+++ b/site/sitemap.xml
@@ -10,4 +10,24 @@
     <lastmod>2026-02-14</lastmod>
     <priority>0.7</priority>
   </url>
+  <url>
+    <loc>https://agentguard47.com/compare.html</loc>
+    <lastmod>2026-02-20</lastmod>
+    <priority>0.9</priority>
+  </url>
+  <url>
+    <loc>https://agentguard47.com/blog/budget-limits-ai-agents.html</loc>
+    <lastmod>2026-02-20</lastmod>
+    <priority>0.8</priority>
+  </url>
+  <url>
+    <loc>https://agentguard47.com/blog/ai-agent-cost-overruns.html</loc>
+    <lastmod>2026-02-20</lastmod>
+    <priority>0.8</priority>
+  </url>
+  <url>
+    <loc>https://agentguard47.com/blog/langchain-cost-tracking.html</loc>
+    <lastmod>2026-02-20</lastmod>
+    <priority>0.8</priority>
+  </url>
 </urlset>