my agent is dumb (maid) is a telemetry dev tool for inspecting your agent's behavior via agent traces.
(Under development)
- Open source
- It's customizable based on your needs
- It has a really cool mascot
- Start both services: `docker compose up --build`
- Open the UI: http://localhost:5555
- Connect your own agent
- Or, if you want to understand how the dev tool works, spin up the one in `/dummy-agent`
- Start the frontend: `npm run dev:frontend`
- Start the inspection backend: `npm run dev:inspection`
- (Optional) Start the agent: `npm run dev:agent`
How it works
The InspectorReporter (/reporter) provides methods that can be used as manual hooks throughout your agent loop lifecycle.
You can treat these methods as enhanced console logs.
Note: If you don't use TypeScript, adapt the code in the reporter to your programming language of choice.
When you send SSE events to the inspector to be displayed in the Agent inspection panel, you must send specific, constrained information (unless you customize the frontend).
The main component is the reporter:
import { createHttpInspectionReporter } from "./reporter";
import { InspectionEventLabel } from "./protocol/types";
const reporter = createHttpInspectionReporter();

Below are some examples of the information you can send:
The log() method sends simple string messages without structured children.
await reporter.log("Full OpenRouter API response: ...");
await reporter.log("Model message: ...");The trace() method sends events with a parent/child structure that will be displayed with expandable reasoning details in the UI. The reasoning label will be highlighted in orange to distinguish it from other content (if provided).
// Send a trace event with reasoning
await reporter.trace(
"Final Assistant message",
[
{ label: InspectionEventLabel.Reasoning, data: reasoning },
{ label: InspectionEventLabel.Content, data: finalContent }
]
);

You can optionally include token usage information that will be displayed as a child node:
await reporter.trace(
"API request completed",
[
{ label: InspectionEventLabel.Content, data: "..." }
],
{
promptTokens: 340,
modelOutputTokens: 49,
totalTokens: 389,
modelReasoningTokens: 30
}
);

Measure and report the execution time of tool calls to identify performance bottlenecks. Wrap your tool execution with timing measurements and include both the duration and the tool call details in the trace:
const startTime = performance.now();
// ... execute your tool here ...
const endTime = performance.now();
const durationMs = endTime - startTime;
// Report tool execution with timing
await reporter.trace(
`Tool ${toolName} executed`,
[
{ label: InspectionEventLabel.Timing, data: `${durationMs.toFixed(2)}ms` },
{ label: InspectionEventLabel.ToolCalls, data: JSON.stringify({ tool: toolName, args, result }, null, 2) }
]
);

You can group events into invocations to make traces more readable.
The latency heatmap visualizes the time between consecutive events within each agent loop iteration. To enable it, mark the start and end of your agent loop:
// At the start of processing a user message/request
await reporter.invocationStart("Agent is processing the user input...");
// ... your agent logic, traces, tool calls ...
// At the end of the loop (when response is complete)
await reporter.invocationEnd("Invocation completed");

Report errors that occur during agent execution to track error rates per invocation.
Wrap error-prone operations in try/catch blocks and report errors:
try {
// ... your agent logic ...
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
await reporter.error("Agent loop failed", errorMessage);
await reporter.invocationEnd("Agent loop failed with error.");
throw error; // re-throw if needed
}

You can also report specific error conditions:
// Report empty content as an error
if (hasEmptyContent) {
await reporter.error("Empty content returned", "Model returned empty or null content");
}

The error rate is calculated client-side from persisted events, so it survives server restarts and accurately reflects what is visible in the UI.
The reporter also exposes methods for reporting model state, with the following signatures:

- tokens ➜ (currentTokensUsage: number, modelContextLimit: number)
- model ➜ (modelName: string)
- context ➜ (ctx: ContextMessage[])
- tools ➜ (tools: AgentToolDefinition[])
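For illustration, a minimal sketch of calling these methods, assuming they are awaitable reporter methods like log() and trace(); the variable names and values below are placeholders (check /reporter for the exact signatures):

```ts
// Hypothetical usage sketch — values are placeholders, not real data.
await reporter.model("openrouter/some-model");   // current model name
await reporter.tokens(1200, 200000);             // current token usage vs. model context limit
await reporter.context(contextMessages);         // your ContextMessage[] conversation history
await reporter.tools(agentToolDefinitions);      // the AgentToolDefinition[] exposed to the model
```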
All protocol types are available in protocol/types.ts.
You can check the agent implementation in /dummy-agent and /reporter as a reference.
You can export the current inspection state to share or analyze later, and import previously saved snapshots to restore the state.
Click the "export ↓" button in the inspection header to download the current state. Two formats are available:
- JSON ➜ Full machine-readable snapshot that can be imported back into maid for further analysis
- TXT ➜ Human-readable format for easy sharing and review
Click the "import ↑" button to load a previously exported JSON snapshot. This restores the complete inspection state, allowing you to review and analyze historical agent behavior without reconnecting to a running agent.
You can generate a snapshot based on your Amp or Claude Code conversation using the generate-maid-snapshots skill present in `.agents/skills` or `.claude/skills`.
Evaluate your agent's responses using LLM-based scoring on five criteria: correctness, completeness, clarity, relevance, and helpfulness.
Requires OPENROUTER_API_KEY in your .env file.
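For example, a minimal .env entry (the key name is taken from above; the value is a placeholder):

```
OPENROUTER_API_KEY=your-openrouter-api-key
```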
await reporter.evaluable(userInput, finalContent, requestTokenUsage); // optional: customEvaluationPrompt

Returns scores (1-10), an overall score, a summary, strengths, weaknesses, and suggestions.
You can add a custom system prompt via the UI or when you call the evaluable() reporter method.
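For illustration, a hedged sketch of the programmatic variant, assuming the custom prompt is the optional fourth argument hinted at by the customEvaluationPrompt comment above (verify the exact signature in /reporter):

```ts
// Hypothetical example — the prompt text is a placeholder.
const customEvaluationPrompt =
  "Score strictly and penalize any claim not grounded in the provided context.";

await reporter.evaluable(userInput, finalContent, requestTokenUsage, customEvaluationPrompt);
```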
You can integrate maid into your custom agent loop by using a coding agent of your choice and feeding it the SETUP.md prompt.
