AgentDebug v4.0 is a comprehensive, self-contained debugging framework for AI agent trajectories. It combines fine-grained error analysis, critical error detection, and reinforcement learning (RL)-guided iterative debugging to autonomously identify, prioritize, and correct failures in agent execution traces.
Built with First Principles reasoning, the system models agent behavior as modular steps (planning, tool use, memory, system) and uses a Q-learning agent to learn optimal debugging strategies over time. Performance is rigorously benchmarked across time, memory, success rate, and learning convergence.
| Feature | Description |
|---|---|
| Error Taxonomy | Probabilistic error injection based on real-world failure modes |
| Trajectory Modeling | Structured representation of agent steps with state, action, module |
| Fine-Grained Analysis | Stochastic error detection with context-aware heuristics |
| Critical Error Detection | Prioritizes planning/tool failures (high-impact) |
| RL-Guided Debugging | Q-learning agent learns which modules to fix first |
| Performance Benchmarking | Time, memory, success rate, Q-value convergence |
| Cross-Platform Memory Tracking | resource (Unix) / tracemalloc (Windows) |
| Extensible Tools & Agents | Plug-in real tools (web search, code analysis) |
| Unit Tested | Full test suite with edge cases |
```
[Agent Execution] → [Trajectory] → [Fine-Grained Analysis]
        ↓
[Critical Error Detection] → [RL Agent Chooses Module]
        ↓
[Correct Step] → [Update Q-Table] → [Repeat until Success]
```
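The loop above can be sketched in a few lines of Python. `StubRLAgent` and `debug_loop` are illustrative stand-ins, not the framework's real classes; the sketch assumes steps are dicts with `module` and `error` keys:

```python
class StubRLAgent:
    """Minimal stand-in for DebuggingAgent (illustrative only)."""
    def __init__(self, modules):
        self.q = {m: 0.0 for m in modules}  # one Q-value per module

    def choose_module(self, errors):
        # Greedy over Q-values, restricted to modules that currently have errors
        candidates = {s["module"] for s in errors}
        return max(candidates, key=lambda m: self.q[m])

    def update(self, module, reward):
        # Simple exponential moving average toward the reward
        self.q[module] += 0.1 * (reward - self.q[module])


def debug_loop(trajectory, agent, max_iterations=10):
    """Fix one module's errors per iteration until the trajectory is clean."""
    for _ in range(max_iterations):
        errors = [s for s in trajectory if s["error"] is not None]
        if not errors:
            return True  # trajectory succeeds once no step carries an error
        module = agent.choose_module(errors)
        for s in errors:
            if s["module"] == module:
                s["error"] = None  # "correct" every step in the chosen module
        agent.update(module, reward=1.0)  # reinforce the choice that helped
    return False
```

The real framework replaces the stub's greedy rule with epsilon-greedy exploration and a full Q-table update.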
- `Step` – Atomic unit: `(state, action, module)`
- `Trajectory` – Sequence of steps; evaluates success
- `DebuggingAgent` – Q-learning agent over error states
- `Benchmarker` – Measures time/memory/success across runs
- `SimpleAgent` – Simulates a real agent with tools + memory
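The first two components might be modeled roughly as follows; the field names are assumptions drawn from the descriptions above, not the actual class definitions:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Step:
    """Atomic unit of a trajectory: what the agent did, and in which module."""
    state: str
    action: str
    module: str                  # "planning" | "tool" | "memory" | "system"
    error: Optional[str] = None  # populated by error injection / analysis

@dataclass
class Trajectory:
    """Ordered sequence of steps; success means no step carries an error."""
    steps: List[Step] = field(default_factory=list)

    def is_successful(self) -> bool:
        return all(step.error is None for step in self.steps)
```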
| Module | Error Type | Probability |
|---|---|---|
| planning | reasoning_loop | 0.4 |
| planning | incomplete_plan | 0.6 |
| tool | tool_selection_error | 0.5 |
| tool | invalid_tool_input | 0.5 |
| memory | retrieval_failure | 0.7 |
| memory | context_overflow | 0.3 |
| system | execution_timeout | 0.6 |
| system | resource_limit_exceeded | 0.4 |
Errors are injected stochastically during simulation and analysis.
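Stochastic injection can be sketched directly from the table: given a module's error distribution and an overall injection rate, draw an error with `random.choices`. The `ERROR_TAXONOMY` layout mirrors the extension example later in this README; `inject_error` itself and its `injection_rate` parameter are illustrative:

```python
import random

ERROR_TAXONOMY = {
    "planning": {"reasoning_loop": 0.4, "incomplete_plan": 0.6},
    "tool":     {"tool_selection_error": 0.5, "invalid_tool_input": 0.5},
    "memory":   {"retrieval_failure": 0.7, "context_overflow": 0.3},
    "system":   {"execution_timeout": 0.6, "resource_limit_exceeded": 0.4},
}

def inject_error(module, injection_rate=0.3, rng=random):
    """With probability `injection_rate`, draw an error for `module`
    weighted by the taxonomy probabilities; otherwise return None."""
    if rng.random() >= injection_rate:
        return None
    errors = ERROR_TAXONOMY[module]
    return rng.choices(list(errors), weights=list(errors.values()))[0]
```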
| Tool | Function |
|---|---|
| `web_search` | Simulates search: `"Web search results for: {query}"` |
| `code_analysis` | Returns: `"Code analysis results for: {query}"` |

Easily extendable via `Tool(name, func, description)`.
| Scenario | Trajectory Size | Query |
|---|---|---|
| Small | 5 | "Find AI info" |
| Medium | 100 | "Find AI info" |
| Large | 1000 | "Complex query for AI debugging analysis" |
- 10 runs per scenario
- Metrics logged to `benchmark_results.txt`
- Includes std dev, peak memory, Q-value change
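A minimal sketch of what the Benchmarker measures per run, using only `time.perf_counter` and `tracemalloc`; the real class also aggregates success rate and Q-value convergence, and this helper name is an assumption:

```python
import statistics
import time
import tracemalloc

def benchmark(fn, runs=10):
    """Run `fn` repeatedly, recording wall time and peak traced memory per run."""
    times, peaks = [], []
    for _ in range(runs):
        tracemalloc.start()
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        peaks.append(peak)
        tracemalloc.stop()
    return {
        "mean_time_s": statistics.mean(times),
        "std_time_s": statistics.stdev(times) if runs > 1 else 0.0,
        "peak_memory_bytes": max(peaks),
    }
```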
Initial Trajectory Analysis:
```
+--------+----------+---------------------+-------------------+
| Step   | Module   | Action              | Error             |
+========+==========+=====================+===================+
| 1      | planning | reason_query        | None              |
| 2      | memory   | retrieve_context    | retrieval_failure |
| 3      | tool     | web_search          | None              |
| 4      | planning | generate_response   | None              |
| 5      | memory   | store_context       | None              |
+--------+----------+---------------------+-------------------+
```
Iteration 1: Refined feedback: Focused on memory. Critical error at step 2 (memory): retrieval_failure...
Trajectory successful after debugging!
- `benchmark_results.txt` – Full benchmark logs
- Console output – Real-time trajectory + debugging steps
```
python AgentDebugv4.0.txt
```

Runs:
- Single agent simulation
- 3-step iterative debugging
- Final trajectory evaluation
```
python AgentDebugv4.0.txt
```

Automatically runs:
- 3 scenarios × 10 runs
- Aggregated metrics
- Logs to `benchmark_results.txt`
```
python -m unittest AgentDebugv4.0.txt
```

Or let the script run them automatically at the end.
- Python 3.8+
- Standard-library modules: `random`, `logging`, `unittest`, `typing`, `collections`, `time`, `platform`, `resource` (Unix), `tracemalloc` (Windows)
- Third-party: `numpy`, `tabulate`

No other external dependencies.
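The cross-platform memory tracking noted in the feature table can be sketched as one helper that picks a backend by platform. This is an assumed approach, not the framework's exact code; note that `ru_maxrss` reports KiB on Linux but bytes on macOS, and that `tracemalloc` measures Python allocations rather than process RSS:

```python
import platform

def peak_memory_kb():
    """Return a peak-memory figure for the current process in KiB,
    choosing the backend by platform (illustrative sketch)."""
    if platform.system() != "Windows":
        import resource  # Unix-only module
        usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is KiB on Linux, bytes on macOS
        return usage // 1024 if platform.system() == "Darwin" else usage
    import tracemalloc  # fallback where `resource` is unavailable
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    _, peak = tracemalloc.get_traced_memory()
    return peak // 1024
```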
Add a new error module:

```python
ERROR_TAXONOMY["new_module"] = {"error_a": 0.7, "error_b": 0.3}
```

Add a custom tool:

```python
def my_tool(query):
    return f"Result: {query}"

new_tool = Tool("my_tool", my_tool, "Does something useful")
agent = SimpleAgent(tools=[..., new_tool])
```

Tune the RL agent:

```python
rl_agent = DebuggingAgent(
    modules=[...],
    learning_rate=0.2,
    discount_factor=0.95,
    epsilon=0.3,
)
```

- All agent failures are observable in the trajectory
- Critical errors (planning/tool) block success
- Debugging is a sequential decision problem → RL
- Measure everything: time, memory, learning
- Self-contained, reproducible, extensible
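"Debugging is a sequential decision problem" is what the Q-learning machinery operationalizes. A minimal tabular sketch using the hyperparameters shown above; the class and method names are illustrative, not the framework's real API:

```python
import random

class QTable:
    """Tabular Q-learning over (error_state, module) pairs (sketch only)."""
    def __init__(self, modules, learning_rate=0.2, discount_factor=0.95, epsilon=0.3):
        self.modules = modules
        self.alpha, self.gamma, self.epsilon = learning_rate, discount_factor, epsilon
        self.q = {}  # maps (state, module) -> estimated value

    def choose(self, state, rng=random):
        # Epsilon-greedy: explore with probability epsilon, else exploit
        if rng.random() < self.epsilon:
            return rng.choice(self.modules)
        return max(self.modules, key=lambda m: self.q.get((state, m), 0.0))

    def update(self, state, module, reward, next_state):
        # Standard Q-learning: Q += alpha * (r + gamma * max_a' Q(s', a') - Q)
        best_next = max(self.q.get((next_state, m), 0.0) for m in self.modules)
        old = self.q.get((state, module), 0.0)
        self.q[(state, module)] = old + self.alpha * (reward + self.gamma * best_next - old)
```

With `epsilon=0` the agent always exploits; raising it toward `0.3` (the default above) trades repeatability for exploration of less-tried modules.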
- Persistent Q-table across runs
- Multi-agent collaborative debugging
- Real LLM integration (via API)
- Visualization dashboard (Matplotlib/Plotly)
- Confidence scoring per correction
numbnut – Built with First Principles reasoning and better axioms.
"If you cannot debug it, you cannot trust it."
MIT License – Free to use, modify, and extend.
AgentDebug v4.0 – Because perfect agents don’t exist. Perfect debuggers do.