Point it at your agent repo. It discovers what's tunable. It optimizes your agents.
AutoResearchClaw writes papers. CORAL evolves code. AutoAgentClaw optimizes agents.
- [2026-03-24] Skill-based architecture: OpenClaw-native skills, self-improving learnings, skill-creator meta-skill
- [2026-03-23] 3 verified benchmarks: HotpotQA (+29.3%), Customer Support (+5.1%), GSM8K (+7.1%)
- [2026-03-22] Enhanced research pipeline: Semantic Scholar + arXiv + GitHub + PyPI search with provenance
- [2026-03-22] 4 execution modes: per-experiment, autonomous, parallel, parallel-autonomous (CORAL-style)
- [2026-03-21] 4-level optimization hierarchy (MaAS-informed) with algorithm registry
- [2026-03-21] Cross-run learning with level-aware skills and algorithm metadata
- [2026-03-21] Initial release with 12-stage pipeline, protected eval, sentinel watchdog
AutoAgentClaw has been tested on real benchmarks with real LLM-based agent systems. All results use Claude subscription auth ($0 API cost).
| Benchmark | Agents | Data | Baseline | After | Improvement | Key Technique |
|---|---|---|---|---|---|---|
| HotpotQA | 2 (researcher + reasoner) | Real HotpotQA (20 questions) | 0.5597 | 0.7236 | +29.3% | Autonomous LLM + Optuna Bayesian |
| Customer Support | 3 (classifier + responder + reviewer) | Synthetic (10 tickets) | 0.8783 | 0.9233 | +5.1% | Feedback loop + Autonomous LLM |
| GSM8K Math | 3 (decomposer + solver + verifier) | Real GSM8K (15 problems) | 0.9333 | 1.0000 | +7.1% | Autonomous LLM (communication) |
All examples are included in `docs/examples/` — clone and run them yourself.
📈 What the optimizer discovered
HotpotQA — The reasoner produced correct but verbose answers (high accuracy, low F1). The optimizer:
- (Level 1) Added output format constraints → +2.7%
- (Level 2) Optuna found optimal temperature=0.2 + max_tokens=200 → +23.2%
- (Level 1) Improved researcher information quality → +3.4%
Customer Support — Classification was perfect but response quality was weak. The optimizer:
- (Level 2) Grid searched token budgets → 0% (config wasn't the bottleneck)
- (Level 1) Feedback loop refined responder prompt → +2.8%
- (Level 3) Autonomous LLM restructured reviewer approval logic → +5.1%
GSM8K Math — 14/15 problems correct, 1 failure on complex multi-step reasoning. The optimizer:
- (Level 1) Tried prompt refinement → 0% (prompts weren't the issue)
- (Level 2) Grid searched temperature/tokens → 0%
- (Level 3) Autonomous LLM improved inter-agent communication → +7.1% (15/15 correct)
```bash
pip install -e . && autoagent run \
  --target ~/my-agents \
  --eval eval.py \
  --metric accuracy \
  --direction maximize \
  --auto-approve
```

AutoAgentClaw reads your agent repo, discovers tunable parameters, researches optimization techniques, and runs experiments — all automatically. No rewriting required.
You have a multi-agent system. It works, but you want it to work better. AutoAgentClaw:
| Step | What Happens | Inspired By |
|---|---|---|
| 🔍 Discover | Reads your repo, finds agents, prompts, configs, topology | Zero-config (novel) |
| 📚 Research | Searches Semantic Scholar + arXiv + web for optimization techniques | AutoResearchClaw |
| 🧠 Strategize | Analyzes baseline, identifies bottleneck, plans level-by-level approach | ARIS + MaAS |
| ⚙️ Optimize | CORAL-style autonomous workers run experiments in parallel worktrees | CORAL |
| 📊 Track | Protected eval, sentinel watchdog, leaderboard, dashboard | CORAL |
| 💡 Learn | Skills accumulate across runs — run N+1 is smarter than run N | MetaClaw |
AutoAgentClaw is an OpenClaw-compatible service. Install it in OpenClaw and launch autonomous agent optimization with a single message — or use it standalone via CLI, Claude Code, or any AI coding assistant.
If you already use OpenClaw as your AI assistant:
1️⃣ Share the GitHub repo URL with OpenClaw
2️⃣ OpenClaw auto-reads AUTOAGENT_AGENTS.md → understands the optimization pipeline
3️⃣ Say: "Optimize the agents at ~/my-agents using eval.py"
4️⃣ Done — OpenClaw clones, installs, configures, runs, and returns results
That's it. OpenClaw handles git clone, pip install, config setup, and pipeline execution automatically. You just chat.
💡 What happens under the hood
- OpenClaw reads `AUTOAGENT_AGENTS.md` → learns the agent optimizer role
- OpenClaw reads `README.md` → understands installation and pipeline structure
- OpenClaw copies `config.autoagent.example.yaml` → `config.autoagent.yaml`
- Uses your Claude subscription (or asks for an API key)
- Runs `pip install -e .` + `autoagent run --target <path> --eval <script>`
- Returns the optimization report, best configs, topology diffs, and leaderboard
For deeper integration, AutoAgentClaw includes a bridge adapter system with 6 optional capabilities:
```yaml
# config.autoagent.yaml
openclaw_bridge:
  use_cron: true           # ⏰ Scheduled optimization runs (overnight)
  use_message: true        # 💬 Progress notifications (Discord/Slack/Telegram)
  use_memory: true         # 🧠 Cross-session skill persistence
  use_sessions_spawn: true # 🔀 Parallel optimization workers (CORAL-style)
  use_web_fetch: true      # 🌐 Research optimization techniques
  use_browser: false       # 🖥️ Not needed for optimization
```

Each flag activates a typed adapter protocol. When OpenClaw provides these capabilities, the adapters consume them without code changes.
AutoAgentClaw can use any ACP-compatible coding agent as its LLM backend — no API keys required:
| Agent | Command | Provider | Status |
|---|---|---|---|
| Claude Code | `claude` | Anthropic | ✅ Tested |
| Codex CLI | `codex` | OpenAI | 🔲 Supported (untested) |
| Copilot CLI | `gh` | GitHub | 🔲 Supported (untested) |
| Gemini CLI | `gemini` | Google | 🔲 Supported (untested) |
| OpenCode | `opencode` | Open Source | 🔲 Supported (untested) |
| Kimi CLI | `kimi` | Moonshot | 🔲 Supported (untested) |
All benchmarks in this README were tested with Claude Code (subscription auth, $0 cost). Other ACP agents are supported via the same interface but have not been verified yet. Community testing welcome — see CONTRIBUTING.md.
```
┌──────────────────────────────────────────────────────────────────────┐
│                                                                      │
│  Phase A: Discovery         Phase B: Strategy                        │
│  ┌─────────┐ ┌─────────┐    ┌─────────────┐  ┌──────────┐            │
│  │ 1. Scan │→│ 2. Find │→   │ 4. Research │→ │ 5. Gate  │            │
│  │  Repo   │ │ Agents  │    │ SS+arXiv+Web│  │ (human)  │            │
│  └─────────┘ └────┬────┘    └──────┬──────┘  └────┬─────┘            │
│                   │ 3. Gate        │              │                  │
│                   └────────────────┘              │                  │
│                                                   ▼                  │
│  Phase C: Optimization                                               │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐   │
│  │ 6. Level 1   │→│ 7. Level 2   │→│ 8. Level 3   │→│ 9. Cross-  │   │
│  │   Behavior   │ │   Config     │ │   Comms      │ │  Validate  │   │
│  │  (parallel   │ │  (parallel   │ │  (parallel   │ │            │   │
│  │   workers)   │ │   workers)   │ │   workers)   │ │            │   │
│  └──────────────┘ └──────────────┘ └──────────────┘ └─────┬──────┘   │
│                                                           │          │
│  Phase D: Analysis              Phase E: Finalize         │          │
│  ┌──────────────┐ ┌────────┐    ┌────────────┐            │          │
│  │10. Extract   │→│11. Gen │→   │12. Apply   │◄───────────┘          │
│  │   Skills     │ │ Report │    │   Gate     │                       │
│  └──────────────┘ └────────┘    └────────────┘                       │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘
```
| Phase | Stages | What Happens |
|---|---|---|
| A: Discovery | 1-3 | Read repo, discover agents & tunable params, classify by optimization level |
| B: Strategy | 4-5 | Multi-source research (Semantic Scholar + arXiv + web), LLM strategy planning |
| C: Optimization | 6-9 | Level-based optimization with CORAL-style parallel autonomous workers |
| D: Analysis | 10-11 | Extract skills with level + algorithm metadata, generate report |
| E: Finalization | 12 | Human reviews — apply to main, new branch, or reject |
Not every problem needs all levels. The framework automatically determines which levels are relevant:
| Level | What It Optimizes | Cost | Example |
|---|---|---|---|
| 1: Agent Behavior | Prompts, instructions, output format | 💚 Lowest | "Add conciseness constraint" |
| 2: Agent Configuration | Temperature, max_tokens, model, tools | 🟡 Moderate | "Reduce temperature to 0.3" |
| 3: Inter-Agent Communication | What info is passed, message format | 🟠 Higher | "Filter researcher output" |
| 4: System Topology | Add/remove agents, restructure graph | 🔴 Highest | "Add a review agent with feedback loop" |
The framework optimizes Level 1 first (cheapest, highest ROI), then progresses to higher levels only if lower levels show diminishing returns.
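The cheapest-first progression can be sketched in a few lines — a minimal illustration, where `run_level` is a hypothetical callable returning the relative gain from one pass at a level (not the framework's actual API):

```python
def optimize_by_level(run_level, levels=(1, 2, 3, 4), min_gain=0.01, max_rounds=3):
    """Cheapest level first: repeat a level while it keeps paying off,
    escalate to the next (costlier) level once its gains diminish."""
    total = 0.0
    for level in levels:
        for _ in range(max_rounds):
            gain = run_level(level)
            total += gain
            if gain < min_gain:  # diminishing returns at this level
                break            # escalate to the next level
    return total
```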
```yaml
execution:
  worker_mode: "autonomous"   # per-experiment | autonomous
  max_parallel_workers: 3     # 1 = sequential, >1 = parallel
```

| Mode | `worker_mode` | `max_parallel_workers` | Description |
|---|---|---|---|
| Sequential | `per-experiment` | 1 | One LLM call per experiment (fastest for small budgets) |
| Parallel | `per-experiment` | 3 | Multiple short-lived calls simultaneously |
| Autonomous | `autonomous` | 1 | CORAL-style long-lived session (recommended) |
| Full CORAL | `autonomous` | 3 | Parallel long-lived sessions in git worktrees |
Before optimizing, the framework researches techniques from multiple sources:
| Source | What It Searches | API |
|---|---|---|
| Semantic Scholar | Academic papers on agent optimization | Free, no key |
| arXiv | Recent preprints | Free, no key |
| Claude Web Search | Blogs, GitHub repos, practical guides | Via Claude CLI |
| Algorithm Registry | Coded algorithms already installed | Local check |
Every finding includes provenance (source URL, paper title, year, citations) for traceability.
Skills as research cache — second run skips research if matching skills exist from a previous run.
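One way to picture a finding with provenance attached — the field names below are illustrative, not the framework's actual schema:

```python
from dataclasses import asdict, dataclass

@dataclass
class ResearchFinding:
    """Illustrative provenance record for a single research result."""
    technique: str   # e.g. "output format constraints"
    source: str      # "semantic_scholar" | "arxiv" | "web" | "registry"
    url: str
    title: str
    year: int
    citations: int = 0

finding = ResearchFinding(
    technique="output format constraints",
    source="arxiv",
    url="https://arxiv.org/abs/0000.00000",  # placeholder URL
    title="Example Paper",
    year=2024,
    citations=10,
)
```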
AutoAgentClaw uses a three-tier knowledge system inspired by OpenClaw skills and the self-improving agent pattern:
| Tier | What | Loaded When | Lifetime |
|---|---|---|---|
| Default Skills | 12 curated optimization techniques (CoT, output format, temperature tuning, etc.) | Always (descriptions in context) | Permanent |
| Project Skills | Research findings + learned principles for THIS project | On demand | Decay over 30 days |
| Learned Skills | Techniques promoted after working on 3+ projects | Always | Permanent until disproven |
Each skill is a proper AgentSkills directory with `SKILL.md` + optional `scripts/`, `references/`, `assets/`:

```
autoagent/default_skills/
├── L1-chain-of-thought/
│   ├── SKILL.md                 # When to apply, how to apply, evidence
│   └── references/evidence.md   # Papers, benchmark results
└── L2-temperature-tuning/
    ├── SKILL.md
    └── scripts/temp_sweep.py    # Quick temperature sweep script
```
Self-improving: After each run, the framework:
- Records experiments in `LEARNINGS.md` (what worked, what failed)
- Extracts reusable principles as new skills via the `autoagent-skill-creator`
- Promotes skills to global `_learned/` after 3+ project confirmations
- Decays confidence on old skills — stale knowledge fades naturally
Project-scoped: Skills from HotpotQA don't leak into Customer Support. Each project has its own skills + learnings directory.
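Confidence decay can be modeled as simple exponential discounting by age, using the `decay_rate` knob from the config — a sketch of the idea, not necessarily the shipped formula:

```python
def decayed_confidence(confidence: float, age_days: int, decay_rate: float = 0.05) -> float:
    """Discount a skill's confidence for each day since it was last confirmed."""
    return confidence * (1.0 - decay_rate) ** age_days
```

With the default rate, an unconfirmed skill falls below a quarter of its original confidence after 30 days.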
The eval script is copied to `.autoagent/private/` — the optimizer can never modify it.
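Checksum protection of this kind can be as simple as hashing the private copy once at setup and re-hashing before every experiment — a sketch, not the framework's actual implementation:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def verify_eval(protected: Path, recorded_checksum: str) -> bool:
    """Return True only if the protected eval script is byte-identical."""
    return sha256_of(protected) == recorded_checksum
```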
A Sentinel watchdog continuously monitors for:
- 🎯 Reward hacking (suspicious score jumps)
- 💰 Cost anomalies (unexpected API cost spikes)
- 🔒 Eval tampering (checksum mismatch)
- 📉 Score regression and oscillation
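The reward-hacking check, for example, can be sketched as flagging any score that beats the running best by more than a plausibility threshold — the threshold and interface below are assumptions, not the shipped sentinel:

```python
def suspicious_jump(history: list, new_score: float, max_jump: float = 0.15) -> bool:
    """Flag a score that beats the running best by more than `max_jump`
    in a single step — a common symptom of eval gaming."""
    if not history:
        return False  # nothing to compare against yet
    return new_score - max(history) > max_jump
```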
Add coded optimization algorithms alongside the LLM-as-optimizer:
```python
from autoagent.algorithms import BaseOptimizer, register_optimizer

class MyOptimizer(BaseOptimizer):
    name = "my-optimizer"
    handles_levels = [1]        # Only Level 1 (behavior)
    handles_types = ["prompt"]

    def can_handle(self, target):
        return target.param_type == "prompt"

    def optimize(self, target, eval_fn, budget):
        ...

register_optimizer(MyOptimizer())
```

The framework handles everything else: eval, tracking, skills, dashboard. Skills remember which algorithm worked at which level for automatic selection in future runs.
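The registry pattern itself is small. A self-contained sketch of how registration and selection could work — stand-in classes, not the real `autoagent.algorithms` module:

```python
from types import SimpleNamespace

_REGISTRY = {}

class BaseOptimizer:
    """Minimal stand-in for the framework's optimizer base class."""
    name = "base"
    handles_levels: list = []
    handles_types: list = []

    def can_handle(self, target) -> bool:
        return False

    def optimize(self, target, eval_fn, budget):
        raise NotImplementedError

def register_optimizer(opt: BaseOptimizer) -> None:
    _REGISTRY[opt.name] = opt

def select_optimizer(target):
    """Return the first registered optimizer that claims the target."""
    for opt in _REGISTRY.values():
        if opt.can_handle(target):
            return opt
    return None

class PromptOptimizer(BaseOptimizer):
    name = "prompt-only"
    handles_levels = [1]
    handles_types = ["prompt"]

    def can_handle(self, target) -> bool:
        return getattr(target, "param_type", None) == "prompt"

register_optimizer(PromptOptimizer())
print(select_optimizer(SimpleNamespace(param_type="prompt")).name)  # prompt-only
```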
```bash
git clone https://github.com/skyve2012/AutoAgentClaw.git
cd AutoAgentClaw
pip install -e .
autoagent setup   # Verify requirements
```

```bash
autoagent run \
  --target docs/examples/hotpotqa-agents \
  --eval eval.py \
  --metric score \
  --direction maximize \
  --max-experiments 8 \
  --auto-approve
```

The demo includes a deliberately suboptimal researcher + reasoner system that AutoAgentClaw optimizes by discovering better prompts, tuning parameters, and improving output format.
```bash
autoagent dashboard --target docs/examples/hotpotqa-agents
# Opens http://localhost:3000 — topology viz, leaderboard, score chart
```

```bash
autoagent run \
  --target ~/my-agents \
  --eval eval.py \
  --metric accuracy \
  --direction maximize
```

Requirements for your agent repo: an `eval.py` that returns JSON with your metric (e.g., `{"accuracy": 0.85}`). That's it.
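A minimal `eval.py` meeting that contract might look like this — the scoring stub is a placeholder, and emitting the JSON on stdout is an assumption about how the metric is read:

```python
import json

def evaluate() -> dict:
    """Run your agent system on a held-out set and score it.
    Replace this stub with real agent calls."""
    correct, total = 17, 20  # placeholder results
    return {"accuracy": correct / total}

if __name__ == "__main__":
    print(json.dumps(evaluate()))  # e.g. {"accuracy": 0.85}
```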
```bash
autoagent init   # Interactive config wizard
```

📝 Minimum required config:

```yaml
target:
  path: "~/my-agents"
  eval_script: "eval.py"
  metric: "accuracy"
  direction: "maximize"
```

📋 Full Configuration Reference
```yaml
# === LLM Provider ===
llm:
  provider: "openai"
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"
  primary_model: "gpt-4o"
  fallback_models: ["gpt-4o-mini"]

# === Target Agent System ===
target:
  path: "~/my-agents"
  eval_script: "eval.py"
  metric: "accuracy"
  direction: "maximize"

# === Search Space (auto-discovered if omitted) ===
search_space:
  freeze: []       # Param IDs to NOT optimize
  priority: []     # Optimize these first
  dimensions: {}   # Override algorithm per dimension

# === Optimization Budget ===
budget:
  max_experiments: 100
  max_time_minutes: 180
  max_cost_usd: 50.00
  pilot_experiments: 5

# === Execution ===
execution:
  worker_mode: "autonomous"   # per-experiment | autonomous
  max_parallel_workers: 3
  eval_timeout_sec: 300

# === Pipeline Gates ===
gates:
  auto_approve: false   # Set true for overnight runs

# === Knowledge / Cross-Run Learning ===
knowledge:
  enabled: true
  skills_dir: "~/.autoagent/skills"
  decay_rate: 0.05

# === Notifications ===
notifications:
  channel: "console"   # console | discord | slack
  notify_on: ["new_best_score", "optimization_complete"]

# === Dashboard ===
dashboard:
  port: 3000
  auto_open: true

# === OpenClaw Bridge ===
openclaw_bridge:
  use_cron: true
  use_message: true
  use_memory: true
  use_sessions_spawn: true
  use_web_fetch: true
  use_browser: false

# === Sentinel Watchdog ===
sentinel:
  enabled: true
  detect_reward_hacking: true
  detect_cost_anomaly: true
```

```bash
autoagent setup                    # Verify system requirements
autoagent init                     # Create config interactively
autoagent run [options]            # Run optimization pipeline
autoagent resume --target <path>   # Resume an interrupted run
autoagent dashboard [options]      # Start web dashboard
autoagent log --target <path>      # View experiment log + leaderboard
autoagent status --target <path>   # Show all runs + skill count
autoagent skills                   # List accumulated skills
```

🔧 `autoagent run` options
| Flag | Description | Default |
|---|---|---|
| `--target, -t` | Path to target agent repo | (required) |
| `--eval` | Evaluation script path | `eval.py` |
| `--metric, -m` | Metric name to optimize | `accuracy` |
| `--direction, -d` | `maximize` or `minimize` | `maximize` |
| `--config` | Config file path | auto-detect |
| `--max-experiments` | Max experiments to run | 100 |
| `--max-cost` | Max cost in USD | 50.0 |
| `--max-time` | Max time in minutes | 180 |
| `--auto-approve` | Skip all approval gates | false |
| `--interactive, -i` | Interactive mode | false |
```python
from pathlib import Path

from autoagent.config import load_config
from autoagent.pipeline.runner import PipelineRunner

config = load_config("config.autoagent.yaml")
runner = PipelineRunner(config, Path("~/my-agents").expanduser())
result = runner.run()

print(f"Baseline: {result.baseline_score:.4f}")
print(f"Best: {result.best_score:.4f} (+{result.improvement_pct:.1f}%)")
```

```
# Per-project (in your agent repo)
.autoagent/
├── runs/
│   └── ao-YYYYMMDD-HHMMSS/
│       ├── discovery.json     # Agents, params, topology graph
│       ├── strategy.json      # Optimization plan (phases, algorithms)
│       ├── attempts.jsonl     # Every experiment (scores, diffs, feedback)
│       ├── leaderboard.json   # Top-10 ranked configurations
│       └── report.md          # Human-readable optimization report
├── private/
│   └── eval.py                # Protected eval (tamper-proof copy)
└── knowledge/
    ├── attempts/              # CORAL-style shared attempt records
    └── notes/                 # Worker observations and learnings

# Persistent knowledge (across runs + projects)
~/.autoagent/
├── projects/<project-hash>/
│   ├── skills/                # Research + learned skills for THIS project
│   │   └── research-textgrad/SKILL.md
│   ├── .learnings/
│   │   ├── LEARNINGS.md       # What worked / failed (structured entries)
│   │   └── ERRORS.md          # What went wrong and why
│   └── lessons/
│       └── round-001.jsonl    # Structured experiment records per round
└── skills/
    ├── _default/              # 12 curated optimization skills (ship with framework)
    └── _learned/              # Skills promoted from 3+ projects
```
| Method | Command | Setup |
|---|---|---|
| 🦞 OpenClaw | "Optimize my agents at ~/my-agents" | Zero — just chat |
| 💻 CLI | `autoagent run --target ~/my-agents --eval eval.py` | `pip install -e .` |
| 🐍 Python API | `PipelineRunner(config, path).run()` | `pip install -e .` |
| Claude Code | `claude` in the AutoAgentClaw directory | Clone repo |
| Codex CLI | `codex` in the AutoAgentClaw directory | Clone repo |
| Any AI Assistant | Point it at `AUTOAGENT_AGENTS.md` | Clone repo |
```
AutoAgentClaw/
├── autoagent/
│   ├── cli.py                       # CLI: setup, init, run, dashboard, log, status, skills
│   ├── config.py                    # YAML config (Pydantic models)
│   ├── models.py                    # 4-level data models (MaAS hierarchy)
│   ├── pipeline/
│   │   ├── runner.py                # 12-stage pipeline orchestrator
│   │   └── stages.py                # Stage definitions with gate support
│   ├── agents/
│   │   ├── discovery_agent.py       # Auto-discovers agents, prompts, topology
│   │   ├── research_agent.py        # Multi-source research (SS + arXiv + web)
│   │   ├── strategy_agent.py        # Level-based optimization planning
│   │   ├── worker_agent.py          # CORAL-style LLM-as-optimizer workers
│   │   ├── autonomous_worker.py     # Long-lived autonomous sessions
│   │   ├── prompt_opt_agent.py      # Evolutionary fallback optimizer
│   │   └── reflect_agent.py         # Heartbeat reflection + convergence
│   ├── evaluator/
│   │   ├── runner.py                # Protected eval runner
│   │   ├── tracker.py               # Experiment tracking + leaderboard
│   │   └── sentinel.py              # Watchdog (reward hacking, cost, tampering)
│   ├── algorithms/
│   │   └── registry.py              # Pluggable algorithm registry
│   ├── knowledge/
│   │   ├── skill_manager.py         # Cross-run learning (extract, inject, decay)
│   │   └── shared.py                # CORAL-style shared knowledge filesystem
│   ├── bridge/
│   │   └── adapters.py              # OpenClaw bridge (6 typed adapter protocols)
│   ├── llm/
│   │   └── providers.py             # Multi-provider (Claude CLI, API, ACP, OpenAI)
│   ├── dashboard/
│   │   ├── server.py                # FastAPI backend (8 API endpoints)
│   │   └── static/index.html        # Dashboard UI (topology, charts, leaderboard)
│   └── workspace.py                 # Git worktree manager (parallel workers)
├── AUTOAGENT_AGENTS.md              # OpenClaw service definition
├── CLAUDE.md                        # Claude Code instructions
├── config.autoagent.example.yaml
├── .claude/skills/
│   ├── autoagent/                   # Main entry skill (OpenClaw integration)
│   ├── autoagent-skill-creator/     # Meta-skill for creating optimization skills
│   ├── autoagent-grid-search/       # Algorithm skill: systematic parameter search
│   ├── autoagent-feedback-loop/     # Algorithm skill: feedback-driven revision
│   └── autoagent-optuna-search/     # Algorithm skill: Bayesian optimization
├── docs/
│   ├── images/logo.png
│   └── examples/
│       ├── hotpotqa-agents/         # 2-agent QA (Real HotpotQA, +29.3%)
│       ├── math-reasoning/          # 3-agent math (Real GSM8K, +7.1%)
│       ├── customer-support/        # 3-agent support (Synthetic, +5.1%)
│       └── code-generation/         # 1-agent coder (Real HumanEval)
└── tests/                           # 27 unit tests
```
| Contribution | Description |
|---|---|
| 🔍 Zero-Config Agent Discovery | Reads any agent repo and auto-discovers tunable parameters — no framework lock-in, no rewriting required |
| 📐 4-Level Optimization Hierarchy | MaAS-informed systematic approach: Behavior → Configuration → Communication → Topology, optimizing cheapest levels first |
| 🔬 Research-Driven Strategy | Multi-source research (Semantic Scholar + arXiv + GitHub + PyPI) before optimization — the first agent optimizer to search for code implementations |
| 🧠 Self-Improving Skill System | OpenClaw-native three-tier knowledge: curated defaults → project-specific learnings → cross-project promotions with confidence decay |
| 🤖 Autonomous LLM-as-Optimizer | CORAL-style long-lived Claude sessions that read code, propose changes, evaluate, and iterate — the technique behind the largest gains on every benchmark above |
| 🔄 Pluggable Algorithm Skills | Algorithms are installable skills with SKILL.md + scripts/ — add a new optimizer by creating a directory |
| 🛡️ Protected Eval + Sentinel | Tamper-proof evaluation with watchdog monitoring for reward hacking, cost anomalies, and score oscillation |
| 📊 Live Dashboard | Real-time topology visualization, performance charts, experiment leaderboard, and dimension heatmaps |
| 🦞 OpenClaw Native | Works with Claude subscription ($0 API cost), supports 6 ACP agents, bridge adapters for parallel execution |
- 12-stage pipeline with human approval gates
- Zero-config agent discovery (LLM-driven + regex fallback)
- Multi-source research (Semantic Scholar + arXiv + GitHub + PyPI)
- 4-level optimization hierarchy (MaAS-informed)
- Protected evaluator with checksum verification
- Sentinel watchdog (reward hacking, cost anomaly, score oscillation)
- Cross-validation with git snapshot restore
- Optimization report generation
- Per-experiment mode (one LLM call per experiment)
- Autonomous mode (CORAL-style long-lived Claude sessions)
- Parallel workers via git worktrees
- SIGINT interrupt-resume heartbeat (CORAL-style mid-session reflection)
- `autoagent resume` command for crashed/interrupted runs
- 12 curated default optimization skills
- OpenClaw-native skill format (SKILL.md + scripts/ + references/)
- Skill-creator meta-skill for generating new skills
- Three-tier knowledge: default → project → learned
- Project-scoped skills (no cross-project leakage)
- Confidence decay on stale skills
- Structured learnings (LEARNINGS.md + round-N.jsonl)
- Skill promotion after 3+ project confirmations
- Auto-install pip packages discovered during research
- Generate BaseOptimizer adapters for discovered libraries
- LLM-as-optimizer (autonomous Claude sessions)
- Grid search (systematic parameter sweep)
- Feedback loop (iterative LLM revision)
- Optuna Bayesian optimization
- Pluggable algorithm registry with auto-discovery
- Integration with real TextGrad library
- Integration with DSPy MIPROv2
- MCTS-based topology search (AFlow-style)
- Multi-objective Pareto optimization
- OpenClaw integration (AUTOAGENT_AGENTS.md + bridge adapters)
- ACP support (Claude, Codex, Gemini, OpenCode, Kimi)
- Claude subscription auth ($0 cost)
- Web dashboard (topology viz, leaderboard, charts)
- CLI (setup, init, run, dashboard, log, status, skills)
- Verify and test non-Claude ACP agents (Codex, Gemini, etc.)
- Docker/SSH remote execution for heavy evaluations
- ClaweHub skill publishing
- Multi-language README (CN, JA, KR)
- HotpotQA — 2-agent QA (real data, +29.3%)
- GSM8K Math — 3-agent reasoning (real data, +7.1%)
- Customer Support — 3-agent triage (synthetic, +5.1%)
- Code Generation — 1-agent coder (real HumanEval data)
- GAIA benchmark (compare with EvoAgentX results)
- SWE-bench (software engineering agents)
- WebArena (web navigation agents)
- Multi-agent debate optimization benchmark
- Multi-round auto-continue (run N rounds until diminishing returns)
- Cross-model adversarial review (Claude optimizes, GPT reviews)
- Pilot experiments before strategy commitment
- Multi-agent debate for strategy planning
- Fine-tuning integration (Level 5: RL/SFT on agents)
- Cost-aware Pareto optimization (quality vs latency vs cost)
AutoAgentClaw builds on ideas from:
- CORAL — Multi-agent evolution infrastructure (parallel workers, protected eval, shared knowledge, heartbeat)
- AutoResearchClaw — OpenClaw integration, MetaClaw cross-run learning, staged pipeline, bridge adapters
- ResearchClaw — Claim/evidence graph, provenance tracking, experiment contracts
- ARIS — Research-driven strategy, overnight autonomous runs
- Karpathy's autoresearch — The outer-loop optimizer pattern
- MaAS — 4-level optimization hierarchy for multi-agent systems
- EvoAgentX — Pluggable algorithm registry pattern
- OpenClaw Skill System — AgentSkills standard, self-improving skill pattern
MIT
If you find AutoAgentClaw useful in your research or projects, please cite:
```bibtex
@software{shen2026autoagentclaw,
  title  = {AutoAgentClaw: Automatic Multi-Agent System Optimization},
  author = {Shen, Hongyu},
  year   = {2026},
  url    = {https://github.com/skyve2012/AutoAgentClaw},
  note   = {A framework for automatically optimizing agent systems through
            zero-config discovery, research-driven strategy, and self-improving
            skills. Built on OpenClaw with CORAL-style parallel execution and
            MaAS-informed 4-level optimization hierarchy.}
}
```

Built with 🦞 by Hongyu Shen
