[Survey] A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

EvoAgentX/Awesome-Self-Evolving-Agents

Awesome-Self-Evolving-Agents

🤖 We're still cooking. Stay tuned! 🤖
⭐ Give us a star if you like it! ⭐

Evolve Tree
Figure: A visual taxonomy of AI agent evolution and optimisation techniques, categorised into three major directions: single-agent optimisation, multi-agent optimisation, and domain-specific optimisation. The tree structure illustrates the development of these approaches from 2023 to 2025, including representative methods within each branch.

AI Agents Development Path

Development Path

Conceptual Framework of the Self-Evolving AI Agents

Conceptual Framework

Open-Source Framework

1. Single-Agent Optimisation

1.1 🤖 LLM Behaviour Optimisation

1.1.1 📌 Training-Based Behaviour Optimisation

(1) 🔧 Supervised Fine-Tuning Approaches
  • (ICLR'24) ToRA: A tool-integrated reasoning agent for mathematical problem solving [📝 Paper] [💻 Code]
  • (NeurIPS'22) STaR: Bootstrapping reasoning with reasoning [📝 Paper] [💻 Code]
  • (Arxiv'24) NExT: Teaching large language models to reason about code execution [📝 Paper]
  • (EMNLP'24) MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning [📝 Paper]
(2) 🔧 Reinforcement Learning Approaches

1.1.2 📌 Test-Time Behaviour Optimisation

(1) 🔧 Feedback-Based Approaches
  • (ICLR'23) CodeT: Code Generation with Generated Tests [📝 Paper] [💻 Code]
  • (ICML'23) LEVER: Learning to Verify Language-to-Code Generation with Execution [📝 Paper] [💻 Code]
  • (ESEC/FSE'23) Baldur: Whole-Proof Generation and Repair with Large Language Models [📝 Paper]
  • (ACL'24) Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations [📝 Paper]
  • (EMNLP'24) Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing [📝 Paper] [💻 Code]
  • (Arxiv'24) Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs [📝 Paper]
  • (ICLR'25) Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning [📝 Paper]
  • (Arxiv'25) Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy [📝 Paper] [💻 Code]
(2) 🔧 Search-Based Approaches
(3) 🔧 Reasoning-Based Approaches

1.2 💬 Prompt Optimisation

1.2.1 📌 Edit-Based Prompt Optimisation

1.2.2 📌 Evolutionary Prompt Optimisation

  • (ICLR'24) EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers [📝 Paper] [💻 Code]
  • (ICML'24) Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution [📝 Paper]

1.2.3 📌 Generative Prompt Optimisation

1.2.4 📌 Text Gradient-Based Prompt Optimisation

1.3 🧠 Memory Optimisation

  • (ICML'24) A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts [📝 Paper]
  • (ICML'24) Agent Workflow Memory [📝 Paper]
  • (AAAI'24) MemoryBank: Enhancing Large Language Models with Long-Term Memory [📝 Paper]
  • (EMNLP'24) GraphReader: Building graph-based agent to enhance long-context [📝 Paper]
  • (Arxiv'24) "My agent understands me better": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based Agents [📝 Paper]
  • (ICLR'25) Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations [📝 Paper]
  • (ICLR'25) Boosting knowledge intensive reasoning of llms via inference-time hybrid information [📝 Paper] [💻 Code]
  • (ACL'25) Improving factuality with explicit working memory [📝 Paper]
  • (Arxiv'25) A-MEM: Agentic Memory for LLM Agents [📝 Paper]
  • (Arxiv'25) Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory [📝 Paper]
  • (Arxiv'25) Memento: Fine-tuning LLM Agents without Fine-tuning LLMs [📝 Paper] [💻 Code]
  • (Arxiv'25) Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning [📝 Paper]
  • (Arxiv'25) Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory [📝 Paper] [💻 Code]

1.4 🧰 Tool Optimisation

1.4.1 📌 Training-Based Tool Optimisation

(1) Supervised Fine-Tuning for Tool Optimisation
(2) Reinforcement Learning for Tool Optimisation

1.4.2 📌 Inference-Time Tool Optimisation

(1) Prompt-Based Optimisation
(2) Reasoning-Based Optimisation

1.4.3 📌 Tool Functionality Optimisation

  • (EMNLP'23) CREATOR: Tool creation for disentangling abstract and concrete reasoning of large language model [📝 Paper] [💻 Code]
  • (ICML'24) Offline Training of Language Model Agents with Functions as Learnable Weights [📝 Paper]
  • (CVPR'24) CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update [📝 Paper] [💻 Code]
  • (Arxiv'25) Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution [📝 Paper] [💻 Code]

1.5 🧰 Unified Optimisation

  • (Arxiv'25) Building Self-Evolving Agents via Experience-Driven Lifelong Learning: A Framework and Benchmark [📝 Paper] [💻 Code]

2. Multi-Agent Optimisation

  • (ICML'25) Multi-Agent Architecture Search via Agentic Supernet [📝 Paper] [💻 Code]
  • (ICML'25) MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving [📝 Paper]
  • (ICLR'25) AFlow: Automating Agentic Workflow Generation [📝 Paper] [💻 Code]
  • (ICLR'25) WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models [📝 Paper]
  • (ICLR'25) Flow: Modularized Agentic Workflow Automation [📝 Paper]
  • (ICLR'25) Automated Design of Agentic Systems [📝 Paper] [💻 Code]
  • (Arxiv'25) FlowReasoner: Reinforcing Query-Level Meta-Agents [📝 Paper]
  • (Arxiv'25) AgentNet: Decentralized Evolutionary Coordination for LLM-Based Multi-Agent Systems [📝 Paper]
  • (Arxiv'25) MAS-GPT: Training LLMs to Build LLM-Based Multi-Agent Systems [📝 Paper]
  • (Arxiv'25) FlowAgent: Achieving Compliance and Flexibility for Workflow Agents [📝 Paper]
  • (Arxiv'25) ScoreFlow: Mastering LLM Agent Workflows via Score-Based Preference Optimization [📝 Paper] [💻 Code]
  • (Arxiv'25) Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies [📝 Paper]
  • (Arxiv'25) MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision [📝 Paper]
  • (Arxiv'25) MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming [📝 Paper]
  • (ICML'24) GPTSwarm: Language Agents as Optimizable Graphs [📝 Paper] [💻 Code]
  • (ICLR'24) DSPy: Compiling Declarative Language Model Calls into State-of-the-Art Pipelines [📝 Paper] [💻 Code]
  • (ICLR'24) AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors [📝 Paper] [💻 Code]
  • (ICLR'24) MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework [📝 Paper] [💻 Code]
  • (COLM'24) A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration [📝 Paper]
  • (COLM'24) AutoGen: Enabling next-Gen LLM Applications via Multi-Agent Conversations [📝 Paper] [💻 Code]
  • (Arxiv'24) G-Designer: Architecting Multi-Agent Communication Topologies via Graph Neural Networks [📝 Paper]
  • (Arxiv'24) AutoFlow: Automated Workflow Generation for Large Language Model Agents [📝 Paper] [💻 Code]
  • (Arxiv'24) Symbolic Learning Enables Self-Evolving Agents [📝 Paper] [💻 Code]
  • (Arxiv'24) Adaptive In-Conversation Team Building for Language Model Agents [📝 Paper]
  • (Arxiv'25) Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL [📝 Paper] [💻 Code]
  • (Arxiv'25) Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving [📝 Paper] [💻 Code]

3. Domain-Specific Optimisation

3.1 🧬 Biomedicine

3.1.1 📌 Medical Diagnosis

  • (EMNLP'24) MMedAgent: Learning to Use Medical Tools with Multi-modal Agent [📝 Paper] [💻 Code]
  • (NeurIPS'24) MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making [📝 Paper] [💻 Code]
  • (Arxiv'25) HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research [📝 Paper] [💻 Code]
  • (Arxiv'25) STELLA: Self-Evolving LLM Agent for Biomedical Research [📝 Paper] [💻 Code]
  • (MICCAI'25) MedAgentSim: Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions [📝 Paper] [💻 Code]
  • (Arxiv'25) PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology [📝 Paper]
  • (Arxiv'25) MDTeamGPT: A Self-Evolving LLM-based Multi-Agent Framework for Multi-Disciplinary Team Medical Consultation [📝 Paper] [💻 Code]
  • (Arxiv'25) MedAgent-Pro: Towards Evidence-based Multi-modal Medical Diagnosis via Reasoning Agentic Workflow [📝 Paper] [💻 Code]

3.1.2 📌 Molecular Discovery

  • (ICLR'24) CACTUS: Chemistry Agent Connecting Tool-Usage to Science [📝 Paper] [💻 Code]
  • (NMI'24) ChemCrow: Augmenting large language models with chemistry tools [📝 Paper] [💻 Code]
  • (ICLR'25) ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning [📝 Paper] [💻 Code]
  • (ICLR'25) OSDA Agent: Leveraging Large Language Models for De Novo Design of Organic Structure Directing Agents [📝 Paper]
  • (Arxiv'25) DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration [📝 Paper]
  • (Arxiv'25) LIDDIA: Language-based Intelligent Drug Discovery Agent [📝 Paper]

3.2 💻 Programming

3.2.1 📌 Code Refinement

3.2.2 📌 Code Debugging

  • (ACL'23) Self-Edit: Fault-Aware Code Editor for Code Generation [📝 Paper]
  • (ICLR'24) Teaching Large Language Models to Self-Debug [📝 Paper]
  • (ICA'24) RGD: Multi-LLM based agent debugger via refinement and generation guidance [📝 Paper]
  • (Arxiv'25) Large Language Model Guided Self-Debugging Code Generation [📝 Paper]

3.3 Scientific Research

3.4 💰📚 Financial and Legal Research

3.4.1 📌 Financial Decision-Making

  • (AAAI'24) FinRobot: an open-source ai agent platform for financial applications using large language models [📝 Paper] [💻 Code]
  • (Arxiv'24) PEER: Expertizing domain-specific tasks with a multi-agent framework and tuning methods [📝 Paper] [💻 Code]
  • (NeurIPS'25) Fincon: A synthesized llm multi-agent system with conceptual verbal reinforcement for enhanced financial decision making [📝 Paper] [💻 Code]

3.4.2 📌 Legal Reasoning

  • (Arxiv'24) LawLuo: A Multi-Agent Collaborative Framework for Multi-Round Chinese Legal Consultation [📝 Paper]
  • (ICIC'24) Legalgpt: Legal chain of thought for the legal large language model multi-agent framework [📝 Paper] [💻 Code]
  • (Arxiv'24) LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model [📝 Paper] [💻 Code]
  • (ACL Findings'25) AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents [📝 Paper] [💻 Code]

3.5 🧩 Other Domain-Specific Optimisation

4. Evaluation

4.1 📈 Benchmark-Based Evaluation

4.1.1 📌 Tool and API-Driven Agents

4.1.2 📌 Web Navigation and Browsing Agents

4.1.3 📌 Coding Agents

4.1.4 Scientific Research Agents

4.1.5 📌 Multi-Agent Collaboration and Generalists

4.1.6 📌 GUI and Multimodal Environment Agents

4.2 ⚖️ LLM-Based Evaluation

4.2.1 📌 LLM-as-a-Judge

  • (Arxiv'24) Towards Better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications [📝 Paper]
  • (Arxiv'24) LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods [📝 Paper]
  • (Arxiv'25) LiveIdeaBench: Evaluating LLMs’ Divergent Thinking for Scientific Idea Generation with Minimal Context [📝 Paper] [💻 Code]
  • (ACL'25) Auto-Arena: Automating LLM Evaluations with Agent Peer Debate and Committee Voting [📝 Paper] [💻 Code]
  • (Arxiv'25) MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation [📝 Paper]

4.2.2 📌 Agent-as-a-Judge

4.3 🛡 Safety, Alignment, and Robustness for Lifelong / Self-Evolving Agents

  • (Arxiv'24) AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents [📝 Paper]
  • (NeurIPS'24 – Datasets & Benchmarks) RedCode: Risky Code Execution and Generation [📝 Paper]
  • (Arxiv'24) MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control [📝 Paper] [💻 Code]
  • (Arxiv'23) Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark [📝 Paper]
  • (Arxiv'24) R-Judge: Benchmarking Safety Risk Awareness for LLM Judges [📝 Paper] [💻 Code]
  • (ACL'25) SafeLawBench: Towards Safe Alignment of Large Language Models [📝 Paper]

📚 Citation

If you find this survey useful in your research and applications, please cite using this BibTeX:

@article{fang2025comprehensive,
  title={A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems},
  author={Fang, Jinyuan and Peng, Yanwen and Zhang, Xi and Wang, Yingxu and Yi, Xinhao and Zhang, Guibin and Xu, Yi and Wu, Bin and Liu, Siwei and Li, Zihao and others},
  journal={arXiv preprint arXiv:2508.07407},
  year={2025}
}

☕ Acknowledgement

We would like to thank Shuyu Guo for his valuable contributions to the early-stage exploration and literature review on agent optimisation.

βœ‰οΈ Contact Us

If you have any questions or suggestions, please feel free to contact us via:

Email: j.fang.2@research.gla.ac.uk and Zaiqiao.Meng@glasgow.ac.uk
