pith. machine review for the scientific record.

arxiv: 2507.21046 · v4 · submitted 2025-07-28 · 💻 cs.AI

Recognition: 3 theorem links

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

Pith reviewed 2026-05-14 22:18 UTC · model grok-4.3

classification 💻 cs.AI
keywords self-evolving agents · LLM adaptation · continual learning · test-time evolution · artificial super intelligence · agent components · multi-agent feedback

The pith

Self-evolving agents adapt their internal components through ongoing interactions to move beyond static large language models toward artificial super intelligence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models produce strong results on fixed tasks yet stay unchanged after training and cannot adjust to new contexts or data on their own. This survey gathers existing work on agents that modify themselves and arranges it around three questions: which parts of an agent should change, at which points in time the changes should occur, and which techniques should drive those changes. The structure matters because it turns scattered experiments into a clear design space for systems that improve from experience without external retraining. The review also covers how to evaluate such agents, where they already appear (coding, education, and healthcare), and open problems around safety and scale. By supplying this map, the paper aims to speed progress from today's fixed models to agents that evolve autonomously.

Core claim

The paper states that self-evolving agents, which update models, memory, tools, and architecture from interactions and feedback, overcome the static limits of large language models. It organizes the literature by the components that evolve, the timing of adaptation (within a single run or across runs), and the mechanisms that produce change, including scalar rewards and textual feedback in single-agent or multi-agent settings. The survey also reviews evaluation metrics and benchmarks tailored to such agents, lists applications, and identifies remaining obstacles on the route to agents that reach super-intelligence without further human intervention.

What carries the argument

The three-dimensional framework that asks what parts of an agent to evolve, when to trigger adaptation, and how to implement the changes through rewards, feedback, or multi-agent coordination.

If this is right

  • Agent components such as models, memory, tools, and overall architecture can be updated through experience.
  • Adaptation can occur inside a single test run or between separate runs.
  • Change can be guided by scalar rewards, textual feedback, or interactions among multiple agents.
  • Specialized metrics and benchmarks exist to track whether evolution improves performance over time.
  • Domains such as coding, education, and healthcare gain from agents that keep improving without manual updates.
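
The what/when/how design space in the bullets above can be sketched as a small data structure. This is a minimal illustration, not code from the paper: the class names (`Component`, `Stage`, `Signal`, `Setting`, `EvolutionProfile`), the helper `filter_by_component`, and the agent name `hypothetical-agent` are all assumptions introduced here, and the enum members cover only the examples the abstract lists rather than the survey's full taxonomy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class Component(Enum):
    """'What' to evolve (examples named in the abstract; not exhaustive)."""
    MODEL = "model"
    MEMORY = "memory"
    TOOLS = "tools"
    ARCHITECTURE = "architecture"


class Stage(Enum):
    """'When' to evolve."""
    INTRA_TEST_TIME = "intra-test-time"  # inside a single run
    INTER_TEST_TIME = "inter-test-time"  # between separate runs


class Signal(Enum):
    """'How' to evolve: the kind of feedback that guides change."""
    SCALAR_REWARD = "scalar reward"
    TEXTUAL_FEEDBACK = "textual feedback"


class Setting(Enum):
    """'How' to evolve: single-agent or multi-agent system."""
    SINGLE_AGENT = "single-agent"
    MULTI_AGENT = "multi-agent"


@dataclass(frozen=True)
class EvolutionProfile:
    """Hypothetical record placing one method in the design space."""
    name: str
    components: FrozenSet[Component]
    stage: Stage
    signal: Signal
    setting: Setting

    def describe(self) -> str:
        parts = ", ".join(sorted(c.value for c in self.components))
        return (
            f"{self.name}: evolves {parts}; when={self.stage.value}; "
            f"how={self.signal.value}; setting={self.setting.value}"
        )


def filter_by_component(
    profiles: List[EvolutionProfile], component: Component
) -> List[EvolutionProfile]:
    """Return the profiles that evolve the given component."""
    return [p for p in profiles if component in p.components]


# A made-up entry, used only to show how a method would be tagged.
example = EvolutionProfile(
    name="hypothetical-agent",
    components=frozenset({Component.MEMORY, Component.TOOLS}),
    stage=Stage.INTER_TEST_TIME,
    signal=Signal.TEXTUAL_FEEDBACK,
    setting=Setting.SINGLE_AGENT,
)
print(example.describe())
```

Tagging each surveyed method with one such record would make it possible to filter by dimension, which is in the spirit of the summary table the referee report asks for.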

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the framework holds, agents could enter entirely new domains with little or no initial human data.
  • The same structure of what-when-how might apply to physical systems that must learn from real-world sensor streams.
  • Safety requirements could force limits on how freely an agent is allowed to rewrite its own code or goals.
  • Multiple agents evolving together might produce collective behaviors that no single designer intended.

Load-bearing premise

Self-evolving agents will serve as the primary route to artificial super intelligence instead of other scaling or training approaches.

What would settle it

A demonstration that a non-evolving static agent reaches super-intelligent results on open-ended interactive tasks without any self-modification would show that evolution is unnecessary.

Original abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks but remain fundamentally static, unable to adapt their internal parameters to novel tasks, evolving knowledge domains, or dynamic interaction contexts. As LLMs are increasingly deployed in open-ended, interactive environments, this static nature has become a critical bottleneck, necessitating agents that can adaptively reason, act, and evolve in real time. This paradigm shift -- from scaling static models to developing self-evolving agents -- has sparked growing interest in architectures and methods enabling continual learning and adaptation from data, interactions, and experiences. This survey provides the first systematic and comprehensive review of self-evolving agents, organizing the field around three foundational dimensions: what, when, and how to evolve. We examine evolutionary mechanisms across agent components (e.g., models, memory, tools, architecture), categorize adaptation methods by stages (e.g., intra-test-time, inter-test-time), and analyze the algorithmic and architectural designs that guide evolutionary adaptation (e.g., scalar rewards, textual feedback, single-agent and multi-agent systems). Additionally, we analyze evaluation metrics and benchmarks tailored for self-evolving agents, highlight applications in domains such as coding, education, and healthcare, and identify critical challenges and research directions in safety, scalability, and co-evolutionary dynamics. By providing a structured framework for understanding and designing self-evolving agents, this survey establishes a roadmap for advancing more adaptive, robust, and versatile agentic systems in both research and real-world deployments, and ultimately sheds light on the realization of Artificial Super Intelligence (ASI) where agents evolve autonomously and perform beyond human-level intelligence across tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. This survey provides the first systematic review of self-evolving agents for LLMs, organizing the literature around three core dimensions: what to evolve (agent components including models, memory, tools, and architecture), when to evolve (adaptation stages such as intra-test-time and inter-test-time), and how to evolve (algorithmic and architectural designs using scalar rewards, textual feedback, and single- or multi-agent systems). It further reviews evaluation metrics and benchmarks, applications in coding, education, and healthcare, and challenges in safety, scalability, and co-evolutionary dynamics, framing the work as a roadmap toward more adaptive agents and ultimately ASI.

Significance. If the taxonomy is comprehensive and accurate, the survey consolidates an emerging subfield by offering a structured framework that can guide researchers in identifying gaps and designing experiments. It explicitly credits prior literature and highlights forward-looking challenges without introducing new empirical claims, which strengthens its utility as a reference while the ASI connection serves as motivational context rather than a load-bearing premise.

minor comments (3)
  1. [Abstract] The claim of being the 'first systematic and comprehensive review' would benefit from a brief explicit comparison to the most closely related prior surveys (e.g., on agentic LLMs or continual learning) to substantiate novelty.
  2. [Sections 2-4] The three-dimensional taxonomy is clearly motivated, but a summary table mapping representative papers to the 'what/when/how' categories would improve navigation and reduce potential overlap between sections.
  3. [Evaluation] The discussion of benchmarks should state explicit limitations of current datasets (e.g., lack of long-horizon evolution tracking) to balance the positive coverage.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our survey and for recommending acceptance. Their summary accurately captures the paper's contributions in providing the first systematic framework for self-evolving agents organized around the what, when, and how dimensions.

Circularity Check

0 steps flagged

No significant circularity in survey's descriptive taxonomic framework

full rationale

This is a literature survey whose contribution is a high-level organizational taxonomy of existing work on self-evolving agents, structured around the dimensions of what, when, and how to evolve. No new equations, fitted parameters, predictions, or derivations are introduced that could reduce to the paper's own inputs by construction. All content consists of summaries and citations to prior literature; the ASI connection is explicitly forward-looking motivation rather than a load-bearing premise or result. The framework is self-contained as a review and does not rely on self-citation chains or ansatzes for its validity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey the paper introduces no new free parameters, axioms, or invented entities; it aggregates and structures prior research without adding ungrounded postulates.

pith-pipeline@v0.9.0 · 5704 in / 1076 out tokens · 29346 ms · 2026-05-14T22:18:23.448291+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • LedgerForcing / HierarchyEmergence reciprocity / hierarchy_emergence_forces_phi echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    self-evolving agents... continuously learning from new data, interactions, and experiences in real-time, leading to systems that are more robust, versatile

  • InevitableStructure inevitability echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    ultimately sheds light on the realization of Artificial Super Intelligence (ASI) where agents evolve autonomously

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 26 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

    cs.AI 2026-05 accept novelty 8.0

    SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

  2. SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

    cs.AI 2026-05 unverdicted novelty 8.0

    SimWorld Studio uses a self-evolving coding agent to generate adaptive 3D environments that improve embodied agent performance, with reported gains of 18 points over fixed environments in navigation tasks.

  3. ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

    cs.AI 2026-05 conditional novelty 7.0

    ClawForge supplies a generator that turns scenario templates into reproducible command-line tasks testing state conflict handling, where the strongest frontier model scores only 45.3 percent strict accuracy.

  4. Harnessing Agentic Evolution

    cs.AI 2026-05 unverdicted novelty 7.0

    AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.

  5. EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents

    cs.LG 2026-05 unverdicted novelty 7.0

    EvolveMem enables autonomous self-evolution of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with positive cross-ben...

  6. OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents

    cs.AI 2026-05 unverdicted novelty 7.0

    OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on f...

  7. Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents

    cs.AI 2026-05 unverdicted novelty 7.0

    Evolving-RL jointly optimizes experience extraction and utilization in LLM agents via RL with separate evaluation signals, delivering up to 98.7% relative gains on out-of-distribution tasks in ALFWorld and Mind2Web.

  8. From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

    cs.AI 2026-04 unverdicted novelty 7.0

    OMC framework turns multi-agent AI into self-organizing companies with Talents, Talent Market, and E²R search, achieving 84.67% success on PRDBench (15.48 points above prior art).

  9. M$^\star$: Every Task Deserves Its Own Memory Harness

    cs.PL 2026-04 unverdicted novelty 7.0

    M* evolves distinct Python memory programs per task via population-based reflective search, outperforming fixed-memory baselines on conversation, planning, and reasoning benchmarks.

  10. Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory

    cs.CL 2025-11 unverdicted novelty 7.0

    Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.

  11. RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data

    cs.RO 2026-05 unverdicted novelty 6.0

    A co-evolutionary VLM-VGM loop on 500 unlabeled images raises planner success by 30 points and simulator success by 48 percent while beating fully supervised baselines.

  12. Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation

    cs.AI 2026-05 unverdicted novelty 6.0

    Self-evolving LLM agents exhibit capability erosion under continual adaptation, which Capability-Preserving Evolution mitigates by raising retained simple-task performance from 41.8% to 52.8% in workflow evolution und...

  13. FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration

    cs.LG 2026-05 unverdicted novelty 6.0

    FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA ...

  14. Learning Agent Routing From Early Experience

    cs.CL 2026-05 unverdicted novelty 6.0

    BoundaryRouter routes queries to LLM or agent using early experience memory from a seed set, cutting inference time 60.6% versus always using agents and raising performance 28.6% versus always using direct LLM inference.

  15. Evaluation-driven Scaling for Scientific Discovery

    cs.LG 2026-04 unverdicted novelty 6.0

    SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster ...

  16. Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration

    cs.AI 2026-04 unverdicted novelty 6.0

    LLM agents trained with a task-success reward on self-generated knowledge can spontaneously explore and adapt to new environments without any rewards or instructions at inference, yielding 20% gains on web tasks and a...

  17. Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization

    cs.AI 2026-04 unverdicted novelty 6.0

    Frontier-Eng is a new benchmark for generative optimization in engineering where agents iteratively improve designs under fixed interaction budgets using executable verifiers, with top models like GPT 5.4 showing limi...

  18. Evaluation as Evolution: Transforming Adversarial Diffusion into Closed-Loop Curricula for Autonomous Vehicles

    cs.RO 2026-04 unverdicted novelty 6.0

    E² uses transport-regularized sparse control on learned reverse-time SDEs with topology-driven selection and Topological Anchoring to generate realistic adversarial scenarios, improving collision discovery by 9.01% on...

  19. Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

    cs.AI 2026-05 conditional novelty 5.0

    The survey proposes the LIFE framework to unify fragmented research on collaboration, failure attribution, and self-evolution in LLM multi-agent systems into a progression toward self-organizing intelligence.

  20. Reinforced Collaboration in Multi-Agent Flow Networks

    cs.LG 2026-05 unverdicted novelty 5.0

    MANGO optimizes multi-agent LLM workflows via flow networks, RL, and textual gradients, delivering up to 12.8% higher performance and 47.4% better efficiency while generalizing to new domains.

  21. Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction

    cs.AI 2026-04 unverdicted novelty 5.0

    Web2BigTable introduces a bi-level multi-agent system that achieves new state-of-the-art results on wide-coverage and deep web-to-table search benchmarks through orchestration, coordination, and closed-loop reflection.

  22. Autogenesis: A Self-Evolving Agent Protocol

    cs.AI 2026-04 unverdicted novelty 5.0

    Autogenesis Protocol defines resource and evolution layers for LLM agents, enabling a system that shows performance gains on long-horizon planning benchmarks.

  23. E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning

    cs.AI 2026-04 unverdicted novelty 5.0

    E3-TIR integrates expert prefixes, guided branches, and self-exploration via mix policy optimization to deliver 6% better tool-use performance with under 10% of the usual synthetic data and 1.46x ROI.

  24. OxyGent: Making Multi-Agent Systems Modular, Observable, and Evolvable via Oxy Abstraction

    cs.AI 2026-04 unverdicted novelty 4.0

    OxyGent supplies a modular framework for multi-agent systems via the Oxy abstraction for composition and monitoring and the OxyBank engine for continuous automated evolution.

  25. A Brief Overview: Agentic Reinforcement Learning In Large Language Models

    cs.AI 2026-04 unverdicted novelty 2.0

    The paper surveys the conceptual foundations, methodological innovations, challenges, and future directions of agentic reinforcement learning frameworks that embed cognitive capabilities like meta-reasoning and self-r...

  26. A Brief Overview: Agentic Reinforcement Learning In Large Language Models

    cs.AI 2026-04 unverdicted novelty 2.0

    This review synthesizes conceptual foundations, methods, challenges, and future directions for agentic reinforcement learning in large language models.

Reference graph

Works this paper leans on

297 extracted references · 297 canonical work pages · cited by 24 Pith papers · 39 internal anchors

  1. [1]

    2025 , eprint=

    Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems , author=. 2025 , eprint=

  2. [2]

    and Wong, Kam-Fai , title =

    Wang, Hongru and Qin, Yujia and Lin, Yankai and Pan, Jeff Z. and Wong, Kam-Fai , title =. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval , pages =. 2024 , isbn =. doi:10.1145/3626772.3661381 , abstract =

  3. [3]

    2025 , eprint=

    Toward a Theory of Agents as Tool-Use Decision-Makers , author=. 2025 , eprint=

  4. [4]

    A survey on large language model based autonomous agents , volume =

    Wang, Lei and Ma, Chen and Feng, Xueyang and Zhang, Zeyu and Yang, Hao and Zhang, Jingsen and Chen, Zhiyuan and Tang, Jiakai and Chen, Xu and Lin, Yankai and Zhao, Wayne Xin and Wei, Zhewei and Wen, Jirong , year=. A survey on large language model based autonomous agents , volume=. Frontiers of Computer Science , publisher=. doi:10.1007/s11704-024-40231-1...

  5. [5]

    2025 , eprint=

    Large Language Model Agent: A Survey on Methodology, Applications and Challenges , author=. 2025 , eprint=

  6. [6]

    2025 , eprint=

    G\"odel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement , author=. 2025 , eprint=

  7. [8]

    Agentgen: Enhancing planning abilities for large language model based agent via environment and task generation,

    Agentgen: Enhancing planning abilities for large language model based agent via environment and task generation , author=. arXiv preprint arXiv:2408.00764 , year=

  8. [9]

    arXiv preprint arXiv:2505.23885 , year=

    Owl: Optimized workforce learning for general multi-agent assistance in real-world task automation , author=. arXiv preprint arXiv:2505.23885 , year=

  9. [11]

    arXiv preprint arXiv:2506.09046 , year=

    Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation , author=. arXiv preprint arXiv:2506.09046 , year=

  10. [12]

    Proceedings of COLING , year=

    Towards Adaptive Mechanism Activation in Language Agents , author=. Proceedings of COLING , year=

  11. [13]

    Voyager: An Open-Ended Embodied Agent with Large Language Models

    Voyager: An Open-Ended Embodied Agent with Large Language Models , author=. arXiv preprint arXiv:2305.16291 , year=

  12. [14]

    arXiv preprint arXiv:2505.22501 , year=

    EvolveSearch: An Iterative Self-Evolving Search Agent , author=. arXiv preprint arXiv:2505.22501 , year=

  13. [15]

    Webrl: Training llm web agents via self-evolving online curriculum reinforcement learning.arXiv:2411.02337, 2024

    WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning , author=. arXiv preprint arXiv:2411.02337 , year=

  14. [16]

    RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning

    RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning , author=. arXiv preprint arXiv:2504.20073 , year=

  15. [17]

    Tree of Thoughts: Deliberate Problem Solving with Large Language Models , volume =

    Yao, Shunyu and Yu, Dian and Zhao, Jeffrey and Shafran, Izhak and Griffiths, Tom and Cao, Yuan and Narasimhan, Karthik , booktitle =. Tree of Thoughts: Deliberate Problem Solving with Large Language Models , volume =

  16. [18]

    Self-Refine: Iterative Refinement with Self-Feedback , volume =

    Madaan, Aman and Tandon, Niket and Gupta, Prakhar and Hallinan, Skyler and Gao, Luyu and Wiegreffe, Sarah and Alon, Uri and Dziri, Nouha and Prabhumoye, Shrimai and Yang, Yiming and Gupta, Shashank and Majumder, Bodhisattwa Prasad and Hermann, Katherine and Welleck, Sean and Yazdanbakhsh, Amir and Clark, Peter , booktitle =. Self-Refine: Iterative Refinem...

  17. [19]

    2021 , isbn =

    Wang, Yiwei and Wang, Wei and Liang, Yuxuan and Cai, Yujun and Hooi, Bryan , title =. 2021 , isbn =. doi:10.1145/3442381.3450025 , booktitle =

  18. [20]

    Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , month =

    Cha, Hyuntak and Lee, Jaeho and Shin, Jinwoo , title =. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , month =. 2021 , pages =

  19. [21]

    arXiv preprint arXiv:2405.03279 , year=

    Lifelong knowledge editing for llms with retrieval-augmented continuous prompt learning , author=. arXiv preprint arXiv:2405.03279 , year=

  20. [22]

    Advances in Neural Information Processing Systems , volume=

    Large language models are semi-parametric reinforcement learning agents , author=. Advances in Neural Information Processing Systems , volume=

  21. [23]

    Symbolic learning enables self-evolving agents

    Symbolic learning enables self-evolving agents , author=. arXiv preprint arXiv:2406.18532 , year=

  22. [24]

    arXiv preprint arXiv:2409.00872 , year=

    Self-evolving Agents with reflective and memory-augmented abilities , author=. arXiv preprint arXiv:2409.00872 , year=

  23. [25]

    A-MEM: Agentic Memory for LLM Agents

    A-mem: Agentic memory for llm agents , author=. arXiv preprint arXiv:2502.12110 , year=

  24. [26]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    Expel: Llm agents are experiential learners , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  25. [27]

    Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

    Mem0: Building production-ready ai agents with scalable long-term memory , author=. arXiv preprint arXiv:2504.19413 , year=

  26. [28]

    Advances in Neural Information Processing Systems , volume=

    Richelieu: Self-evolving llm-based agents for ai diplomacy , author=. Advances in Neural Information Processing Systems , volume=

  27. [29]

    arXiv preprint arXiv:2410.15665 , year=

    Long term memory: The foundation of ai self-evolution , author=. arXiv preprint arXiv:2410.15665 , year=

  28. [30]

    arXiv preprint arXiv:2505.16067 , year=

    How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior , author=. arXiv preprint arXiv:2505.16067 , year=

  29. [31]

    arXiv e-prints , pages=

    Autoguide: Automated generation and selection of state-aware guidelines for large language model agents , author=. arXiv e-prints , pages=

  30. [32]

    arXiv preprint arXiv:2503.21760 , year=

    Meminsight: Autonomous memory augmentation for llm agents , author=. arXiv preprint arXiv:2503.21760 , year=

  31. [33]

    arXiv preprint arXiv:2310.00656 , year=

    Lego-prover: Neural theorem proving with growing libraries , author=. arXiv preprint arXiv:2310.00656 , year=

  32. [34]

    Cognitive Memory in Large Language Models, April 2025

    Cognitive memory in large language models , author=. arXiv preprint arXiv:2504.02441 , year=

  33. [35]

    Proceedings of the 36th annual acm symposium on user interface software and technology , pages=

    Generative agents: Interactive simulacra of human behavior , author=. Proceedings of the 36th annual acm symposium on user interface software and technology , pages=

  34. [36]

    Unleashing infinite-length input capacity for large-scale language models with self-controlled memory system

    Enhancing large language model with self-controlled memory framework , author=. arXiv preprint arXiv:2304.13343 , year=

  35. [37]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    Memorybank: Enhancing large language models with long-term memory , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  36. [38]

    G- memory: Tracing hierarchical memory for multi-agent systems.arXiv, 2025

    G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems , author=. arXiv preprint arXiv:2506.07398 , year=

  37. [39]

    arXiv preprint arXiv:2502.16923 , year=

    A systematic survey of automatic prompt optimization techniques , author=. arXiv preprint arXiv:2502.16923 , year=

  38. [40]

    arXiv preprint arXiv:2406.11132 , year=

    Reprompt: Planning by automatic prompt engineering for large language models agents , author=. arXiv preprint arXiv:2406.11132 , year=

  39. [41]

    Promptbreeder: Self-referential self-improvement via prompt evolution,

    Promptbreeder: Self-referential self-improvement via prompt evolution , author=. arXiv preprint arXiv:2309.16797 , year=

  40. [42]

    The Eleventh International Conference on Learning Representations , year=

    Large language models are human-level prompt engineers , author=. The Eleventh International Conference on Learning Representations , year=

  41. [43]

    gradient descent

    Automatic prompt optimization with" gradient descent" and beam search , author=. arXiv preprint arXiv:2305.03495 , year=

  42. [44]

    arXiv preprint arXiv:2411.07446 , year=

    Efficient and Accurate Prompt Optimization: the Benefit of Memory in Exemplar-Guided Reflection , author=. arXiv preprint arXiv:2411.07446 , year=

  43. [45]

    DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

    Dspy: Compiling declarative language model calls into self-improving pipelines , author=. arXiv preprint arXiv:2310.03714 , year=

  44. [46]

    arXiv preprint arXiv:2310.06762 , year=

    Trace: A comprehensive benchmark for continual learning in large language models , author=. arXiv preprint arXiv:2310.06762 , year=

  45. [47]

    arXiv preprint arXiv:2310.16427 , year=

    Promptagent: Strategic planning with language models enables expert-level prompt optimization , author=. arXiv preprint arXiv:2310.16427 , year=

  46. [48]

    Large Language Models as Optimizers

    Large language models as optimizers , author=. arXiv preprint arXiv:2309.03409 , year=

  47. [49]

    arXiv preprint arXiv:2312.17025 , year=

    Experiential co-learning of software-developing agents , author=. arXiv preprint arXiv:2312.17025 , year=

  48. [50]

    arXiv preprint arXiv:2412.03092 , year=

    Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization , author=. arXiv preprint arXiv:2412.03092 , year=

  49. [51]

    arXiv preprint arXiv:2402.08702 , year=

    PRompt Optimization in Multi-Step Tasks (PROMST): Integrating Human Feedback and Heuristic-based Sampling , author=. arXiv preprint arXiv:2402.08702 , year=

  50. [52]

    arXiv preprint arXiv:2502.02533 , year=

    Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies , author=. arXiv preprint arXiv:2502.02533 , year=

  51. [53]

    Evoagent: Towards automatic multi-agent generation via evolutionary algorithms

    Evoagent: Towards automatic multi-agent generation via evolutionary algorithms , author=. arXiv preprint arXiv:2406.14228 , year=

  52. [54]

    arXiv e-prints , pages=

    LLM-AutoDiff: Auto-Differentiate Any LLM Workflow , author=. arXiv e-prints , pages=

  53. [55]

    CoRR , volume =

    Jiahao Qiu and Xuan Qi and Tongcheng Zhang and Xinzhe Juan and Jiacheng Guo and Yifu Lu and Yimin Wang and Zixin Yao and Qihan Ren and Xun Jiang and Xing Zhou and Dongrui Liu and Ling Yang and Yue Wu and Kaixuan Huang and Shilong Liu and Hongru Wang and Mengdi Wang , title =. CoRR , volume =

  54. [56]

    Hujaifa Islam and Hasmot Ali and Kishor Datta Gupta and Roy George , title =

    Mohd Ariful Haque and Justin Williams and Sunzida Siddique and Md. Hujaifa Islam and Hasmot Ali and Kishor Datta Gupta and Roy George , title =. CoRR , volume =

  55. [57]

    Boyuan Zheng, Michael Y. Fatemi, Xiaolong Jin, Zora Zhiruo Wang, Apurva Gandhi, Yueqi Song, Yu Gu, Jayanth Srinivasa, Gaowen Liu, Graham Neubig, and Yu Su. CoRR.

  56. [58]

    Haiteng Zhao, Chang Ma, Guoyin Wang, Jing Su, Lingpeng Kong, Jingjing Xu, Zhi. Empowering Large Language Model Agents through Action Learning.

  57. [59]

    Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, Ji. From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions.

  58. [60]

    Renxi Wang, Xudong Han, Lei Ji, Shu Wang, Timothy Baldwin, and Haonan Li.

  59. [61]

    Yu Shang, Yu Li, Keyu Zhao, Likai Ma, Jiahe Liu, Fengli Xu, and Yong Li.

  60. [62]

    Jenny Zhang, Shengran Hu, Cong Lu, Robert Lange, and Jeff Clune. 2025.

  61. [63]

    Lifan Yuan, Yangyi Chen, Xingyao Wang, Yi Fung, Hao Peng, and Heng Ji. 2024.

  62. [64]

    Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, and Denny Zhou. 2023.

  63. [65]

    Cheng Qian, Chi Han, Yi Fung, Yujia Qin, Zhiyuan Liu, and Heng Ji. 2023.

  64. [66]

    Timo Schick, Jane Dwivedi. Toolformer: Language Models Can Teach Themselves to Use Tools.

  65. [67]

    Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. NeurIPS.

  66. [68]

    Zhengliang Shi, Yuhan Wang, Lingyong Yan, Pengjie Ren, Shuaiqiang Wang, Dawei Yin, and Zhaochun Ren. CoRR.

  67. [69]

    Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, Ji. Towards Completeness-Oriented Tool Retrieval for Large Language Models.

  68. [70]

    Yuanhang Zheng, Peng Li, Wei Liu, Yang Liu, Jian Luan, and Bin Wang.

  69. [71]

    Hang Gao and Yongfeng Zhang. CoRR.

  70. [72]

    Kolby Nottingham, Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Sameer Singh, Peter Clark, and Roy Fox.

  71. [73]

    MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering. arXiv preprint arXiv:2410.07095.

  72. [74]

    ScienceAgentBench: Toward rigorous assessment of language agents for data-driven scientific discovery. arXiv preprint arXiv:2410.05080.

  73. [75]

    BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents. arXiv preprint arXiv:2504.12516.

  74. [76]

    ST-WebAgentBench: A benchmark for evaluating safety and trustworthiness in web agents. arXiv preprint arXiv:2410.06703.

  75. [77]

    WebWalker: Benchmarking LLMs in web traversal. arXiv preprint arXiv:2501.07572.

  76. [78]

    GAIA: a benchmark for general AI assistants. The Twelfth International Conference on Learning Representations.

  77. [79]

    AgentBench: Evaluating LLMs as Agents. arXiv preprint arXiv:2308.03688.

  78. [80]

    MDTeamGPT: A self-evolving LLM-based multi-agent framework for multi-disciplinary team medical consultation. arXiv preprint arXiv:2503.13856.

  79. [81]

    MultiAgentBench: Evaluating the collaboration and competition of LLM agents. arXiv preprint arXiv:2503.01935.

  80. [82]

    Self-evolving multi-agent collaboration networks for software development. arXiv preprint arXiv:2410.16946.

Showing first 80 references.