arxiv: 2507.21046 · v4 · submitted 2025-07-28 · 💻 cs.AI

Recognition: 3 theorem links

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, Hongru Wang, Han Xiao, Yuhang Zhou, Shaokun Zhang, Jiayi Zhang, Jinyu Xiang, Yixiong Fang, Qiwen Zhao, Dongrui Liu, Qihan Ren, Cheng Qian, Zhenhailong Wang, Minda Hu, Huazheng Wang, Qingyun Wu, Heng Ji, Mengdi Wang

Pith reviewed 2026-05-14 22:18 UTC · model grok-4.3

keywords: self-evolving agents, LLM adaptation, continual learning, test-time evolution, artificial super intelligence, agent components, multi-agent feedback

The pith

Self-evolving agents adapt their internal components through ongoing interactions to move beyond static large language models toward artificial super intelligence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models produce strong results on fixed tasks yet stay unchanged after training and cannot adjust to new contexts or data on their own. This survey gathers existing work on agents that modify themselves and arranges it around three questions: which parts of an agent should change, at which points in time the changes should occur, and which techniques should drive those changes. The structure matters because it turns scattered experiments into a clear design space for systems that improve from experience without external retraining. The review also covers how to evaluate such agents, where they already appear (coding, education, and healthcare), and open problems around safety and scale. By supplying this map, the paper aims to speed progress from today's fixed models to agents that evolve autonomously.

Core claim

The paper states that self-evolving agents, which update models, memory, tools, and architecture from interactions and feedback, overcome the static limits of large language models. It organizes the literature by the components that evolve, the timing of adaptation (within a single run or across runs), and the mechanisms that produce change, including scalar rewards or textual signals in single-agent or multi-agent settings. The survey reviews metrics and benchmarks tailored to such agents, lists applications, and identifies remaining obstacles on the route to agents that reach super-intelligence without further human intervention.

What carries the argument

The three-dimensional framework that asks what parts of an agent to evolve, when to trigger adaptation, and how to implement the changes through rewards, feedback, or multi-agent coordination.
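
One way to make the three axes concrete is a small tagging schema. The sketch below is illustrative only: the enum members follow the categories named in the survey's abstract, while the `Entry` record, the `select` helper, and the two sample entries (`method-A`, `method-B`) are hypothetical and do not classify any real paper.

```python
from dataclasses import dataclass
from enum import Enum


class What(Enum):  # which component evolves
    MODEL = "model"
    MEMORY = "memory"
    TOOLS = "tools"
    ARCHITECTURE = "architecture"


class When(Enum):  # at which stage adaptation happens
    INTRA_TEST_TIME = "intra-test-time"
    INTER_TEST_TIME = "inter-test-time"


class How(Enum):  # which signal or design drives the change
    SCALAR_REWARD = "scalar reward"
    TEXTUAL_FEEDBACK = "textual feedback"
    MULTI_AGENT = "multi-agent"


@dataclass(frozen=True)
class Entry:
    """Hypothetical record tagging one method along the three axes."""
    name: str
    what: frozenset
    when: frozenset
    how: frozenset


def select(entries, what=None, when=None, how=None):
    """Return names of entries matching every axis value that is given."""
    hits = []
    for e in entries:
        if what is not None and what not in e.what:
            continue
        if when is not None and when not in e.when:
            continue
        if how is not None and how not in e.how:
            continue
        hits.append(e.name)
    return hits


# Placeholder entries, not real papers.
CATALOG = [
    Entry("method-A", frozenset({What.MEMORY}),
          frozenset({When.INTRA_TEST_TIME}), frozenset({How.TEXTUAL_FEEDBACK})),
    Entry("method-B", frozenset({What.MODEL, What.TOOLS}),
          frozenset({When.INTER_TEST_TIME}), frozenset({How.SCALAR_REWARD})),
]

print(select(CATALOG, what=What.MEMORY))
print(select(CATALOG, when=When.INTER_TEST_TIME, how=How.SCALAR_REWARD))
```

Using sets per axis lets one method sit in several categories at once, which is one possible way to handle the overlap between sections that a taxonomy like this tends to produce.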

If this is right

Agent components such as models, memory, tools, and overall architecture can be updated through experience.
Adaptation can occur inside a single test run or between separate runs.
Change can be guided by scalar rewards, textual feedback, or interactions among multiple agents.
Specialized metrics and benchmarks exist to track whether evolution improves performance over time.
Domains such as coding, education, and healthcare gain from agents that keep improving without manual updates.
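
The distinction in the list above between adapting inside a run and adapting between runs can be shown with a toy loop. This is a minimal sketch under stated assumptions, not a method from the survey: `ToyAgent`, its number-guessing task, and the "too low" / "too high" feedback strings are all invented for illustration. Within a run the agent adjusts its guess from textual feedback (intra-test-time); its memory persists so that a later run starts from what was learned (inter-test-time); a scalar reward marks success.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    # Persistent memory: survives across runs (inter-test-time evolution).
    memory: dict = field(default_factory=dict)

    def run(self, task: str, answer: int, max_attempts: int = 5) -> int:
        """Attempt one task and return the number of attempts used."""
        guess = self.memory.get(task, 0)  # start from earlier experience
        for attempt in range(1, max_attempts + 1):
            reward = 1.0 if guess == answer else 0.0  # scalar reward
            if reward == 1.0:
                self.memory[task] = guess  # store the solution for later runs
                return attempt
            # Textual feedback steers the next guess in the same run
            # (intra-test-time evolution).
            feedback = "too low" if guess < answer else "too high"
            guess += 1 if feedback == "too low" else -1
        self.memory[task] = guess  # keep partial progress
        return max_attempts


agent = ToyAgent()
first = agent.run("task-1", answer=3)
second = agent.run("task-1", answer=3)
print(first, second)
```

In this toy setting the second run needs fewer attempts than the first because the memory carried over, which is the kind of improvement over time that evolution-aware metrics would try to track.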

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the framework holds, agents could enter entirely new domains with little or no initial human data.
The same structure of what-when-how might apply to physical systems that must learn from real-world sensor streams.
Safety requirements could force limits on how freely an agent is allowed to rewrite its own code or goals.
Multiple agents evolving together might produce collective behaviors that no single designer intended.

Load-bearing premise

Self-evolving agents will serve as the primary route to artificial super intelligence instead of other scaling or training approaches.

What would settle it

A demonstration that a non-evolving static agent reaches super-intelligent results on open-ended interactive tasks without any self-modification would show that evolution is unnecessary.

Original abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks but remain fundamentally static, unable to adapt their internal parameters to novel tasks, evolving knowledge domains, or dynamic interaction contexts. As LLMs are increasingly deployed in open-ended, interactive environments, this static nature has become a critical bottleneck, necessitating agents that can adaptively reason, act, and evolve in real time. This paradigm shift -- from scaling static models to developing self-evolving agents -- has sparked growing interest in architectures and methods enabling continual learning and adaptation from data, interactions, and experiences. This survey provides the first systematic and comprehensive review of self-evolving agents, organizing the field around three foundational dimensions: what, when, and how to evolve. We examine evolutionary mechanisms across agent components (e.g., models, memory, tools, architecture), categorize adaptation methods by stages (e.g., intra-test-time, inter-test-time), and analyze the algorithmic and architectural designs that guide evolutionary adaptation (e.g., scalar rewards, textual feedback, single-agent and multi-agent systems). Additionally, we analyze evaluation metrics and benchmarks tailored for self-evolving agents, highlight applications in domains such as coding, education, and healthcare, and identify critical challenges and research directions in safety, scalability, and co-evolutionary dynamics. By providing a structured framework for understanding and designing self-evolving agents, this survey establishes a roadmap for advancing more adaptive, robust, and versatile agentic systems in both research and real-world deployments, and ultimately sheds light on the realization of Artificial Super Intelligence (ASI) where agents evolve autonomously and perform beyond human-level intelligence across tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This survey gives a useful three-axis taxonomy for self-evolving agents but stays descriptive and adds no new technical results or tests.

The main point is a literature survey that organizes work on agents that adapt beyond static LLMs into three dimensions: what to evolve (models, memory, tools, architecture), when to evolve (intra-test-time versus inter-test-time), and how to evolve (rewards, textual feedback, single-agent or multi-agent designs). It also covers benchmarks, applications in coding and healthcare, and open issues like safety and scalability. The structure pulls scattered papers into one map, which is the clearest contribution here. The sections on evolutionary mechanisms and adaptation stages are straightforward and cite relevant prior work without obvious holes at the high level. The discussion of challenges feels practical rather than overstated. As a survey it does not run new experiments or derive fresh mechanisms, so the framework itself is not validated against data or compared head-to-head with alternatives. The link to artificial super intelligence is presented as long-term motivation rather than a claim supported by current evidence, which keeps it from affecting the technical content. The paper is aimed at researchers already working on adaptive agents who need a way to navigate the literature and sketch future directions. Someone looking for concrete new algorithms or reproducible results will not find them. It is solid enough as an organizing reference to merit peer review, where referees could push for tighter comparisons or clearer boundaries between categories.

Referee Report

0 major / 3 minor

Summary. This survey provides the first systematic review of self-evolving agents for LLMs, organizing the literature around three core dimensions: what to evolve (agent components including models, memory, tools, and architecture), when to evolve (adaptation stages such as intra-test-time and inter-test-time), and how to evolve (algorithmic and architectural designs using scalar rewards, textual feedback, and single- or multi-agent systems). It further reviews evaluation metrics and benchmarks, applications in coding, education, and healthcare, and challenges in safety, scalability, and co-evolutionary dynamics, framing the work as a roadmap toward more adaptive agents and ultimately ASI.

Significance. If the taxonomy is comprehensive and accurate, the survey consolidates an emerging subfield by offering a structured framework that can guide researchers in identifying gaps and designing experiments. It explicitly credits prior literature and highlights forward-looking challenges without introducing new empirical claims, which strengthens its utility as a reference while the ASI connection serves as motivational context rather than a load-bearing premise.

minor comments (3)

[Abstract] The claim of being the 'first systematic and comprehensive review' would benefit from a brief explicit comparison to the most closely related prior surveys (e.g., on agentic LLMs or continual learning) to substantiate novelty.
[Sections 2-4] The three-dimensional taxonomy is clearly motivated, but a summary table mapping representative papers to the 'what/when/how' categories would improve navigation and reduce potential overlap between sections.
[Evaluation] In the evaluation metrics section, ensure that the discussion of benchmarks includes explicit limitations of current datasets (e.g., lack of long-horizon evolution tracking) to balance the positive coverage.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our survey and for recommending acceptance. Their summary accurately captures the paper's contributions in providing the first systematic framework for self-evolving agents organized around the what, when, and how dimensions.

Circularity Check

0 steps flagged

No significant circularity in survey's descriptive taxonomic framework

This is a literature survey whose contribution is a high-level organizational taxonomy of existing work on self-evolving agents, structured around the dimensions of what, when, and how to evolve. No new equations, fitted parameters, predictions, or derivations are introduced that could reduce to the paper's own inputs by construction. All content consists of summaries and citations to prior literature; the ASI connection is explicitly forward-looking motivation rather than a load-bearing premise or result. The framework is self-contained as a review and does not rely on self-citation chains or ansatzes for its validity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey the paper introduces no new free parameters, axioms, or invented entities; it aggregates and structures prior research without adding ungrounded postulates.

pith-pipeline@v0.9.0 · 5704 in / 1076 out tokens · 29346 ms · 2026-05-14T22:18:23.448291+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

LedgerForcing / HierarchyEmergence reciprocity / hierarchy_emergence_forces_phi echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

self-evolving agents... continuously learning from new data, interactions, and experiences in real-time, leading to systems that are more robust, versatile
InevitableStructure inevitability echoes

ultimately sheds light on the realization of Artificial Super Intelligence (ASI) where agents evolve autonomously

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 26 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning
cs.AI 2026-05 accept novelty 8.0

SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.
SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning
cs.AI 2026-05 unverdicted novelty 8.0

SimWorld Studio uses a self-evolving coding agent to generate adaptive 3D environments that improve embodied agent performance, with reported gains of 18 points over fixed environments in navigation tasks.
ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents
cs.AI 2026-05 conditional novelty 7.0

ClawForge supplies a generator that turns scenario templates into reproducible command-line tasks testing state conflict handling, where the strongest frontier model scores only 45.3 percent strict accuracy.
Harnessing Agentic Evolution
cs.AI 2026-05 unverdicted novelty 7.0

AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.
EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents
cs.LG 2026-05 unverdicted novelty 7.0

EvolveMem enables autonomous self-evolution of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with positive cross-ben...
OLIVIA: Online Learning via Inference-time Action Adaptation for Decision Making in LLM ReAct Agents
cs.AI 2026-05 unverdicted novelty 7.0

OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on f...
Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents
cs.AI 2026-05 unverdicted novelty 7.0

Evolving-RL jointly optimizes experience extraction and utilization in LLM agents via RL with separate evaluation signals, delivering up to 98.7% relative gains on out-of-distribution tasks in ALFWorld and Mind2Web.
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
cs.AI 2026-04 unverdicted novelty 7.0

OMC framework turns multi-agent AI into self-organizing companies with Talents, Talent Market, and E²R search, achieving 84.67% success on PRDBench (15.48 points above prior art).
M*: Every Task Deserves Its Own Memory Harness
cs.PL 2026-04 unverdicted novelty 7.0

M* evolves distinct Python memory programs per task via population-based reflective search, outperforming fixed-memory baselines on conversation, planning, and reasoning benchmarks.
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
cs.CL 2025-11 unverdicted novelty 7.0

Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.
RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data
cs.RO 2026-05 unverdicted novelty 6.0

A co-evolutionary VLM-VGM loop on 500 unlabeled images raises planner success by 30 points and simulator success by 48 percent while beating fully supervised baselines.
Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation
cs.AI 2026-05 unverdicted novelty 6.0

Self-evolving LLM agents exhibit capability erosion under continual adaptation, which Capability-Preserving Evolution mitigates by raising retained simple-task performance from 41.8% to 52.8% in workflow evolution und...
FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration
cs.LG 2026-05 unverdicted novelty 6.0

FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA ...
Learning Agent Routing From Early Experience
cs.CL 2026-05 unverdicted novelty 6.0

BoundaryRouter routes queries to LLM or agent using early experience memory from a seed set, cutting inference time 60.6% versus always using agents and raising performance 28.6% versus always using direct LLM inference.
Evaluation-driven Scaling for Scientific Discovery
cs.LG 2026-04 unverdicted novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster ...
Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
cs.AI 2026-04 unverdicted novelty 6.0

LLM agents trained with a task-success reward on self-generated knowledge can spontaneously explore and adapt to new environments without any rewards or instructions at inference, yielding 20% gains on web tasks and a...
Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization
cs.AI 2026-04 unverdicted novelty 6.0

Frontier-Eng is a new benchmark for generative optimization in engineering where agents iteratively improve designs under fixed interaction budgets using executable verifiers, with top models like GPT 5.4 showing limi...
Evaluation as Evolution: Transforming Adversarial Diffusion into Closed-Loop Curricula for Autonomous Vehicles
cs.RO 2026-04 unverdicted novelty 6.0

E² uses transport-regularized sparse control on learned reverse-time SDEs with topology-driven selection and Topological Anchoring to generate realistic adversarial scenarios, improving collision discovery by 9.01% on...
Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
cs.AI 2026-05 conditional novelty 5.0

The survey proposes the LIFE framework to unify fragmented research on collaboration, failure attribution, and self-evolution in LLM multi-agent systems into a progression toward self-organizing intelligence.
Reinforced Collaboration in Multi-Agent Flow Networks
cs.LG 2026-05 unverdicted novelty 5.0

MANGO optimizes multi-agent LLM workflows via flow networks, RL, and textual gradients, delivering up to 12.8% higher performance and 47.4% better efficiency while generalizing to new domains.
Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction
cs.AI 2026-04 unverdicted novelty 5.0

Web2BigTable introduces a bi-level multi-agent system that achieves new state-of-the-art results on wide-coverage and deep web-to-table search benchmarks through orchestration, coordination, and closed-loop reflection.
Autogenesis: A Self-Evolving Agent Protocol
cs.AI 2026-04 unverdicted novelty 5.0

Autogenesis Protocol defines resource and evolution layers for LLM agents, enabling a system that shows performance gains on long-horizon planning benchmarks.
E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning
cs.AI 2026-04 unverdicted novelty 5.0

E3-TIR integrates expert prefixes, guided branches, and self-exploration via mix policy optimization to deliver 6% better tool-use performance with under 10% of the usual synthetic data and 1.46x ROI.
OxyGent: Making Multi-Agent Systems Modular, Observable, and Evolvable via Oxy Abstraction
cs.AI 2026-04 unverdicted novelty 4.0

OxyGent supplies a modular framework for multi-agent systems via the Oxy abstraction for composition and monitoring and the OxyBank engine for continuous automated evolution.
A Brief Overview: Agentic Reinforcement Learning In Large Language Models
cs.AI 2026-04 unverdicted novelty 2.0

The paper surveys the conceptual foundations, methodological innovations, challenges, and future directions of agentic reinforcement learning frameworks that embed cognitive capabilities like meta-reasoning and self-r...
A Brief Overview: Agentic Reinforcement Learning In Large Language Models
cs.AI 2026-04 unverdicted novelty 2.0

This review synthesizes conceptual foundations, methods, challenges, and future directions for agentic reinforcement learning in large language models.


work page
[58]

Empowering Large Language Model Agents through Action Learning , journal =

Haiteng Zhao and Chang Ma and Guoyin Wang and Jing Su and Lingpeng Kong and Jingjing Xu and Zhi. Empowering Large Language Model Agents through Action Learning , journal =

work page
[59]

From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions , booktitle =

Changle Qu and Sunhao Dai and Xiaochi Wei and Hengyi Cai and Shuaiqiang Wang and Dawei Yin and Jun Xu and Ji. From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions , booktitle =

work page
[60]

Renxi Wang and Xudong Han and Lei Ji and Shu Wang and Timothy Baldwin and Haonan Li , title =

work page
[61]

Yu Shang and Yu Li and Keyu Zhao and Likai Ma and Jiahe Liu and Fengli Xu and Yong Li , title =

work page
[62]

2025 , eprint=

Zhang, Jenny and Hu, Shengran and Lu, Cong and Lange, Robert and Clune, Jeff , journal=. 2025 , eprint=

work page 2025
[63]

2024 , url=

Yuan, Lifan and Chen, Yangyi and Wang, Xingyao and Fung, Yi and Peng, Hao and Ji, Heng , booktitle=. 2024 , url=

work page 2024
[64]

2023 , eprint=

Cai, Tianle and Wang, Xuezhi and Ma, Tengyu and Chen, Xinyun and Zhou, Denny , journal=. 2023 , eprint=

work page 2023
[65]

2023 , url=

Qian, Cheng and Han, Chi and Fung, Yi and Qin, Yujia and Liu, Zhiyuan and Ji, Heng , booktitle=. 2023 , url=

work page 2023
[66]

Toolformer: Language Models Can Teach Themselves to Use Tools , booktitle =

Timo Schick and Jane Dwivedi. Toolformer: Language Models Can Teach Themselves to Use Tools , booktitle =

work page
[67]

Patil and Tianjun Zhang and Xin Wang and Joseph E

Shishir G. Patil and Tianjun Zhang and Xin Wang and Joseph E. Gonzalez , title =. NeurIPS , year =

work page
[68]

CoRR , volume =

Zhengliang Shi and Yuhan Wang and Lingyong Yan and Pengjie Ren and Shuaiqiang Wang and Dawei Yin and Zhaochun Ren , title =. CoRR , volume =

work page
[69]

Towards Completeness-Oriented Tool Retrieval for Large Language Models , booktitle =

Changle Qu and Sunhao Dai and Xiaochi Wei and Hengyi Cai and Shuaiqiang Wang and Dawei Yin and Jun Xu and Ji. Towards Completeness-Oriented Tool Retrieval for Large Language Models , booktitle =

work page
[70]

Yuanhang Zheng and Peng Li and Wei Liu and Yang Liu and Jian Luan and Bin Wang , title =

work page
[71]

CoRR , volume =

Hang Gao and Yongfeng Zhang , title =. CoRR , volume =

work page
[72]

Kolby Nottingham and Bodhisattwa Prasad Majumder and Bhavana Dalvi Mishra and Sameer Singh and Peter Clark and Roy Fox , title =

work page
[73]

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

Mle-bench: Evaluating machine learning agents on machine learning engineering , author=. arXiv preprint arXiv:2410.07095 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[74]

, author Chen, S

Scienceagentbench: Toward rigorous assessment of language agents for data-driven scientific discovery , author=. arXiv preprint arXiv:2410.05080 , year=

work page arXiv
[75]

BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents

Browsecomp: A simple yet challenging benchmark for browsing agents , author=. arXiv preprint arXiv:2504.12516 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[76]

arXiv preprint arXiv:2410.06703 , year=

St-webagentbench: A benchmark for evaluating safety and trustworthiness in web agents , author=. arXiv preprint arXiv:2410.06703 , year=

work page arXiv
[77]

arXiv preprint arXiv:2501.07572 , year=

Webwalker: Benchmarking llms in web traversal , author=. arXiv preprint arXiv:2501.07572 , year=

work page arXiv
[78]

The Twelfth International Conference on Learning Representations , year=

Gaia: a benchmark for general ai assistants , author=. The Twelfth International Conference on Learning Representations , year=

work page
[79]

AgentBench: Evaluating LLMs as Agents

Agentbench: Evaluating llms as agents , author=. arXiv preprint arXiv:2308.03688 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[80]

arXiv preprint arXiv:2503.13856 , year=

Mdteamgpt: A self-evolving llm-based multi-agent framework for multi-disciplinary team medical consultation , author=. arXiv preprint arXiv:2503.13856 , year=

work page arXiv
[81]

Multiagentbench: Evaluating the collaboration and competition of llm agents,

Multiagentbench: Evaluating the collaboration and competition of llm agents , author=. arXiv preprint arXiv:2503.01935 , year=

work page arXiv
[82]

arXiv preprint arXiv:2410.16946 , year=

Self-evolving multi-agent collaboration networks for software development , author=. arXiv preprint arXiv:2410.16946 , year=

work page arXiv
