
arxiv: 2605.17721 · v1 · pith:7JHGNWMEnew · submitted 2026-05-18 · 💻 cs.AI

EXG: Self-Evolving Agents with Experience Graphs

Pith reviewed 2026-05-19 22:14 UTC · model grok-4.3

classification 💻 cs.AI
keywords self-evolving agents · experience graph · LLM agents · structured memory · cross-task reuse · online experience · agent improvement

The pith

EXG turns agent successes and failures into a connected graph for instant reuse across tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EXG as a way to organize what an agent learns from its own runs into a graph that links related successes and failures. This structured form replaces scattered reflections or loose memory stores, letting the agent pull in useful past results right when needed for new problems. A sympathetic reader would see this as a step toward agents that keep getting better on the job rather than staying fixed after initial setup. The graph can grow while the agent works and can also be saved for later use as ready-made memory. Experiments on code and reasoning tasks indicate it delivers better results with less wasted effort than earlier approaches.

Core claim

EXG is the first experience graph designed for self-evolving agents, supporting both online, real-time graph growth during execution for immediate cross-task experience reuse, and offline reuse of a consolidated experience graph as an external memory module. This design also enables EXG to serve as a plug-and-play component for existing self-evolving agents, organizing prior experience into a unified experience graph and improving both solution quality and resource efficiency as deployment progresses.

What carries the argument

The experience graph, which explicitly organizes accumulated successes and failures into a structured, relational representation for real-time growth and consolidated reuse.
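
As a rough illustration of what such a structure could look like, the Python sketch below stores each case as a node carrying a success/failure flag and links cases whose task descriptions overlap. The names `Case`, `ExperienceGraph`, and `min_overlap`, as well as the word-overlap (Jaccard) similarity, are assumptions made for illustration only; this summary does not give the paper's actual node, edge, or retrieval definitions.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Case:
    """One experience: a task, the agent's output, and whether it succeeded."""
    case_id: int
    task: str
    output: str
    success: bool


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for a learned embedding."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


class ExperienceGraph:
    def __init__(self, min_overlap: float = 0.2):
        self.cases: dict[int, Case] = {}
        self.edges: dict[int, set[int]] = {}
        self.min_overlap = min_overlap  # hypothetical linking threshold

    def add_case(self, task: str, output: str, success: bool) -> Case:
        """Online growth: insert a node and link it to similar existing cases."""
        case = Case(len(self.cases), task, output, success)
        self.edges[case.case_id] = set()
        for other in self.cases.values():
            if jaccard(task, other.task) >= self.min_overlap:
                self.edges[case.case_id].add(other.case_id)
                self.edges[other.case_id].add(case.case_id)
        self.cases[case.case_id] = case
        return case

    def retrieve(self, task: str, k: int = 2) -> list[Case]:
        """Return the best-matching case plus its neighbours, most similar first."""
        if not self.cases:
            return []
        seed = max(self.cases.values(), key=lambda c: jaccard(task, c.task))
        pool = [seed] + [self.cases[i] for i in sorted(self.edges[seed.case_id])]
        pool.sort(key=lambda c: jaccard(task, c.task), reverse=True)
        return pool[:k]

    def to_json(self) -> str:
        """Offline reuse: serialise the consolidated graph."""
        return json.dumps({
            "min_overlap": self.min_overlap,
            "cases": [asdict(c) for c in self.cases.values()],
            "edges": {str(k): sorted(v) for k, v in self.edges.items()},
        })

    @classmethod
    def from_json(cls, text: str) -> "ExperienceGraph":
        data = json.loads(text)
        graph = cls(data["min_overlap"])
        graph.cases = {c["case_id"]: Case(**c) for c in data["cases"]}
        graph.edges = {int(k): set(v) for k, v in data["edges"].items()}
        return graph


# Toy usage: grow the graph online, save it, reload it, and query it.
graph = ExperienceGraph()
graph.add_case("reverse a linked list", "iterative pointer swap", True)
graph.add_case("reverse a doubly linked list", "forgot to update prev", False)
graph.add_case("sum two integers", "return a + b", True)
restored = ExperienceGraph.from_json(graph.to_json())
hits = restored.retrieve("reverse a singly linked list")
```

In this toy run the query pulls back both the earlier success and the related failure, which is the kind of cross-task reuse the claim describes. A real system would likely replace the word-overlap score with embedding-based retrieval.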

If this is right

  • Agents gain immediate cross-task reuse from experiences gathered during execution.
  • A consolidated graph can be used offline as external memory to boost later performance.
  • Existing self-evolving agents can adopt the graph as a plug-in to organize their prior experience.
  • Overall performance-efficiency trade-offs improve compared with ad hoc reflection or fragmented memory.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The graph approach could be tested in domains beyond code and reasoning, such as tool-use or planning agents, to check whether relational linking scales to longer task chains.
  • If the structure keeps overhead low, it might reduce reliance on periodic retraining by letting agents carry forward lessons in a compact, queryable form.
  • Connections to graph-based memory systems in other AI work could be explored to see whether the same relational pattern supports transfer between entirely different agent types.

Load-bearing premise

Successes and failures accumulated during agent execution can be effectively captured and related in a graph structure that enables immediate and transferable reuse without fragmentation or high overhead.

What would settle it

A direct comparison on the same code generation and reasoning benchmarks showing that agents equipped with the experience graph produce no measurable gains in solution quality or resource efficiency over reflection-only or unstructured-memory baselines.

Figures

Figures reproduced from arXiv: 2605.17721 by Hanchen Wang, Lu Qin, Siyuan Zhang, Wenjie Zhang, Ying Zhang, Yuxin Jin.

Figure 1: Overview of the self-evolving experience graph. (a) Trajectory produces structured cases from agent interactions. (b) …

Figure 2: EXG structured prompt architecture.
… first initializes a provisional case c_q, which contains the task input and contextual information but does not yet include an output or correctness outcome. Conditioned on c_q, the experience graph G is queried to retrieve relevant prior cases using the graph retrieval and reranking procedures defined in the EXG design. Based on the reranked cases, EXG constructs a set of…

Figure 3: Offline self-evolving via graph reuse. EXG is pre…

Figure 4: Average number of LLM calls per task under the …

Figure 5: Latency breakdown under the online setting on …

Figure 7: Token usage breakdown on HumanEval and MuSiQue. Each bar shows the total number of tokens consumed, with input tokens stacked below output tokens.
… a deliberate rise in input tokens (+20.0%), reflecting the injection of experience hints, while output tokens are reduced by 19.3%. Compared to SE-Agent-Lite, which incurs 158,125 total tokens, EXG reduces total token consumption by 20.8%, with a 37.8% reductio…

Figure 8: Learning curves on HumanEval.
… to improve as more tasks are seen. By around 60 tasks, EXG-based methods reach approximately 85%, already exceeding baseline performance by about 7–10 percentage points. By the end of the task sequence, EXG-based methods achieve a cumulative Pass@2 close to 90%, compared to 75–78% for baseline methods. This corresponds to an absolute late-stage improvement of roughly 12–15 pe…

Figure 9: Learning curves on MuSiQue.
… exposure increases, baseline methods exhibit limited progression: even after around 60 tasks, their Pass@1 rises only marginally to approximately 36–38%, after which further gains largely diminish. In contrast, EXG-based methods collectively display a qualitatively different trajectory. The core EXG-based method shows a consistent upward trend as experience accumulates, reaching…
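
The prompt-construction flow described in the text accompanying Figure 2 (provisional case, retrieval, reranking, hint construction) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions: the `Case` fields, the word-overlap relevance score, the failures-first rerank rule, and the hint wording are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    task: str
    context: str = ""
    output: Optional[str] = None    # unset for a provisional case
    correct: Optional[bool] = None  # unset for a provisional case


def overlap(a: str, b: str) -> int:
    """Number of shared words; a toy relevance score (assumption)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def retrieve(query: Case, memory: list[Case], k: int = 3) -> list[Case]:
    """Keep up to k prior cases that share at least one word with the query."""
    scored = [(overlap(query.task, c.task), c) for c in memory]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def rerank(cases: list[Case]) -> list[Case]:
    """Toy rerank: list failures first so pitfalls are seen early (assumption)."""
    return sorted(cases, key=lambda c: c.correct is True)


def build_hints(cases: list[Case]) -> list[str]:
    """Turn each prior case into a one-line hint labelled by its outcome."""
    hints = []
    for c in cases:
        label = "Worked before" if c.correct else "Failed before"
        hints.append(f"{label} on '{c.task}': {c.output}")
    return hints


def build_prompt(query: Case, memory: list[Case]) -> str:
    """Retrieve, rerank, and inject hints ahead of the model call."""
    hints = build_hints(rerank(retrieve(query, memory)))
    hint_block = "\n".join(f"- {h}" for h in hints) or "- (no prior experience)"
    return f"Task: {query.task}\nExperience hints:\n{hint_block}"


# Toy usage with made-up prior cases.
memory = [
    Case("parse a date string", output="used strptime", correct=True),
    Case("parse a date range", output="ignored time zones", correct=False),
    Case("multiply matrices", output="nested loops", correct=True),
]
query = Case(task="parse a date string with time zone")  # provisional case
prompt = build_prompt(query, memory)
```

The provisional case carries no output or correctness label until the agent has answered, which is why those fields default to `None` here.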
Original abstract

Large language model (LLM)-based agents have demonstrated strong capabilities in complex reasoning and problem solving through multi-step interactions, yet most deployed agents remain behaviorally static, with knowledge acquired during execution rarely translating into systematic improvement over time. In response, a growing line of work on self-evolving agents explores how agents can improve through experience during deployment, but most existing approaches either rely on ad hoc reflection limited to single-task correction or adopt unstructured memory that accumulates fragmented experience with delayed usability. To address this limitation, we introduce EXG, an experience graph framework for self-evolving agents that explicitly organizes accumulated successes and failures into a structured, relational representation. EXG is the first experience graph designed for self-evolving agents, supporting both online, real-time graph growth during execution for immediate cross-task experience reuse, and offline reuse of a consolidated experience graph as an external memory module. This design also enables EXG to serve as a plug-and-play component for existing self-evolving agents, organizing prior experience into a unified experience graph and improving both solution quality and resource efficiency as deployment progresses. Extensive experiments across code generation and reasoning benchmarks show that EXG attains more favorable performance-efficiency trade-offs than reflection- and memory-based baselines in both online and offline evaluations. Our results suggest that structuring experience as a graph provides a principled foundation for scalable and transferable self-evolving agent behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces EXG, an experience graph framework for self-evolving LLM-based agents. It organizes accumulated successes and failures into a structured relational graph that supports online real-time growth during execution for immediate cross-task reuse, as well as offline consolidation and reuse as an external memory module. EXG is presented as a plug-and-play component that can be integrated with existing self-evolving agents to improve solution quality and resource efficiency over time. The authors report extensive experiments on code generation and reasoning benchmarks demonstrating more favorable performance-efficiency trade-offs relative to reflection- and memory-based baselines in both online and offline settings.

Significance. If the experimental results hold, the work provides a concrete, graph-structured mechanism for experience reuse that directly targets fragmentation and delayed usability issues in current self-evolving agent designs. The dual support for online growth and offline external-memory use, combined with the plug-and-play integration claim, could offer a reusable primitive for building more adaptive agents. The emphasis on efficiency alongside performance is a practical strength that distinguishes the contribution from purely reflective or flat-memory approaches.

minor comments (3)
  1. The abstract asserts 'extensive experiments' with favorable trade-offs but does not preview key metrics, baselines, or dataset details; adding a concise summary of the evaluation protocol in the abstract or introduction would improve accessibility.
  2. Clarify the precise node and edge definitions for experience fragments early in the manuscript (ideally with a small illustrative example) to make the graph-construction rules immediately understandable before the algorithmic description.
  3. Ensure that the experimental section includes explicit statements of statistical significance or variance across runs for the reported performance-efficiency trade-offs, as this is necessary to support the cross-baseline claims.
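
One possible way to act on the third comment is to report the mean and standard deviation over repeated runs, together with a paired test statistic against each baseline. The Python sketch below uses only the standard library; the function names are hypothetical, and the scores are placeholder values chosen for illustration, not results from the paper.

```python
import math
import statistics


def summarize(scores):
    """Mean and sample standard deviation across independent runs."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(method, baseline):
    """t statistic for per-seed differences (method minus baseline)."""
    diffs = [m - b for m, b in zip(method, baseline)]
    sd = statistics.stdev(diffs)
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Placeholder pass rates for five seeds; NOT results from the paper.
method_runs = [0.80, 0.82, 0.78, 0.81, 0.79]
baseline_runs = [0.70, 0.73, 0.69, 0.70, 0.68]

mean_method, std_method = summarize(method_runs)
t_stat = paired_t_statistic(method_runs, baseline_runs)
print(f"method: {mean_method:.3f} +/- {std_method:.3f}, paired t = {t_stat:.2f}")
```

The statistic would then be compared against a t distribution with n - 1 degrees of freedom (here n = 5 seeds). Pairing by seed assumes that both methods were run on the same task order and random seed in each run.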

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the significance of the EXG framework, and recommendation for minor revision. We are pleased that the dual online/offline design and plug-and-play aspects were viewed favorably.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents EXG as a new architectural framework for structuring agent experience into a relational graph, with design choices for online real-time growth and offline consolidation explicitly described as implementation decisions rather than derived predictions. No equations, parameter fits, or self-citations are invoked to force the core claims; the abstract and positioning against ad-hoc reflection rely on stated motivations and reported benchmark outcomes instead of reducing to input definitions or prior author work by construction. The framework is introduced with plug-and-play integration details that remain testable independently of any self-referential loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Based solely on the abstract, the primary addition is the EXG graph structure itself. Limited details prevent exhaustive enumeration of parameters or axioms.

axioms (1)
  • domain assumption: LLM-based agents accumulate usable experience from successes and failures that can be relationally structured for reuse
    Foundational premise enabling the graph design, stated in the problem setup and solution description.
invented entities (1)
  • Experience Graph (EXG): no independent evidence
    purpose: To organize accumulated successes and failures into a structured relational representation for online and offline reuse
    Newly introduced framework component positioned as the core innovation.

pith-pipeline@v0.9.0 · 5778 in / 1246 out tokens · 35600 ms · 2026-05-19T22:14:02.846903+00:00 · methodology
