EXG: Self-Evolving Agents with Experience Graphs

arXiv: 2605.17721 · v1 · pith:7JHGNWME · submitted 2026-05-18 · 💻 cs.AI

Yuxin Jin, Siyuan Zhang, Hanchen Wang, Lu Qin, Ying Zhang, Wenjie Zhang

Pith reviewed 2026-05-19 22:14 UTC · model grok-4.3

keywords self-evolving agents · experience graph · LLM agents · structured memory · cross-task reuse · online experience · agent improvement

The pith

EXG turns agent successes and failures into a connected graph for instant reuse across tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EXG as a way to organize what an agent learns from its own runs into a graph that links related successes and failures. This structured form replaces scattered reflections or loose memory stores, letting the agent pull in useful past results right when needed for new problems. A sympathetic reader would see this as a step toward agents that keep getting better on the job rather than staying fixed after initial setup. The graph can grow while the agent works and can also be saved for later use as ready-made memory. Experiments on code and reasoning tasks indicate it delivers better results with less wasted effort than earlier approaches.

Core claim

EXG is the first experience graph designed for self-evolving agents, supporting both online, real-time graph growth during execution for immediate cross-task experience reuse, and offline reuse of a consolidated experience graph as an external memory module. This design also enables EXG to serve as a plug-and-play component for existing self-evolving agents, organizing prior experience into a unified experience graph and improving both solution quality and resource efficiency as deployment progresses.

What carries the argument

The experience graph, which explicitly organizes accumulated successes and failures into a structured, relational representation for real-time growth and consolidated reuse.

If this is right

Agents gain immediate cross-task reuse from experiences gathered during execution.
A consolidated graph can be used offline as external memory to boost later performance.
Existing self-evolving agents can adopt the graph as a plug-in to organize their prior experience.
Overall performance-efficiency trade-offs improve compared with ad hoc reflection or fragmented memory.
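
To make the efficiency claim concrete: the Figure 7 excerpt in the Figures section of this page quotes 158,125 total tokens for SE-Agent-Lite and a 20.8% reduction for EXG. The sketch below works through the arithmetic those two numbers imply. The function names are illustrative, the EXG total is derived here rather than quoted from the paper, and the break-even calculation assumes the +20.0% input and -19.3% output changes in the same excerpt are measured against one common baseline, which the truncated text does not confirm.

```python
def total_after_reduction(baseline_total: float, reduction_pct: float) -> float:
    """Total that remains after cutting `reduction_pct` percent from a baseline."""
    return baseline_total * (1 - reduction_pct / 100)


def net_change(input_share: float, input_delta: float, output_delta: float) -> float:
    """Fractional change in total tokens, given the baseline's input-token share."""
    return input_share * input_delta + (1 - input_share) * output_delta


def break_even_input_share(input_delta: float, output_delta: float) -> float:
    """Input share at which the input increase exactly cancels the output saving."""
    return -output_delta / (input_delta - output_delta)


# Figures quoted in the Figure 7 excerpt; everything computed from them is derived.
SE_AGENT_LITE_TOTAL = 158_125
implied_exg_total = total_after_reduction(SE_AGENT_LITE_TOTAL, 20.8)
print(f"implied EXG total: about {implied_exg_total:,.0f} tokens")

# Assumption: +20.0% input and -19.3% output refer to the same baseline run.
share = break_even_input_share(0.200, -0.193)
print(f"break-even input share: {share:.3f}")
```

Because 20.8% is itself a rounded figure, the implied total is only accurate to within a few dozen tokens. Under the stated assumption, adding hints to the prompt lowers the overall token count only while input tokens make up less than roughly half of the baseline total.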

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The graph approach could be tested in domains beyond code and reasoning, such as tool-use or planning agents, to check whether relational linking scales to longer task chains.
If the structure keeps overhead low, it might reduce reliance on periodic retraining by letting agents carry forward lessons in a compact, queryable form.
Connections to graph-based memory systems in other AI work could be explored to see whether the same relational pattern supports transfer between entirely different agent types.

Load-bearing premise

Successes and failures accumulated during agent execution can be effectively captured and related in a graph structure that enables immediate and transferable reuse without fragmentation or high overhead.

What would settle it

A direct comparison on the same code generation and reasoning benchmarks showing that agents equipped with the experience graph produce no measurable gains in solution quality or resource efficiency over reflection-only or unstructured-memory baselines.

Figures

Figures reproduced from arXiv: 2605.17721 by Hanchen Wang, Lu Qin, Siyuan Zhang, Wenjie Zhang, Ying Zhang, Yuxin Jin.

**Figure 1.** Overview of the self-evolving experience graph. (a) Trajectory produces structured cases from agent interactions. (b) …

**Figure 2.** EXG structured prompt architecture. Adjoining text: "… first initializes a provisional case c_q, which contains the task input and contextual information but does not yet include an output or correctness outcome. Conditioned on c_q, the experience graph G is queried to retrieve relevant prior cases using the graph retrieval and reranking procedures defined in the EXG design. Based on the reranked cases, EXG constructs a set of…"

**Figure 3.** Offline self-evolving via graph reuse. EXG is pre…

**Figure 4.** Average number of LLM calls per task under the…

**Figure 5.** Latency breakdown under the online setting on…

**Figure 7.** Token usage breakdown on HumanEval and MuSiQue. Each bar shows the total number of tokens consumed, with input tokens stacked below output tokens. Adjoining text: "… a deliberate rise in input tokens (+20.0%), reflecting the injection of experience hints, while output tokens are reduced by 19.3%. Compared to SE-Agent-Lite, which incurs 158,125 total tokens, EXG reduces total token consumption by 20.8%, with a 37.8% reductio…"

**Figure 8.** Learning curves on HumanEval. Adjoining text: "… to improve as more tasks are seen. By around 60 tasks, EXG-based methods reach approximately 85%, already exceeding baseline performance by about 7–10 percentage points. By the end of the task sequence, EXG-based methods achieve a cumulative Pass@2 close to 90%, compared to 75–78% for baseline methods. This corresponds to an absolute late-stage improvement of roughly 12–15 pe…"

**Figure 9.** Learning curves on MuSiQue. Adjoining text: "… exposure increases, baseline methods exhibit limited progression: even after around 60 tasks, their Pass@1 rises only marginally to approximately 36–38%, after which further gains largely diminish. In contrast, EXG-based methods collectively display a qualitatively different trajectory. The core EXG-based method shows a consistent upward trend as experience accumulates, reaching…"
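
The Figure 2 excerpt above describes a prompt-building flow: create a provisional case with no output or outcome, query the graph for relevant prior cases, rerank them, and turn the result into hints. The sketch below illustrates only that control flow. It is not the paper's procedure: a flat list stands in for the graph, word-overlap (Jaccard) similarity stands in for graph retrieval, "successes first" is a placeholder reranking rule, and every name (`Case`, `retrieve`, `rerank`, `build_hints`, `build_prompt`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    task_input: str
    output: Optional[str] = None    # None while the case is provisional
    correct: Optional[bool] = None  # None until an outcome is known


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity; a stand-in for the paper's graph retrieval."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def retrieve(query: Case, store: list, k: int = 3) -> list:
    """Return up to k stored cases that share at least one word with the query."""
    scored = [(jaccard(query.task_input, c.task_input), c) for c in store]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:k]]


def rerank(cases: list) -> list:
    """Placeholder policy (an assumption): successes first, order otherwise kept."""
    return sorted(cases, key=lambda c: not c.correct)


def build_hints(cases: list) -> list:
    """Turn prior cases into short textual hints for the prompt."""
    return [
        f"{'Worked before' if c.correct else 'Avoid'}: {c.output} (task: {c.task_input})"
        for c in cases
    ]


def build_prompt(task_input: str, store: list) -> str:
    c_q = Case(task_input=task_input)  # provisional case: no output, no outcome
    hints = build_hints(rerank(retrieve(c_q, store)))
    return "\n".join([f"Task: {task_input}", "Experience hints:", *(f"- {h}" for h in hints)])


# Toy data, invented purely for illustration.
store = [
    Case("reverse a linked list", "forgot to update the head pointer", False),
    Case("reverse a string", "slice with step -1", True),
    Case("sum two numbers", "a + b", True),
]
prompt = build_prompt("reverse a list in place", store)
print(prompt)
```

In this toy run the failed case is the closest match by word overlap, yet the placeholder reranker lists the successful case first. That gap between "most similar" and "most useful" is the kind of decision a reranking step exists to make.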

Original abstract

Large language model (LLM)-based agents have demonstrated strong capabilities in complex reasoning and problem solving through multi-step interactions, yet most deployed agents remain behaviorally static, with knowledge acquired during execution rarely translating into systematic improvement over time. In response, a growing line of work on self-evolving agents explores how agents can improve through experience during deployment, but most existing approaches either rely on ad hoc reflection limited to single-task correction or adopt unstructured memory that accumulates fragmented experience with delayed usability. To address this limitation, we introduce EXG, an experience graph framework for self-evolving agents that explicitly organizes accumulated successes and failures into a structured, relational representation. EXG is the first experience graph designed for self-evolving agents, supporting both online, real-time graph growth during execution for immediate cross-task experience reuse, and offline reuse of a consolidated experience graph as an external memory module. This design also enables EXG to serve as a plug-and-play component for existing self-evolving agents, organizing prior experience into a unified experience graph and improving both solution quality and resource efficiency as deployment progresses. Extensive experiments across code generation and reasoning benchmarks show that EXG attains more favorable performance-efficiency trade-offs than reflection- and memory-based baselines in both online and offline evaluations. Our results suggest that structuring experience as a graph provides a principled foundation for scalable and transferable self-evolving agent behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

EXG turns agent experiences into a graph for online reuse during runs and offline consolidation afterward, with experiments claiming better efficiency than reflection or flat memory on code and reasoning tasks.

The main takeaway is that EXG gives agents a structured graph to store and connect successes and failures instead of scattered reflections or loose memory lists. This setup supports real-time graph growth while the agent is working, so experience can transfer across tasks immediately, plus an offline mode where the full graph acts as external memory for later use. It also works as a plug-in for other self-evolving agent setups.
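
The two modes described in this note can be sketched as one loop with a switch. In the online mode each finished task is written back to memory before the next task starts, so later tasks in the same run can see it. In the offline mode a previously saved memory is loaded and read without being extended. This is a minimal illustration under stated assumptions: a plain list of records stands in for the experience graph, JSON stands in for whatever storage format the paper uses, and `run_tasks`, `dump_memory`, and `load_memory` are hypothetical names.

```python
import json
from typing import Callable, Optional

Solver = Callable[[str, list], str]   # (task, memory) -> answer
Checker = Callable[[str, str], bool]  # (task, answer) -> was it correct?


def run_tasks(
    tasks: list,
    solve: Solver,
    check: Checker,
    memory: Optional[list] = None,
    grow: bool = True,
):
    """Solve tasks in order; with grow=True, record each outcome immediately."""
    memory = [] if memory is None else list(memory)  # copy so the caller's list is untouched
    outcomes = []
    for task in tasks:
        answer = solve(task, memory)  # the solver sees everything recorded so far
        correct = check(task, answer)
        outcomes.append(correct)
        if grow:  # online mode: successes and failures are both kept
            memory.append({"task": task, "answer": answer, "correct": correct})
    return outcomes, memory


def dump_memory(memory: list) -> str:
    """Serialize the consolidated memory for later offline reuse."""
    return json.dumps(memory)


def load_memory(text: str) -> list:
    """Restore a memory produced by dump_memory."""
    return json.loads(text)


if __name__ == "__main__":
    # Toy solver and checker, invented for illustration only.
    toy_solve = lambda task, memory: task[::-1]
    toy_check = lambda task, answer: answer == task[::-1]
    _, grown = run_tasks(["ab", "cd"], toy_solve, toy_check)          # online
    frozen = load_memory(dump_memory(grown))
    _, after = run_tasks(["ef"], toy_solve, toy_check, frozen, False)  # offline
    print(len(grown), len(after))
```

Keeping both modes in one function makes the difference explicit: the only change between online and offline use is whether the loop writes back to memory.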

Referee Report

0 major / 3 minor

Summary. The manuscript introduces EXG, an experience graph framework for self-evolving LLM-based agents. It organizes accumulated successes and failures into a structured relational graph that supports online real-time growth during execution for immediate cross-task reuse, as well as offline consolidation and reuse as an external memory module. EXG is presented as a plug-and-play component that can be integrated with existing self-evolving agents to improve solution quality and resource efficiency over time. The authors report extensive experiments on code generation and reasoning benchmarks demonstrating more favorable performance-efficiency trade-offs relative to reflection- and memory-based baselines in both online and offline settings.

Significance. If the experimental results hold, the work provides a concrete, graph-structured mechanism for experience reuse that directly targets fragmentation and delayed usability issues in current self-evolving agent designs. The dual support for online growth and offline external-memory use, combined with the plug-and-play integration claim, could offer a reusable primitive for building more adaptive agents. The emphasis on efficiency alongside performance is a practical strength that distinguishes the contribution from purely reflective or flat-memory approaches.

minor comments (3)

The abstract asserts 'extensive experiments' with favorable trade-offs but does not preview key metrics, baselines, or dataset details; adding a concise summary of the evaluation protocol in the abstract or introduction would improve accessibility.
Clarify the precise node and edge definitions for experience fragments early in the manuscript (ideally with a small illustrative example) to make the graph-construction rules immediately understandable before the algorithmic description.
Ensure that the experimental section includes explicit statements of statistical significance or variance across runs for the reported performance-efficiency trade-offs, as this is necessary to support the cross-baseline claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the significance of the EXG framework, and recommendation for minor revision. We are pleased that the dual online/offline design and plug-and-play aspects were viewed favorably.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents EXG as a new architectural framework for structuring agent experience into a relational graph, with design choices for online real-time growth and offline consolidation explicitly described as implementation decisions rather than derived predictions. No equations, parameter fits, or self-citations are invoked to force the core claims; the abstract and positioning against ad-hoc reflection rely on stated motivations and reported benchmark outcomes instead of reducing to input definitions or prior author work by construction. The framework is introduced with plug-and-play integration details that remain testable independently of any self-referential loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Based solely on the abstract, the primary addition is the EXG graph structure itself. Limited details prevent exhaustive enumeration of parameters or axioms.

axioms (1)

domain assumption: LLM-based agents accumulate usable experience from successes and failures that can be relationally structured for reuse.
Foundational premise enabling the graph design, stated in the problem setup and solution description.

invented entities (1)

Experience Graph (EXG) · no independent evidence
purpose: To organize accumulated successes and failures into a structured relational representation for online and offline reuse
Newly introduced framework component positioned as the core innovation.

pith-pipeline@v0.9.0 · 5778 in / 1246 out tokens · 35600 ms · 2026-05-19T22:14:02.846903+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

EXG abstracts each completed attempt within a trajectory into a case... golden cases... warning cases... experience graph G=(V,E) with case nodes, task anchor nodes, contain/similarity/correction edges

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem.

online self-evolving loop... offline reuse of a consolidated experience graph as an external memory module
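
The first paper passage quoted in this section names the building blocks of the experience graph G=(V,E): case nodes, task anchor nodes, and contain, similarity, and correction edges, along with golden and warning cases. A minimal Python sketch of such a structure might look like the following. The class and field names are hypothetical, the edge directions are guesses, and reading "golden" as a verified success and "warning" as a recorded failure is an interpretation of the elided passage, not the paper's definition.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EdgeType(Enum):
    CONTAIN = "contain"        # assumed: task anchor -> case recorded for that task
    SIMILARITY = "similarity"  # assumed: case -> related case
    CORRECTION = "correction"  # assumed: failed case -> case that fixed it


@dataclass
class Case:
    case_id: str
    task_id: str
    task_input: str
    output: Optional[str] = None    # None while the case is provisional
    correct: Optional[bool] = None  # None until an outcome is known

    @property
    def kind(self) -> str:
        # Assumed mapping: golden = success, warning = failure.
        if self.correct is None:
            return "provisional"
        return "golden" if self.correct else "warning"


@dataclass
class ExperienceGraph:
    cases: dict = field(default_factory=dict)  # case_id -> Case (case nodes)
    anchors: set = field(default_factory=set)  # task ids (task anchor nodes)
    edges: set = field(default_factory=set)    # (src, dst, EdgeType) triples

    def add_case(self, case: Case) -> None:
        """Add a case node and attach it to its task anchor with a contain edge."""
        self.cases[case.case_id] = case
        self.anchors.add(case.task_id)
        self.edges.add((case.task_id, case.case_id, EdgeType.CONTAIN))

    def link(self, src: str, dst: str, edge_type: EdgeType) -> None:
        """Add a directed edge between two existing nodes."""
        self.edges.add((src, dst, edge_type))

    def neighbors(self, node: str, edge_type: EdgeType) -> list:
        """Targets reachable from `node` over one edge of the given type."""
        return sorted(d for s, d, t in self.edges if s == node and t is edge_type)
```

A short usage example: after `g = ExperienceGraph()`, adding a failed case `c1` and a later successful case `c2` for the same task and calling `g.link("c1", "c2", EdgeType.CORRECTION)` lets `g.neighbors("c1", EdgeType.CORRECTION)` find the fix that followed the failure.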

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

