pith. machine review for the scientific record.

arxiv: 2512.13564 · v2 · submitted 2025-12-15 · 💻 cs.CL · cs.AI

Memory in the Age of AI Agents

Pith reviewed 2026-05-11 18:10 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords AI agents · agent memory · foundation models · memory taxonomy · factual memory · experiential memory · working memory · memory dynamics

The pith

The survey unifies agent memory research under three lenses (forms, functions, and dynamics) and introduces a factual-experiential-working taxonomy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

AI agents built on foundation models depend on memory systems that have proliferated rapidly yet remain conceptually scattered across motivations, implementations, and terms. This survey first separates agent memory from nearby ideas such as LLM memory, retrieval-augmented generation, and context engineering. It then applies three consistent lenses (forms, functions, and dynamics) to organize the literature. From the function lens it introduces a finer taxonomy that replaces broad long-term versus short-term labels with factual, experiential, and working memory categories. The result supplies a consolidated reference, a list of benchmarks and open frameworks, and a map of open research directions.

Core claim

This survey delineates the scope of agent memory and examines it through the unified lenses of forms (token-level, parametric, and latent realizations), functions (factual, experiential, and working memory), and dynamics (how memory is formed, evolved, and retrieved). It argues that traditional long/short-term distinctions are insufficient for contemporary agent systems and compiles benchmarks, frameworks, and forward-looking topics such as memory automation, reinforcement-learning integration, multimodal memory, multi-agent memory, and trustworthiness to support memory as a first-class design primitive.

What carries the argument

The three lenses of forms, functions, and dynamics, with the function-based taxonomy that distinguishes factual, experiential, and working memory.
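As a reading aid, the three lenses can be written down as a small labeling schema. This is a minimal sketch and not code from the paper: the category names come from the survey as summarized above, while the class names, the `label` helper, and the example system `hypothetical-notes-agent` are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Form(Enum):
    """How memory is realized (names taken from the survey summary)."""
    TOKEN_LEVEL = "token-level"
    PARAMETRIC = "parametric"
    LATENT = "latent"


class Function(Enum):
    """What the memory is for."""
    FACTUAL = "factual"
    EXPERIENTIAL = "experiential"
    WORKING = "working"


class Dynamics(Enum):
    """Lifecycle stages of memory."""
    FORMATION = "formation"
    EVOLUTION = "evolution"
    RETRIEVAL = "retrieval"


@dataclass(frozen=True)
class MemoryLabel:
    """Hypothetical record: one system described along all three lenses."""
    system: str
    forms: frozenset
    functions: frozenset
    dynamics: frozenset


def label(system, forms, functions, dynamics):
    """Build a MemoryLabel; a system may carry several values per lens."""
    return MemoryLabel(system, frozenset(forms), frozenset(functions), frozenset(dynamics))


# Invented example system, not one discussed in the survey.
example = label(
    "hypothetical-notes-agent",
    forms=[Form.TOKEN_LEVEL],
    functions=[Function.FACTUAL, Function.WORKING],
    dynamics=[Dynamics.FORMATION, Dynamics.RETRIEVAL],
)
```

Allowing a set of values per lens is a deliberate choice in this sketch: it leaves room to record overlap, which is exactly what the load-bearing premise below says should be rare.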

Load-bearing premise

The distinctions among forms, functions, and dynamics form a complete, non-overlapping classification that meaningfully reduces fragmentation in the existing literature.

What would settle it

A later systematic mapping of published agent systems that shows most implementations still fall outside the factual-experiential-working categories or require substantial overlap would falsify the taxonomy's claimed unifying power.
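The check described above could be scored roughly as follows. This is a hedged sketch under stated assumptions: the function names, the 0.5 cut-off standing in for "most", and the four-system mapping are all hypothetical and do not come from the paper or from any real dataset.

```python
# Function-lens categories named in the survey summary.
CATEGORIES = {"factual", "experiential", "working"}


def taxonomy_fit(assignments):
    """Score a mapping of system name -> set of function labels.

    Returns the share of systems that match none of the three categories
    and the share that need more than one of them.
    """
    total = len(assignments)
    if total == 0:
        raise ValueError("need at least one mapped system")
    outside = sum(1 for labels in assignments.values() if not (labels & CATEGORIES))
    overlapping = sum(1 for labels in assignments.values() if len(labels & CATEGORIES) > 1)
    return {"outside_rate": outside / total, "overlap_rate": overlapping / total}


def taxonomy_strained(rates, threshold=0.5):
    """Assumed reading of 'most': either rate exceeds the threshold."""
    return rates["outside_rate"] > threshold or rates["overlap_rate"] > threshold


# Invented mapping for illustration only.
hypothetical_mapping = {
    "system-a": {"factual"},
    "system-b": {"factual", "working"},  # needs two categories
    "system-c": set(),                   # fits none
    "system-d": {"experiential"},
}

rates = taxonomy_fit(hypothetical_mapping)
print(rates, taxonomy_strained(rates))
```

A real mapping study would also need an agreed labeling protocol and inter-annotator checks; the sketch only covers the final tally.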

Original abstract

Memory has emerged, and will continue to remain, a core capability of foundation model-based agents. As research on agent memory rapidly expands and attracts unprecedented attention, the field has also become increasingly fragmented. Existing works that fall under the umbrella of agent memory often differ substantially in their motivations, implementations, and evaluation protocols, while the proliferation of loosely defined memory terminologies has further obscured conceptual clarity. Traditional taxonomies such as long/short-term memory have proven insufficient to capture the diversity of contemporary agent memory systems. This work aims to provide an up-to-date landscape of current agent memory research. We begin by clearly delineating the scope of agent memory and distinguishing it from related concepts such as LLM memory, retrieval augmented generation (RAG), and context engineering. We then examine agent memory through the unified lenses of forms, functions, and dynamics. From the perspective of forms, we identify three dominant realizations of agent memory, namely token-level, parametric, and latent memory. From the perspective of functions, we propose a finer-grained taxonomy that distinguishes factual, experiential, and working memory. From the perspective of dynamics, we analyze how memory is formed, evolved, and retrieved over time. To support practical development, we compile a comprehensive summary of memory benchmarks and open-source frameworks. Beyond consolidation, we articulate a forward-looking perspective on emerging research frontiers, including memory automation, reinforcement learning integration, multimodal memory, multi-agent memory, and trustworthiness issues. We hope this survey serves not only as a reference for existing work, but also as a conceptual foundation for rethinking memory as a first-class primitive in the design of future agentic intelligence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper is a survey on memory systems for foundation model-based AI agents. It argues that the field is fragmented with proliferating terminologies and that traditional long/short-term memory distinctions are insufficient. The authors delineate the scope of agent memory from related concepts such as LLM memory, RAG, and context engineering; propose taxonomies organized by forms (token-level, parametric, latent), functions (factual, experiential, working), and dynamics (formation, evolution, retrieval); compile benchmarks and open-source frameworks; and outline future frontiers including memory automation, RL integration, multimodal memory, multi-agent memory, and trustworthiness issues.

Significance. If the taxonomy is adopted, the survey could meaningfully consolidate a rapidly expanding area by supplying a unified organizational lens that better captures contemporary agent memory systems than prior distinctions. The explicit compilation of benchmarks and frameworks provides immediate practical utility for researchers and developers, while the forward-looking section on emerging frontiers offers a useful roadmap. These elements position the work as a potential reference point for treating memory as a first-class design primitive in agentic systems.

minor comments (2)
  1. [Abstract] The abstract states that the survey compiles 'a comprehensive summary of memory benchmarks and open-source frameworks' but does not indicate selection criteria or coverage scope; adding a short methods paragraph or table in the main text would improve reproducibility and transparency of the consolidation effort.
  2. [Scope delineation] The scope delineation from RAG and context engineering is conceptually useful; a concise comparative table (e.g., in the introduction) listing key differences in motivation, implementation, and evaluation would enhance clarity without altering the central argument.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and constructive review, which highlights the potential of the proposed taxonomy, benchmark compilation, and future directions to consolidate the agent memory literature. We appreciate the recommendation for minor revision.

Circularity Check

0 steps flagged

No significant circularity; survey taxonomy is externally grounded

full rationale

This is a survey paper whose central contribution is an organizational taxonomy of agent memory drawn from analysis of external literature. It delineates scope against related concepts (LLM memory, RAG, context engineering), identifies forms (token-level/parametric/latent), proposes functions (factual/experiential/working), and examines dynamics without any equations, fitted parameters, predictions, or derivations. No load-bearing step reduces to self-definition, self-citation chains, or renaming of known results by construction; the distinctions are explicitly motivated as a response to fragmentation in prior work. The paper is self-contained against external benchmarks and compiles summaries of existing frameworks rather than deriving new results from its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

This is a survey that consolidates and reorganizes existing research rather than deriving new results from first principles or introducing new entities. It relies on the domain assumption that current literature is sufficiently fragmented to require a new taxonomy.

axioms (2)
  • domain assumption Traditional taxonomies such as long/short-term memory are insufficient to capture the diversity of contemporary agent memory systems
    Explicitly stated in the abstract as motivation for the new framework.
  • domain assumption Agent memory is distinct from LLM memory, RAG, and context engineering
    Stated as the starting point for delineating scope.

pith-pipeline@v0.9.0 · 5769 in / 1437 out tokens · 115660 ms · 2026-05-11T18:10:58.826346+00:00 · methodology

Forward citations

Cited by 44 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values

    cs.AI 2026-05 unverdicted novelty 8.0

    Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.

  2. Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

    cs.CR 2026-05 unverdicted novelty 8.0

    Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying ...

  3. ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

    cs.AI 2026-05 conditional novelty 7.0

    ClawForge supplies a generator that turns scenario templates into reproducible command-line tasks testing state conflict handling, where the strongest frontier model scores only 45.3 percent strict accuracy.

  4. EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents

    cs.LG 2026-05 unverdicted novelty 7.0

    EvolveMem enables autonomous self-evolution of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with positive cross-ben...

  5. ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles

    cs.AI 2026-05 unverdicted novelty 7.0

    ScioMind combines anchoring-based belief updates, hierarchical memory, and dynamic profiles in LLM multi-agent systems to produce more stable, diverse, and psychologically aligned opinion trajectories than prior fixed...

  6. Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory

    cs.AI 2026-05 unverdicted novelty 7.0

    Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.

  7. When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory

    cs.AI 2026-05 unverdicted novelty 7.0

    A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.

  8. SRTJ: Self-Evolving Rule-Driven Training-Free LLM Jailbreaking

    cs.CR 2026-05 unverdicted novelty 7.0

    SRTJ is a training-free jailbreak method that evolves hierarchical attack rules using iterative verifier feedback and ASP-based constraint-aware composition to achieve stable high success rates on HarmBench across mul...

  9. Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

    cs.CL 2026-05 unverdicted novelty 7.0

    MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

  10. HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents

    cs.CL 2026-04 unverdicted novelty 7.0

    HeLa-Mem is a graph-based memory architecture for LLM agents that applies Hebbian learning to episodic associations and distills hubs into semantic knowledge, yielding better results on long-context benchmarks with fe...

  11. When to Forget: A Memory Governance Primitive

    cs.AI 2026-04 unverdicted novelty 7.0

    Memory Worth converges almost surely to the conditional probability of task success given memory retrieval and correlates at rho=0.89 with ground-truth utility in controlled experiments.

  12. MemGround: Long-Term Memory Evaluation Kit for Large Language Models in Gamified Scenarios

    cs.CL 2026-03 unverdicted novelty 7.0

    MemGround is a new benchmark that evaluates LLMs' long-term memory through gamified tasks assessing surface state, temporal association, and reasoning memory.

  13. CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents

    cs.CL 2026-03 unverdicted novelty 7.0

    CLAG organizes agent memory into clusters via an SLM router and uses cluster profiles for two-stage retrieval, yielding better answer quality on QA benchmarks than prior memory systems.

  14. Automated Reformulation of Robust Optimization via Memory-Augmented Large Language Models

    cs.AI 2026-05 unverdicted novelty 6.0

    AutoREM augments LLMs with a structured memory of failed reformulation trajectories to improve accuracy and efficiency on robust optimization tasks without parameter updates or expert knowledge.

  15. HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution

    cs.AI 2026-05 unverdicted novelty 6.0

    HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.

  16. MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    MemPrivacy uses edge detection of sensitive spans and type-aware placeholders to enable cloud-side memory management for LLM agents without exposing private data, achieving under 1.6% utility loss.

  17. MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    MemPrivacy uses edge-side privacy span detection and semantic placeholders to enable cloud memory management for LLM agents while limiting utility loss to 1.6% and outperforming masking baselines.

  18. MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    MemPrivacy replaces privacy-sensitive spans with structured placeholders on edge devices to enable effective cloud memory management while limiting utility loss to 1.6% and outperforming general models on privacy extraction.

  19. Tree-based Credit Assignment for Multi-Agent Memory System

    cs.MA 2026-05 unverdicted novelty 6.0

    TreeMem assigns credit to agents in multi-agent memory systems by expanding outputs into a tree and using Monte Carlo averaging of final rewards to optimize each agent's policy.

  20. What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis

    cs.AI 2026-05 unverdicted novelty 6.0

    Circuit analysis reveals that routing circuits for agent memory emerge at 0.6B parameters while content circuits emerge at 4B, with a shared grounding hub and an unsupervised diagnostic achieving 76.2% accuracy for lo...

  21. What Happens Inside Agent Memory? Circuit Analysis from Emergence to Diagnosis

    cs.AI 2026-05 unverdicted novelty 6.0

    In LLM agents, memory routing circuits emerge at 0.6B scale while content circuits appear only at 4B, and write/read operations recruit a pre-existing late-layer context hub instead of creating a new one, enabling a 7...

  22. MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents

    cs.CL 2026-05 unverdicted novelty 6.0

    A lightweight supervised router using frozen-LLM embeddings for memory admission decisions outperforms LLM-based memory managers in both F1 score and latency on the LoCoMo benchmark.

  23. From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction

    cs.AI 2026-04 unverdicted novelty 6.0

    Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.

  24. Contextual Agentic Memory is a Memo, Not True Memory

    cs.AI 2026-04 unverdicted novelty 6.0

    Agentic memory is lookup-based retrieval, not weight-based consolidation, creating a generalization ceiling on novel tasks and structural vulnerability to memory poisoning.

  25. EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory

    cs.CV 2026-04 unverdicted novelty 6.0

    EviMem improves accuracy on temporal and multi-hop questions in long-term conversational memory by iteratively diagnosing and filling evidence gaps, achieving 81.6% and 85.2% judge accuracy on LoCoMo at 4.5x lower lat...

  26. Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents

    cs.AI 2026-04 unverdicted novelty 6.0

    Memanto delivers 89.8% and 87.1% accuracy on LongMemEval and LoCoMo benchmarks using typed semantic memory and information-theoretic retrieval, outperforming hybrid graph and vector systems with a single query and zer...

  27. Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents

    cs.AI 2026-04 conditional novelty 6.0

    The Experience Compression Spectrum unifies memory, skills, and rules in LLM agents along increasing compression levels and identifies the absence of adaptive cross-level compression as the missing diagonal.

  28. GAM: Hierarchical Graph-based Agentic Memory for LLM Agents

    cs.AI 2026-04 unverdicted novelty 6.0

    GAM decouples event-level memory encoding from topic-level consolidation in LLM agents using hierarchical graphs to reduce interference and improve long-term coherence and retrieval.

  29. Quantifying Trust: Financial Risk Management for Trustworthy AI Agents

    cs.AI 2026-04 unverdicted novelty 6.0

    The paper introduces the Agentic Risk Standard (ARS) as a payment settlement framework that delivers predefined compensation for AI agent execution failures, misalignment, or unintended outcomes.

  30. MemFactory: Unified Inference & Training Framework for Agent Memory

    cs.CL 2026-03 unverdicted novelty 6.0

    MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.

  31. Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses

    cs.CR 2026-03 unverdicted novelty 6.0

    The survey organizes over 400 papers on embodied AI safety into a multi-level taxonomy and flags overlooked issues such as fragile multimodal fusion and unstable planning under jailbreaks.

  32. LiSA: Lifelong Safety Adaptation via Conservative Policy Induction

    cs.LG 2026-05 unverdicted novelty 5.0

    LiSA improves AI guardrails lifelong by inducing conservative policies from sparse noisy failure reports via structured memory, conflict-aware rules, and posterior lower-bound gating.

  33. MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading

    cs.CL 2026-05 unverdicted novelty 5.0

    MemReread improves agent long-context reasoning by triggering rereading on insufficient final memory to recover discarded indirect facts, outperforming baselines at linear complexity.

  34. A Semantic Autonomy Framework for VLM-Integrated Indoor Mobile Robots: Hybrid Deterministic Reasoning and Cross-Robot Adaptive Memory

    cs.RO 2026-05 unverdicted novelty 5.0

    The Semantic Autonomy Stack combines a seven-step parametric resolver handling 88% of instructions in under 0.1 ms with VLM escalation and a five-category cross-robot memory system, achieving 100% accuracy and 103,000...

  35. Towards Self-Improving Error Diagnosis in Multi-Agent Systems

    cs.MA 2026-04 unverdicted novelty 5.0

    ErrorProbe introduces a self-improving pipeline for attributing semantic failures in LLM multi-agent systems to specific agents and steps via anomaly detection, backward tracing, and tool-grounded validation with veri...

  36. Towards Scalable Lightweight GUI Agents via Multi-role Orchestration

    cs.AI 2026-04 unverdicted novelty 5.0

    LAMO uses role-oriented data synthesis and two-stage training (perplexity-weighted supervised fine-tuning plus reinforcement learning) to create scalable lightweight GUI agents that support both single-model and multi...

  37. On the Creativity of AI Agents

    cs.CY 2026-04 unverdicted novelty 5.0

    LLM agents produce outputs that meet basic functional criteria for creativity but lack the process-level, social, and personal elements required for ontological creativity.

  38. Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems

    cs.SE 2026-04 unverdicted novelty 5.0

    Claude Code centers on a model-tool while-loop surrounded by permission systems, context compaction, extensibility hooks, subagent delegation, and session storage; the same design questions yield different answers in ...

  39. Retrieval Is Not Enough: Why Organizational AI Needs Epistemic Infrastructure

    cs.AI 2026-04 unverdicted novelty 5.0

    OIDA adds typed knowledge objects, decay-based importance scores, contradiction edges, and an inverse-decay QUESTION primitive for ignorance to raise epistemic fidelity beyond retrieval.

  40. MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought

    cs.MA 2026-04 unverdicted novelty 5.0

    MemCoT redefines long-context reasoning as iterative stateful search with zoom-in/zoom-out memory perception and dual short-term memories, claiming SOTA results on LoCoMo and LongMemEval-S benchmarks.

  41. MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents

    cs.AI 2026-04 unverdicted novelty 5.0

    MemMachine stores entire conversational episodes and applies contextualized retrieval plus adaptive query routing to achieve 0.9169 accuracy on LoCoMo and 93 percent on LongMemEvalS while using 80 percent fewer tokens...

  42. Reliable AI Needs to Externalize Implicit Knowledge: A Human-AI Collaboration Perspective

    cs.AI 2026-05 unverdicted novelty 4.0

    Reliable AI needs structured Knowledge Objects to externalize and enable human validation of implicit knowledge that current methods cannot verify.

  43. Memory as Metabolism: A Design for Companion Knowledge Systems

    cs.AI 2026-04 unverdicted novelty 4.0

    This paper designs a companion knowledge system with TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT operations plus memory gravity and minority-hypothesis retention to give contradictory evidence a path to updat...

  44. Back to Basics: Let Conversational Agents Remember with Just Retrieval and Generation

    cs.CL 2026-04 unverdicted novelty 4.0

    A minimalist retrieval-and-generation framework using turn isolation and query-driven pruning outperforms complex memory systems by directly addressing signal sparsity and dual-level redundancy in dialogues.

Reference graph

Works this paper leans on

300 extracted references · 300 canonical work pages · cited by 41 Pith papers · 10 internal anchors

  1. [1]

    Detoxifying Large Language Models via Knowledge Editing , booktitle =

    Mengru Wang and Ningyu Zhang and Ziwen Xu and Zekun Xi and Shumin Deng and Yunzhi Yao and Qishen Zhang and Linyi Yang and Jindong Wang and Huajun Chen , editor =. Detoxifying Large Language Models via Knowledge Editing , booktitle =. 2024 , url =. doi:10.18653/V1/2024.ACL-LONG.171 , timestamp =

  2. [2]

    Neighboring Perturbations of Knowledge Editing on Large Language Models , booktitle =

    Jun. Neighboring Perturbations of Knowledge Editing on Large Language Models , booktitle =. 2024 , url =

  3. [3]

    Editing Personality For Large Language Models , booktitle =

    Shengyu Mao and Xiaohan Wang and Mengru Wang and Yong Jiang and Pengjun Xie and Fei Huang and Ningyu Zhang , editor =. Editing Personality For Large Language Models , booktitle =. 2024 , url =. doi:10.1007/978-981-97-9434-8\_19 , timestamp =

  4. [4]

    Manning , title =

    Eric Mitchell and Charles Lin and Antoine Bosselut and Chelsea Finn and Christopher D. Manning , title =. The Tenth International Conference on Learning Representations,. 2022 , url =

  5. [5]

    The Twelfth International Conference on Learning Representations,

    Guangxuan Xiao and Yuandong Tian and Beidi Chen and Song Han and Mike Lewis , title =. The Twelfth International Conference on Learning Representations,. 2024 , url =

  6. [6]

    Zep: A Temporal Knowledge Graph Architecture for Agent Memory

    Preston Rasmussen and Pavlo Paliychuk and Travis Beauvais and Jack Ryan and Daniel Chalef , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2501.13956 , eprinttype =. 2501.13956 , timestamp =

  7. [7]

    Lightning attention-2: A free lunch for handling unlimited sequence lengths in large language models.arXiv preprint arXiv:2401.04658, 2024

    Zhen Qin and Weigao Sun and Dong Li and Xuyang Shen and Weixuan Sun and Yiran Zhong , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2401.04658 , eprinttype =. 2401.04658 , timestamp =

  8. [8]

    Forty-first International Conference on Machine Learning,

    Zhen Qin and Weigao Sun and Dong Li and Xuyang Shen and Weixuan Sun and Yiran Zhong , title =. Forty-first International Conference on Machine Learning,. 2024 , url =

  9. [9]

    FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision , booktitle =

    Jay Shah and Ganesh Bikshandi and Ying Zhang and Vijay Thakkar and Pradeep Ramani and Tri Dao , editor =. FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision , booktitle =. 2024 , url =

  10. [10]

    The Twelfth International Conference on Learning Representations,

    Tri Dao , title =. The Twelfth International Conference on Learning Representations,. 2024 , url =

  11. [11]

    Rethinking Attention with Performers , booktitle =

    Krzysztof Marcin Choromanski and Valerii Likhosherstov and David Dohan and Xingyou Song and Andreea Gane and Tam. Rethinking Attention with Performers , booktitle =. 2021 , url =

  12. [12]

    Big Bird: Transformers for Longer Sequences , booktitle =

    Manzil Zaheer and Guru Guruganesh and Kumar Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Onta. Big Bird: Transformers for Longer Sequences , booktitle =. 2020 , url =

  13. [13]
  14. [14]

    A Machine with Short-Term, Episodic, and Semantic Memory Systems , booktitle =

    Taewoon Kim and Michael Cochez and Vincent Fran. A Machine with Short-Term, Episodic, and Semantic Memory Systems , booktitle =. 2023 , url =. doi:10.1609/AAAI.V37I1.25075 , timestamp =

  15. [15]

    McAuley , title =

    Yu Wang and Xinshuang Liu and Xiusi Chen and Sean O'Brien and Junda Wu and Julian J. McAuley , title =. The Thirteenth International Conference on Learning Representations,. 2025 , url =

  16. [16]

    Agents: An open-source framework for autonomous lan- guage agents

    Wangchunshu Zhou and Yuchen Eleanor Jiang and Long Li and Jialong Wu and Tiannan Wang and Shi Qiu and Jintian Zhang and Jing Chen and Ruipu Wu and Shuai Wang and Shiding Zhu and Jiyu Chen and Wentao Zhang and Ningyu Zhang and Huajun Chen and Peng Cui and Mrinmaya Sachan , title =. CoRR , volume =. 2023 , url =. doi:10.48550/ARXIV.2309.07870 , eprinttype =...

  17. [17]

    OA gents: An Empirical Study of Building Effective Agents

    Zhu, He and Qin, Tianrui and Zhu, King and Huang, Heyuan and Guan, Yeyi and Xia, Jinxiang and Li, Hanhao and Yao, Yi and Wang, Ningning and Liu, Pai and Peng, Tianhao and Gui, Xin and Xiaowan, Li and Liu, Yuhui and Tang, Xiangru and Yang, Jian and Zhang, Ge and Gao, Xitong and Jiang, Yuchen Eleanor and Zhang, Changwang and Wang, Jun and Liu, Jiaheng and Z...

  18. [18]

    2025 , eprint=

    Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution , author=. 2025 , eprint=

  19. [19]

    2025 , eprint=

    Towards Personalized Deep Research: Benchmarks and Evaluations , author=. 2025 , eprint=

  20. [20]

    2025 , eprint=

    Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL , author=. 2025 , eprint=

  21. [21]

    2023 , eprint=

    RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text , author=. 2023 , eprint=

  22. [22]

    2024 , eprint=

    AI PERSONA: Towards Life-long Personalization of LLMs , author=. 2024 , eprint=

  23. [23]

    Symbolic learning enables self-evolving agents

    Wangchunshu Zhou and Yixin Ou and Shengwei Ding and Long Li and Jialong Wu and Tiannan Wang and Jiamin Chen and Shuai Wang and Xiaohua Xu and Ningyu Zhang and Huajun Chen and Yuchen Eleanor Jiang , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2406.18532 , eprinttype =. 2406.18532 , timestamp =

  24. [24]

    2025 , eprint=

    EvoVLA: Self-Evolving Vision-Language-Action Model , author=. 2025 , eprint=

  25. [25]

    2025 , eprint=

    O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents , author=. 2025 , eprint=

  26. [26]

    Mlp memory: A retriever-pretrained memory for large language models, 2026.https://arxiv.org/abs/2508.01832

    Rubin Wei and Jiaqi Cao and Jiarui Wang and Jushi Kai and Qipeng Guo and Bowen Zhou and Zhouhan Lin , title =. CoRR , volume =. 2025 , url =. doi:10.48550/ARXIV.2508.01832 , eprinttype =. 2508.01832 , timestamp =

  27. [27]

    2025 , eprint=

    Pretraining with hierarchical memories: separating long-tail and common knowledge , author=. 2025 , eprint=

  28. [28]

    Character-LLM:

    Yunfan Shao and Linyang Li and Junqi Dai and Xipeng Qiu , editor =. Character-LLM:. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing,. 2023 , url =. doi:10.18653/V1/2023.EMNLP-MAIN.814 , timestamp =

  29. [29]

    CharacterGLM: Customizing social characters with large language models

    Jinfeng Zhou and Zhuang Chen and Dazhen Wan and Bosi Wen and Yi Song and Jifan Yu and Yongkang Huang and Pei Ke and Guanqun Bi and Libiao Peng and Jiaming Yang and Xiyao Xiao and Sahand Sabour and Xiaohan Zhang and Wenjing Hou and Yijia Zhang and Yuxiao Dong and Hongning Wang and Jie Tang and Minlie Huang , editor =. CharacterGLM: Customizing Social Chara...

  30. [30]

    2025 , eprint=

    Agent Learning via Early Experience , author=. 2025 , eprint=

  31. [31]

    Scaling agents via continual pre-training.arXiv preprint arXiv:2509.13310, 2025

    Liangcai Su and Zhen Zhang and Guangyu Li and Zhuo Chen and Chenxi Wang and Maojia Song and Xinyu Wang and Kuan Li and Jialong Wu and Xuanzhong Chen and Zile Qiao and Zhongwang Zhang and Huifeng Yin and Shihao Cai and Runnan Fang and Zhengwei Tao and Wenbiao Yin and Chenxiong Qian and Yong Jiang and Pengjun Xie and Fei Huang and Jingren Zhou , title =. Co...

  32. [32]

    Online Adaptation of Language Models with a Memory of Amortized Contexts , booktitle =

    Jihoon Tack and Jaehyung Kim and Eric Mitchell and Jinwoo Shin and Yee Whye Teh and Jonathan Richard Schwarz , editor =. Online Adaptation of Language Models with a Memory of Amortized Contexts , booktitle =. 2024 , url =

  33. [33]

    Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems

    arXiv preprint arXiv:2504.01990, 2025

  34. [34]

    From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery

    2025.

  35. [35]

    Tool learning with large language models: a survey

    Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, and Ji-rong Wen. Frontiers of Computer Science. doi:10.1007/s11704-024-40678-2

  36. [36]

    Reinforcement Learning for Reasoning in Large Language Models with One Training Example

    2025.

  37. [37]

    A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

    2025.

  38. [38]

    Agent AI: Surveying the Horizons of Multimodal Interaction

    2024.

  39. [39]

    Agents in Software Engineering: Survey, Landscape, and Vision

    2024.

  40. [40]

    Deep Research: A Survey of Autonomous Research Agents

    2025.

  41. [41]

    A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications

    2025.

  42. [42]

    Large Language Model Agent: A Survey on Methodology, Applications and Challenges

    2025.

  43. [43]

    Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu, Quentin Xu, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, and Gao Huang.

  44. [44]

    Large Language Models: A Survey

    2025.

  45. [45]

    A Survey on Large Language Models with some Insights on their Capabilities and Limitations

    2025.

  46. [46]

    Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, and Dong Yu.

  47. [47]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    arXiv preprint arXiv:2402.03300

  48. [48]

    Autogen: Enabling next-gen LLM applications via multi-agent conversations

    First Conference on Language Modeling.

  49. [49]

    Yingxu Wang, Siwei Liu, Jinyuan Fang, and Zaiqiao Meng.

  50. [50]

    Shuofei Qiao, Runnan Fang, Ningyu Zhang, Yuqi Zhu, Xiang Chen, Shumin Deng, Yong Jiang, Pengjun Xie, Fei Huang, and Huajun Chen. Advances in Neural Information Processing Systems.

  51. [51]

    Graph-Augmented Large Language Model Agents: Current Progress and Future Prospects

    arXiv preprint arXiv:2507.21407

  52. [52]

    Miao Yu, Shilong Wang, Guibin Zhang, Junyuan Mao, Chenlong Yin, Qijiong Liu, Qingsong Wen, Kun Wang, and Yang Wang.

  53. [53]

    Yanwei Yue, Guibin Zhang, Boyang Liu, Guancheng Wan, Kun Wang, Dawei Cheng, and Yiyan Qi. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

  54. [54]

    Guibin Zhang, Muxin Fu, Guancheng Wan, Miao Yu, Kun Wang, and Shuicheng Yan.

  55. [55]

    Siwei Liu, Jinyuan Fang, Han Zhou, Yingxu Wang, and Zaiqiao Meng.

  56. [56]

    Reasoning RAG via System 1 or System 2: A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges

    arXiv preprint arXiv:2506.10408

  57. [57]

    Junwei Liao, Muning Wen, Jun Wang, and Weinan Zhang.

  58. [58]

    Chanwoo Park, Seungju Han, Xingzhi Guo, Asuman E. Ozdaglar, Kaiqing Zhang, and Joo. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

  59. [59]

    Wanjia Zhao, Mert Yuksekgonul, Shirley Wu, and James Zou.

  60. [60]

    Survey of

    Anjana Sarkar and Soumyendu Sarkar.

  61. [61]

    Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement

    CoRR.

  62. [62]

    Yingxuan Yang, Huacan Chai, Shuai Shao, Yuanyi Song, Siyuan Qi, Renting Rui, and Weinan Zhang.

  63. [63]

    A survey of

    Yingxuan Yang, Huacan Chai, Yuanyi Song, Siyuan Qi, Muning Wen, Ning Li, Junwei Liao, Haoyi Hu, Jianghao Lin, Gaowei Chang, and others.

  64. [64]

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao.

  65. [65]

    Multi-agent Architecture Search via Agentic Supernet

    2025.

  66. [66]

    Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems

    Guibin Zhang, Yanwei Yue, Zhixun Li, Sukwon Yun, Guancheng Wan, Kun Wang, Dawei Cheng, Jeffrey Xu Yu, and Tianlong Chen.

  67. [67]

    Marft: Multi-agent reinforcement fine-tuning

    arXiv preprint arXiv:2504.16129, 2025

  68. [68]

    Weize Chen, Jiarui Yuan, Chen Qian, Cheng Yang, Zhiyuan Liu, and Maosong Sun. Findings of the Association for Computational Linguistics.

  69. [69]

    PromptWizard: Task-aware prompt optimization framework

    arXiv preprint arXiv:2405.18369, 2024

  70. [70]

    Vflow: Discovering optimal agentic workflows for verilog generation

    arXiv preprint arXiv:2504.03723

  71. [71]

    Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das, Rafael Rafailov, Ivan Laptev, Philip HS Torr, Fabio Pizzati, Ronald Clark, and Christian Schroeder de Witt.

  72. [72]

    Subramaniam, Y

    Multiagent finetuning: Self improvement with diverse reasoning chains. arXiv preprint arXiv:2501.05707

  73. [73]

    Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents

    arXiv preprint arXiv:2505.22954

  74. [74]

    Huxley-Godel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine

    2025.

  75. [75]

    Hanwei Xu, Yujun Chen, Yulun Du, Nan Shao, Yanggang Wang, Haiyu Li, and Zhilin Yang. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing.

  76. [76]

    Archiki Prasad, Peter Hase, Xiang Zhou, and Mohit Bansal. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics.

  77. [77]

    Rui Pan, Shuo Xing, Shizhe Diao, Wenhe Sun, Xiang Liu, Kashun Shum, Jipeng Zhang, Renjie Pi, and Tong Zhang. Findings of the Association for Computational Linguistics.

  78. [78]

    Automatic Engineering of Long Prompts

    Cho.

  79. [79]

    Yao Lu, Jiayi Wang, Raphael Tang, Sebastian Riedel, and Pontus Stenetorp. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

  80. [80]

    Yongchao Chen, Jacob Arkin, Yilun Hao, Yang Zhang, Nicholas Roy, and Chuchu Fan. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing.

Showing first 80 references.