pith. machine review for the scientific record.

arxiv: 2510.12635 · v3 · submitted 2025-10-14 · 💻 cs.AI

Recognition: unknown

Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

Authors on Pith: no claims yet
classification 💻 cs.AI
keywords: context management · memory · dynamic · long-horizon · memact · models · optimization
0 comments
read the original abstract

Long-context Large Language Models, despite their expanded capacity, require careful working memory management to mitigate attention dilution during long-horizon tasks. Yet existing approaches rely on external mechanisms that lack awareness of the agent's reasoning state, leading to suboptimal decisions. We propose Memory-as-Action (MemAct), a framework that treats working memory management as learnable policy actions. By formulating context management as in-place editing operations (deletion, insertion), MemAct enables joint optimization of information retention and task performance through end-to-end reinforcement learning. To address the computational challenges of dynamic context updates, we introduce Dynamic Context Policy Optimization, which restores training efficiency without compromising reasoning integrity. Experiments show that MemAct-RL-14B matches the accuracy of models $16\times$ larger while reducing average context length by 51\%, with learned strategies that adapt to model capabilities and generalize across task complexities.
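The abstract's core idea — treating working-memory management as in-place editing actions (deletion, insertion) chosen by the agent itself — can be illustrated with a minimal sketch. This is not the paper's implementation; the `WorkingMemory` class, the heuristic pruning rule standing in for the learned RL policy, and the whitespace token count are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    """Hypothetical working-memory buffer edited in place by policy actions."""
    entries: list[str] = field(default_factory=list)

    def insert(self, index: int, text: str) -> None:
        # "insertion" action: place new content at a chosen position
        self.entries.insert(index, text)

    def delete(self, index: int) -> None:
        # "deletion" action: drop an entry judged no longer useful
        del self.entries[index]

    def token_length(self) -> int:
        # crude context-length proxy: whitespace token count
        return sum(len(e.split()) for e in self.entries)


# Toy episode: the agent accumulates tool observations, then a policy
# (here a fixed recency heuristic, standing in for a learned one)
# prunes stale entries to keep the context short.
mem = WorkingMemory()
for step in range(5):
    mem.insert(len(mem.entries), f"observation {step}: tool output ...")

before = mem.token_length()
while len(mem.entries) > 2:   # keep only the two most recent entries
    mem.delete(0)
after = mem.token_length()

print(len(mem.entries), before, after)  # 2 25 10
```

In MemAct the choice of which edit to apply would itself be a model output optimized end-to-end with task reward; the sketch only shows the action interface such a policy would act through.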

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

    cs.CL 2026-05 unverdicted novelty 8.0

    Agent-BRACE improves LLM agent performance on long-horizon partially observable tasks by 5.3-14.5% through a decoupled belief state of verbalized atomic claims with certainty labels that keeps context length constant.

  2. Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory

    cs.AI 2026-05 unverdicted novelty 7.0

    Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.

  3. Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

    cs.CL 2026-05 unverdicted novelty 7.0

    MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

  4. ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents

    cs.AI 2026-04 unverdicted novelty 7.0

    ClawVM introduces a harness-managed virtual memory system for LLM agents that ensures deterministic residency and durability of state under token budgets by using typed pages and validated writeback.

  5. Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly

    cs.CR 2026-05 unverdicted novelty 6.0

    Policy directives can be lost during context assembly in language model agents, leading to unprompted policy violations that SafeContext can partially prevent.

  6. MemFactory: Unified Inference & Training Framework for Agent Memory

    cs.CL 2026-03 unverdicted novelty 6.0

    MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.