Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

2025 · cs.AI · DOI 10.48550/arxiv.2510.12635 · arXiv 2510.12635

17 Pith papers cite this work. Polarity classification is still indexing.

abstract

Long-context Large Language Models, despite their expanded capacity, require careful working memory management to mitigate attention dilution during long-horizon tasks. Yet existing approaches rely on external mechanisms that lack awareness of the agent's reasoning state, leading to suboptimal decisions. We propose Memory-as-Action (MemAct), a framework that treats working memory management as learnable policy actions. By formulating context management as in-place editing operations (deletion, insertion), MemAct enables joint optimization of information retention and task performance through end-to-end reinforcement learning. To address the computational challenges of dynamic context updates, we introduce Dynamic Context Policy Optimization, which restores training efficiency without compromising reasoning integrity. Experiments show that MemAct-RL-14B matches the accuracy of models 16× larger while reducing average context length by 51%, with learned strategies that adapt to model capabilities and generalize across task complexities.
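
To make the abstract's "in-place editing operations (deletion, insertion)" concrete, here is a minimal sketch of a working context that supports those two edits. It is an illustration only, not the MemAct implementation: the `WorkingContext` class, its method names, and the whitespace-based length proxy are all assumptions introduced here, and the edits are hand-written, whereas the paper describes a policy that learns them through reinforcement learning.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingContext:
    """Toy working memory: an ordered list of text segments (hypothetical structure)."""
    segments: list[str] = field(default_factory=list)

    def append(self, text: str) -> int:
        """Add a new observation or thought and return its index."""
        self.segments.append(text)
        return len(self.segments) - 1

    def delete(self, indices: set[int]) -> None:
        """Memory action: drop the segments at the given indices, in place."""
        self.segments[:] = [s for i, s in enumerate(self.segments) if i not in indices]

    def insert(self, position: int, text: str) -> None:
        """Memory action: insert text (for example a short note) at a position."""
        self.segments.insert(position, text)

    def length(self) -> int:
        """Crude size proxy: total number of whitespace-separated tokens."""
        return sum(len(s.split()) for s in self.segments)


# Hand-written edit sequence; a learned policy would choose these actions itself.
ctx = WorkingContext()
ctx.append("task: find the release year of library X")
ctx.append("tool output: 40 lines of search results ...")
ctx.append("tool output: changelog page, first release 2019")
before = ctx.length()

ctx.delete({1, 2})                                         # prune raw tool outputs
ctx.insert(1, "note: changelog says first release 2019")   # keep only the useful fact
after = ctx.length()
```

After the two edits the context holds the task statement plus one short note, so `after` is smaller than `before` while the fact needed for the answer is retained.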

citation-role summary

background 1 baseline 1

citation-polarity summary

background 1 baseline 1

representative citing papers

Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

Agent-BRACE improves LLM agent performance on long-horizon partially observable tasks by 5.3-14.5% through a decoupled belief state of verbalized atomic claims with certainty labels that keeps context length constant.

When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers

cs.DB · 2026-07-01 · unverdicted · novelty 7.0

SOLAR is a learning-augmented policy for semantic cache replacement that achieves constant competitive ratio 3 and 5-75% gains over FIFO on retrieval workloads.

Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

OSL-MR is a learning-augmented framework that casts memory retention as constrained stochastic optimization under partial observability and outperforms heuristic baselines on LoCoMo and LongMemEval.

MemMark: State-Evolution Attribution Watermarking for Agent Long-Term Memory Systems

cs.CR · 2026-05-24 · unverdicted · novelty 7.0

MemMark enables snapshot-only attribution for agent long-term memory by embedding signals via keyed distribution-preserving sampling at memory-write decisions, recovering 40-bit payloads with near-baseline utility.

Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents

cs.AI · 2026-04-11 · unverdicted · novelty 7.0

ClawVM introduces a harness-managed virtual memory system for LLM agents that ensures deterministic residency and durability of state under token budgets by using typed pages and validated writeback.

PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

cs.AI · 2026-03-24 · unverdicted · novelty 7.0

PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.

What to Keep, What to Forget: A Rate--Distortion View of Memory Compaction in LLMs and Agents

cs.LG · 2026-07-09 · conditional · novelty 6.0

KV-cache eviction, prompt compression, recurrent state bounding, and agent memory consolidation are unified as one rate-distortion problem with a shared lower bound, shared failure mode, and transferable mechanisms.

ECHO: Prune To Act, Trace To Learn With Selective Turn Memory In Agentic RL

cs.LG · 2026-06-30 · conditional · novelty 6.0

ECHO stores each completed agent turn as a source-indexed memory, selects memories for bounded context, and uses those same indices to route positive RL credit, improving long-horizon tool-use accuracy.

ACE: Pluggable Adaptive Context Elasticizer across Agents

cs.AI · 2026-06-30 · unverdicted · novelty 6.0

ACE is a pluggable module that elastically orchestrates historical agent steps as raw, abstract, or dropped to maintain compact yet recoverable context for LLM agents handling long trajectories.

Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents

cs.AI · 2026-06-10 · unverdicted · novelty 6.0

HORMA builds a hierarchical memory structure from agent experiences and trains a lightweight RL navigator to retrieve minimal sufficient context, yielding better task performance with at most 22.17% of baseline token usage on ALFWorld, LoCoMo, and LongMemEval.

MemFactory: Unified Inference & Training Framework for Agent Memory

cs.CL · 2026-03-31 · unverdicted · novelty 6.0

MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

cs.AI · 2025-09-02 · accept · novelty 6.0

Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.

Cortex: A Bidirectionally Aligned Embodied Agent Framework for Long-horizon Manipulation

cs.RO · 2026-07-06 · conditional · novelty 5.0

A dual-system framework with a structured subtask interface, event-balanced training, and inference harness enables VLM-guided long-horizon robotic manipulation, achieving 95.5% on LIBERO-Long and 65% on real-world chemistry tasks.

ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning

cs.AI · 2026-06-09 · unverdicted · novelty 5.0

ActiveMem proposes a heterogeneous distributed memory framework for LLM agents that separates planning from active memory management, reporting SOTA accuracy with lower overhead on BrowseComp-Plus and GAIA.

Ghost in the Context: Policy-Carriage Integrity in LLM Agents

cs.CR · 2026-05-02 · unverdicted · novelty 5.0 · 3 refs

Protected policy placements in LLM agents maintain integrity under replay pressure on AutoGen and OpenHands traces, unlike task-local placements which show eviction or weakening.

citing papers explorer

Showing 17 of 17 citing papers.

Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty cs.CL · 2026-05-12 · unverdicted · none · ref 23 · internal anchor
When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers cs.DB · 2026-07-01 · unverdicted · none · ref 37 · internal anchor
Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents cs.AI · 2026-06-09 · unverdicted · none · ref 53 · internal anchor
MemMark: State-Evolution Attribution Watermarking for Agent Long-Term Memory Systems cs.CR · 2026-05-24 · unverdicted · none · ref 20 · internal anchor
Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent Memory cs.AI · 2026-05-11 · unverdicted · none · ref 52 · internal anchor
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory cs.CL · 2026-05-01 · unverdicted · none · ref 185 · internal anchor
ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents cs.AI · 2026-04-11 · unverdicted · none · ref 49 · internal anchor
PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments cs.AI · 2026-03-24 · unverdicted · none · ref 83 · internal anchor
What to Keep, What to Forget: A Rate--Distortion View of Memory Compaction in LLMs and Agents cs.LG · 2026-07-09 · conditional · none · ref 151 · internal anchor
ECHO: Prune To Act, Trace To Learn With Selective Turn Memory In Agentic RL cs.LG · 2026-06-30 · conditional · none · ref 42 · internal anchor
ACE: Pluggable Adaptive Context Elasticizer across Agents cs.AI · 2026-06-30 · unverdicted · none · ref 15 · internal anchor
Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents cs.AI · 2026-06-10 · unverdicted · none · ref 61 · internal anchor
MemFactory: Unified Inference & Training Framework for Agent Memory cs.CL · 2026-03-31 · unverdicted · none · ref 18 · internal anchor
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey cs.AI · 2025-09-02 · accept · none · ref 136 · internal anchor
Cortex: A Bidirectionally Aligned Embodied Agent Framework for Long-horizon Manipulation cs.RO · 2026-07-06 · conditional · none · ref 30 · internal anchor
ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning cs.AI · 2026-06-09 · unverdicted · none · ref 41 · internal anchor
Ghost in the Context: Policy-Carriage Integrity in LLM Agents cs.CR · 2026-05-02 · unverdicted · none · ref 44 · 3 links · internal anchor
