{"total":15,"items":[{"citing_arxiv_id":"2607.00394","ref_index":37,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers","primary_cat":"cs.DB","submitted_at":"2026-07-01T03:38:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SOLAR is a learning-augmented policy for semantic cache replacement that achieves constant competitive ratio 3 and 5-75% gains over FIFO on retrieval workloads.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.31650","ref_index":42,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ECHO: Prune to act, trace to learn with selective turn memory in agentic RL","primary_cat":"cs.LG","submitted_at":"2026-06-30T13:29:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ECHO is a selective turn-memory framework for agentic RL that compresses turns into indexed records, selects them for bounded contexts, and uses source indices to assign outcome credit to supporting evidence, reaching 43.4% accuracy on BrowseComp-Plus versus 28.9% for GRPO and 36.1% for SUPO.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.31564","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ACE: Pluggable Adaptive Context Elasticizer across Agents","primary_cat":"cs.AI","submitted_at":"2026-06-30T12:20:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ACE is a pluggable module that elastically orchestrates historical agent steps as raw, abstract, or dropped to maintain compact yet recoverable context for LLM agents handling long 
trajectories.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11680","ref_index":61,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents","primary_cat":"cs.AI","submitted_at":"2026-06-10T05:49:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HORMA builds a hierarchical memory structure from agent experiences and trains a lightweight RL navigator to retrieve minimal sufficient context, yielding better task performance with at most 22.17% of baseline token usage on ALFWorld, LoCoMo, and LongMemEval.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10616","ref_index":53,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents","primary_cat":"cs.AI","submitted_at":"2026-06-09T09:15:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"OSL-MR is a learning-augmented framework that casts memory retention as constrained stochastic optimization under partial observability and outperforms heuristic baselines on LoCoMo and LongMemEval.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10532","ref_index":41,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning","primary_cat":"cs.AI","submitted_at":"2026-06-09T08:03:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ActiveMem proposes a heterogeneous distributed memory 
framework for LLM agents that separates planning from active memory management, reporting SOTA accuracy with lower overhead on BrowseComp-Plus and GAIA.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.25002","ref_index":20,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MemMark: State-Evolution Attribution Watermarking for Agent Long-Term Memory Systems","primary_cat":"cs.CR","submitted_at":"2026-05-24T11:04:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MemMark enables snapshot-only attribution for agent long-term memory by embedding signals via keyed distribution-preserving sampling at memory-write decisions, recovering 40-bit payloads with near-baseline utility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11436","ref_index":23,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty","primary_cat":"cs.CL","submitted_at":"2026-05-12T02:37:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Agent-BRACE improves LLM agent performance on long-horizon partially observable tasks by 5.3-14.5% through a decoupled belief state of verbalized atomic claims with certainty labels that keeps context length constant.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10870","ref_index":52,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Remember the Decision, Not the Description: A Rate-Distortion Framework for Agent 
Memory","primary_cat":"cs.AI","submitted_at":"2026-05-11T17:20:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Memory for long-horizon agents should preserve distinctions that affect decisions under a fixed budget, not descriptive features, yielding an exact forgetting boundary and a new online learner DeMem with regret guarantees.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"We report cumulative regret, an empirical memory-distortion curve (varying K), and a mismatch-severity sweep interpolating between aligned and misaligned regimes. Full definitions are in Appendix E.1. Benchmarks and baselines. We evaluate on LoCoMo [22], LongMemEval [41], and MemoryArena [10] against FullContext, RAG [18], LangMem [15], Mem0 [3], Zep [30], Nemori [24], EMem-G [52], and Mnemis [39]. All methods share the same answering backbones (gpt-4o-mini, gpt-4.1-mini), deterministic decoding. 
Answers are scored by two held-out LLM judges, gpt-4o-mini and gpt-4.1-mini (binary on LoCoMo, graded on LongMemEval); reported numbers average across judges and instances. A human-agreement study on 150 stratified LoCoMo instances confirms that the LLM judges align well with human majority vote (κ= 0."},{"citing_arxiv_id":"2605.12535","ref_index":44,"ref_count":3,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Ghost in the Context: Policy-Carriage Integrity in LLM Agents","primary_cat":"cs.CR","submitted_at":"2026-05-02T18:07:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Protected policy placements in LLM agents maintain integrity under replay pressure on AutoGen and OpenHands traces, unlike task-local placements which show eviction or weakening.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00702","ref_index":185,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory","primary_cat":"cs.CL","submitted_at":"2026-05-01T14:45:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10352","ref_index":49,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM 
Agents","primary_cat":"cs.AI","submitted_at":"2026-04-11T21:38:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ClawVM introduces a harness-managed virtual memory system for LLM agents that ensures deterministic residency and durability of state under token budgets by using typed pages and validated writeback.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.29493","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MemFactory: Unified Inference & Training Framework for Agent Memory","primary_cat":"cs.CL","submitted_at":"2026-03-31T09:38:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.23231","ref_index":83,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments","primary_cat":"cs.AI","submitted_at":"2026-03-24T14:04:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PERMA is a new benchmark using temporally ordered events, text variability, and linguistic alignment to evaluate LLM memory agents on persona consistency beyond simple retrieval.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.02547","ref_index":136,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The Landscape of Agentic Reinforcement Learning for LLMs: 
A Survey","primary_cat":"cs.AI","submitted_at":"2025-09-02T17:46:26+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Memory Token [131] Explicit Token Structured memory for reasoning disentanglement ReSum†[132] Explicit Token Turn-wise Interaction summary for ReAct agents Context Folding†[133] Explicit Token Context folding for ReAct agents MemoryLLM [134] Latent Token Latent tokens repeatedly integrated and updated M+ [135] Latent Token Scalable memory tokens for long-context tracking IMM [136] Latent Token Decouples word representations and latent memory Memory [137] Latent Token Forget-resistant memory tokens for evolving context MemGen†[138] Latent Token Context-sensitive latent token as memory carriers Structured Memory Zep [139] Temporal Graph Temporal knowledge graph enabling structured retrieval A-MEM [140] Atomic Memory Notes Symbolic atomic memory units; structured storage"}],"limit":50,"offset":0}