Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents

Kaichao Liang; Liu Mingyang; Mingxuan Yuan; Qingcan Kang; Shixiong Kai; Tao Zhong

arxiv: 2606.10616 · v5 · pith:UK63K3XRnew · submitted 2026-06-09 · 💻 cs.AI

Pith reviewed 2026-06-27 13:33 UTC · model grok-4.3

classification 💻 cs.AI

keywords memory retention · long-horizon language agents · constrained stochastic optimization · observability-safe learning · NP-hard optimization · heuristic baselines · dynamic programming

The pith

Memory retention in long-horizon language agents is formulated as constrained stochastic optimization and solved by observability-safe learning that outperforms heuristics under tight budgets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats memory retention as a multi-step resource allocation task under partial observability, where agents must choose what to keep given budget limits, evidence value, and future penalties for misses or staleness. Because the underlying problem is NP-hard, exact solutions cannot be used at deployment time. OSL-MR therefore learns a policy from interaction data while enforcing a clean split between features visible online and supervision available only offline, using a Mixed-Score heuristic both as a safe baseline and as an inductive prior. Experiments on LoCoMo and LongMemEval show the resulting policy beats recency-based and Generative Agents-style methods, especially when memory is scarce, and tracks the dynamic-programming optimum more closely than single-step alternatives on small solvable cases.
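
One way to write this formulation down is sketched below. The notation is an assumption introduced here for illustration and is not taken from the paper: $M_t$ is the retained set at step $t$, $s_i$ the size of item $i$, $B$ the budget, $q_t$ the query, $u_t$ the evidence utility, $c_t^{\cdot}$ the realized miss, reacquisition, and stale costs, $\lambda_{\cdot}$ their weights, and $o_t$ the features observable online.

```latex
\begin{aligned}
\max_{\pi}\quad & \mathbb{E}\left[\sum_{t=1}^{T}\Big(u_t(M_t, q_t) - \lambda_{\mathrm{miss}}\,c_t^{\mathrm{miss}} - \lambda_{\mathrm{reacq}}\,c_t^{\mathrm{reacq}} - \lambda_{\mathrm{stale}}\,c_t^{\mathrm{stale}}\Big)\right] \\
\text{s.t.}\quad & \sum_{i \in M_t} s_i \le B \quad \text{for all } t, \\
& M_t = \pi(o_t), \quad o_t \text{ online-observable only.}
\end{aligned}
```

The sum over steps is what makes the problem sequential: a retention choice at step $t$ can trigger miss or reacquisition costs several steps later.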

Core claim

We formulate memory retention as a constrained stochastic optimization problem with budget feasibility, evidence utility, and delayed costs including miss, reacquisition, and stale penalties. This multi-step problem is NP-hard. We propose OSL-MR, which enforces a strict separation between online-observable features and offline-available supervision, combines an evidence learner trained from realized evidence with a Mixed-Score heuristic that serves as both deployable baseline and inductive prior, and produces policies that remain feasible under the same constraints. On the evaluated benchmarks OSL-MR outperforms recency-based, Generative Agents-style, and other heuristic baselines, especially under tight budgets.

What carries the argument

OSL-MR, the framework that trains an evidence learner from realized evidence while deploying a Mixed-Score heuristic as an online-safe baseline and inductive prior for the constrained optimization.
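
The abstract does not define the Mixed-Score heuristic, so the following is only a minimal sketch of what an online-safe, budget-feasible scoring baseline could look like. The blend of recency and query relevance, the `alpha` weight, the field names, and the greedy density rule are all assumptions made for illustration, not the paper's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    item_id: str
    size: int          # budget units the item occupies (assumed positive)
    age: int           # steps since last access; observable online
    relevance: float   # similarity to the current query in [0, 1]; observable online


def mixed_score(item: MemoryItem, alpha: float = 0.5) -> float:
    """Hypothetical blend of recency and relevance (not the paper's formula)."""
    recency = 1.0 / (1.0 + item.age)
    return alpha * recency + (1.0 - alpha) * item.relevance


def retain_under_budget(items, budget, alpha=0.5):
    """Greedy selection by score per unit size; the result never exceeds the budget."""
    ranked = sorted(items, key=lambda it: mixed_score(it, alpha) / it.size, reverse=True)
    kept, used = [], 0
    for it in ranked:
        if used + it.size <= budget:
            kept.append(it.item_id)
            used += it.size
    return kept, used
```

Because the scorer reads only fields available at decision time, it can run at deployment unchanged, which is the property the paper asks of its baseline.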

If this is right

OSL-MR outperforms recency-based, Generative Agents-style, and other heuristic baselines especially under tight budgets.
The Mixed-Score prior improves precision and recall of retained evidence.
Sensitivity analysis shows the approach remains robust across different cost settings.
On small solvable instances OSL-MR approximates the dynamic-programming optimum more closely than single-step optimization because it anticipates future demand shifts.
The sequential formulation is necessary; single-step optimization is insufficient for the full problem.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The observability split could be reused in other sequential resource problems where training data and deployment constraints differ.
Longer agent trajectories might become feasible without context overflow if the same constrained-optimization view is applied to additional memory types.
Scaling the method to instances too large for dynamic programming would test how well the learned approximation generalizes beyond the small cases where optimality can be verified.
Existing agent systems could reduce stale-evidence errors by replacing local retention rules with policies trained under the reported delayed-cost model.

Load-bearing premise

A strict separation between online-observable features and offline-available supervision can be maintained while still learning effective policies from interaction data.
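
A minimal sketch of how such a separation might be enforced in code, assuming a linear scorer and hypothetical field names (`query_similarity`, `age`, `size`, `was_evidence`). The paper's actual feature partition is not given in the abstract, so this shows the idea rather than the authors' mechanism.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OnlineFeatures:
    """Visible at decision time (assumed field names)."""
    query_similarity: float
    age: int
    size: int


@dataclass(frozen=True)
class OfflineLabels:
    """Known only after the fact; usable as training targets, never as inputs."""
    was_evidence: bool


def score(features: OnlineFeatures, weights: dict) -> float:
    """Deployable scorer: it iterates over online fields only."""
    return sum(weights[f.name] * getattr(features, f.name) for f in fields(features))


def assert_no_leakage(weights: dict) -> None:
    """Fail fast if any learned weight is keyed by an offline-only field."""
    offline = {f.name for f in fields(OfflineLabels)}
    leaked = offline & set(weights)
    if leaked:
        raise ValueError(f"offline fields used as inputs: {sorted(leaked)}")
```

Keeping the two record types distinct means a leak shows up as a failed check at training time rather than as a silent gain in the reported numbers.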

What would settle it

On small solvable instances, compute the dynamic-programming optimum and test whether OSL-MR retention decisions are significantly closer to it than single-step optimization decisions; if the gap disappears, the claim that the sequential formulation is required does not hold.
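
The comparison described above can be illustrated on a toy instance. The sketch below is not the paper's model: it assumes a known demand sequence, unit-size items, a capacity counted in items, and made-up integer weights `MISS`, `REACQ`, and `HOLD`. It contrasts a single-step rule with an exact recursion over the same cost.

```python
from functools import lru_cache
from itertools import combinations

MISS, REACQ, HOLD = 10, 4, 1  # assumed toy cost weights


def feasible_sets(n_items, capacity):
    """All retained sets that respect the budget (at most `capacity` items)."""
    return [frozenset(c) for k in range(capacity + 1)
            for c in combinations(range(n_items), k)]


def step_cost(prev, cur, demand):
    """Miss penalty, plus cost of items newly brought in, plus holding cost."""
    miss = MISS if demand not in cur else 0
    return miss + REACQ * len(cur - prev) + HOLD * len(cur)


def myopic_cost(demands, n_items, capacity):
    """Single-step optimization: minimize only the current step's cost."""
    sets_, prev, total = feasible_sets(n_items, capacity), frozenset(), 0
    for d in demands:
        cur = min(sets_, key=lambda s: step_cost(prev, s, d))
        total += step_cost(prev, cur, d)
        prev = cur
    return total


def dp_cost(demands, n_items, capacity):
    """Exact optimum by recursion over (step, previously retained set)."""
    sets_ = feasible_sets(n_items, capacity)

    @lru_cache(maxsize=None)
    def best(t, prev):
        if t == len(demands):
            return 0
        return min(step_cost(prev, s, demands[t]) + best(t + 1, s) for s in sets_)

    return best(0, frozenset())
```

On the alternating demand sequence `[0, 1, 0, 1, 0, 1]` with two items and capacity 2, the myopic rule drops the idle item at every step and pays to bring it back, for a total of 30, while the recursion keeps both items and totals 18. The state space grows with the number of feasible sets, which is why this check is limited to small instances.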

Original abstract

Long-horizon language agents accumulate observations, reasoning traces, and retrieved facts exceeding context windows, making memory retention a fundamental resource-allocation problem. Existing systems treat retention as local and do not model long-term consequences under observability constraints. To fill this gap, we formulate memory retention as a constrained stochastic optimization with budget feasibility, evidence utility, and delayed costs including miss, reacquisition, and stale penalties. We show this multi-step problem is NP-hard, making exact solution intractable. Moreover, deployment decisions must be made under partial observability. To address these challenges, we propose OSL-MR (Observability-Safe Learning for Memory Retention), a learning-augmented framework that enforces a strict separation between online-observable features and offline-available supervision. OSL-MR combines an evidence learner trained from realized evidence with a Mixed-Score heuristic that serves as a deployable online-safe baseline and an inductive prior. The policy learns query-conditioned evidence from interaction data and remains deployable under the same constraints. Experiments on LoCoMo and LongMemEval show OSL-MR outperforms recency-based, Generative Agents-style, and other heuristic baselines, especially under tight budgets. The Mixed-Score prior improves precision and recall, and sensitivity analysis shows robustness across cost settings. On small solvable instances, single-step optimization is insufficient to anticipate future demand shifts, while OSL-MR stays significantly closer to the dynamic-programming optimum, confirming the necessity of the sequential formulation and reinforcing our learning-guided approximation. These results establish constrained stochastic optimization and optimization-guided learning as a principled foundation for memory management in long-horizon agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames memory retention as constrained stochastic optimization under partial observability and introduces OSL-MR with an online/offline separation plus Mixed-Score heuristic, but the separation's enforcement during learning is not clearly verified.

The core contribution is treating memory retention for long-horizon agents as a multi-step constrained optimization problem that accounts for budget, evidence utility, and delayed costs such as miss and stale penalties. The authors prove it NP-hard, note that single-step decisions fall short of the dynamic-programming optimum on small instances, and propose OSL-MR to learn query-conditioned policies while keeping the online decision rule deployable under the same observability limits. The Mixed-Score heuristic acts as both baseline and inductive prior. Experiments on LoCoMo and LongMemEval show gains over recency and Generative Agents baselines, especially at tight budgets, with the learned policy staying closer to the optimum than simpler methods.

This is genuinely new relative to the cited heuristics: the explicit constrained formulation plus the claimed strict separation between online-observable features and offline supervision. The empirical comparison to DP on solvable cases is a useful external check.

The main soft spot is exactly the one flagged in the stress-test note. The abstract asserts the separation is enforced and that the policy remains deployable under the same constraints, yet provides no mechanism, feature partition, or ablation showing that offline supervision does not leak into the online learner. If that separation does not hold in the training loop, the reported improvements do not demonstrate a true partial-observability solution. The abstract also omits error bars, exact dataset sizes, and precise metric definitions, which makes it harder to judge how robust the gains are.

This work is aimed at researchers building long-horizon language agents who need better memory policies. It has a clear technical idea and some supporting experiments, so it is worth sending to peer review for a full check on the separation claim and the experimental details.

Referee Report

2 major / 2 minor

Summary. The manuscript formulates memory retention for long-horizon language agents as a constrained stochastic optimization problem (budget feasibility, evidence utility, miss/reacquisition/stale penalties) that is shown to be NP-hard. It introduces the OSL-MR framework, which trains an evidence learner on realized (offline) evidence while combining it with a Mixed-Score heuristic as an online-safe baseline and inductive prior, enforcing a strict separation between online-observable features and offline supervision. The learned policy is claimed to remain deployable under the same constraints. Experiments on LoCoMo and LongMemEval report outperformance over recency-based, Generative Agents-style, and other heuristic baselines (especially under tight budgets), with the Mixed-Score prior improving precision/recall and sensitivity analysis showing robustness; on small solvable instances, OSL-MR approximates the dynamic-programming optimum more closely than single-step optimization.

Significance. If the strict separation is verifiably maintained without leakage and the reported gains are robust, the work supplies a principled optimization-based foundation for memory management that explicitly accounts for delayed costs and partial observability, moving beyond local heuristics and offering a template for learning-augmented constrained policies in resource-limited agents.

major comments (2)

[Abstract (and implied Method section)] The central claim that OSL-MR 'enforces a strict separation' between online-observable features and offline-available supervision, with the learned policy remaining deployable under identical constraints, is load-bearing for the observability-safe guarantee. The abstract asserts enforcement via training from realized evidence and use of Mixed-Score only as prior/baseline, but provides no explicit mechanism, feature partition, ablation, or verification that no offline information leaks into the online decision policy during the interaction-data training loop.
[Abstract (Experiments paragraph)] The claim that OSL-MR stays 'significantly closer to the dynamic-programming optimum' on small solvable instances while single-step optimization is insufficient is load-bearing for the necessity of the sequential formulation. No derivation details, exact metrics, error bars, or instance statistics are supplied to support the quantitative gap or to confirm that the comparison isolates the effect of the multi-step formulation.

minor comments (2)

[Abstract (Experiments)] Dataset statistics (sizes, query distributions, budget ranges) for LoCoMo and LongMemEval are not reported, hindering reproducibility and assessment of the 'tight budgets' regime.
[Abstract] The free parameters (weights on miss, reacquisition, and stale penalties) are listed but their selection procedure and sensitivity ranges are not detailed beyond a generic robustness claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. The two major comments identify areas where additional explicitness and supporting details are needed to substantiate the central claims. We address each point below and commit to revisions that will incorporate the requested mechanisms, partitions, metrics, and verifications without altering the core technical contributions.

Referee: [Abstract (and implied Method section)] The central claim that OSL-MR 'enforces a strict separation' between online-observable features and offline-available supervision, with the learned policy remaining deployable under identical constraints, is load-bearing for the observability-safe guarantee. The abstract asserts enforcement via training from realized evidence and use of Mixed-Score only as prior/baseline, but provides no explicit mechanism, feature partition, ablation, or verification that no offline information leaks into the online decision policy during the interaction-data training loop.

Authors: The separation is realized by (i) training the evidence learner exclusively on realized offline evidence collected after decisions, (ii) restricting the online policy input to only features observable at decision time (query, current memory state, budget), and (iii) using the Mixed-Score heuristic solely as a fixed inductive prior whose parameters are not updated online. No offline labels or future information enter the online forward pass. We acknowledge that the current manuscript states this architecture at a high level without an explicit feature table, training-loop pseudocode, or leakage-ablation experiment. We will add a dedicated subsection (new Section 4.3) that lists the exact online vs. offline feature sets, provides the training pseudocode, and reports an ablation that freezes the prior and measures policy performance under simulated leakage attempts. revision: yes
Referee: [Abstract (Experiments paragraph)] The claim that OSL-MR stays 'significantly closer to the dynamic-programming optimum' on small solvable instances while single-step optimization is insufficient is load-bearing for the necessity of the sequential formulation. No derivation details, exact metrics, error bars, or instance statistics are supplied to support the quantitative gap or to confirm that the comparison isolates the effect of the multi-step formulation.

Authors: The DP comparison appears in the experiments section on a curated set of 50 small instances (horizon ≤ 8, |E| ≤ 12) where exact DP is tractable. We will expand that subsection with: (a) the full Bellman recursion and state-space definition used for the DP baseline, (b) the precise metric (average optimality gap in total discounted cost), (c) mean ± std over 10 random seeds per instance, and (d) instance statistics (distribution of horizon, evidence cardinality, and cost parameters). This will isolate the benefit of the multi-step formulation from single-step myopic optimization. revision: yes
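
The gap metric named in the second response can be sketched as follows. The response does not say whether the gap is relative or absolute; this sketch assumes a relative gap against a strictly positive optimal cost, and the function names are hypothetical.

```python
from statistics import mean, stdev


def optimality_gap(policy_cost: float, optimal_cost: float) -> float:
    """Relative gap to the DP optimum; 0.0 means the policy matched it."""
    if optimal_cost <= 0:
        raise ValueError("relative gap assumes a strictly positive optimal cost")
    return (policy_cost - optimal_cost) / optimal_cost


def summarize_gaps(policy_costs, optimal_cost):
    """Mean and sample standard deviation of the gap across seeds."""
    gaps = [optimality_gap(c, optimal_cost) for c in policy_costs]
    spread = stdev(gaps) if len(gaps) > 1 else 0.0
    return mean(gaps), spread
```

Reporting the spread alongside the mean is what lets a reader judge whether "significantly closer" survives seed-to-seed variation.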

Circularity Check

0 steps flagged

No circularity; claims rest on external DP benchmarks and stated separation without self-referential reduction

The abstract and description formulate the problem as NP-hard constrained stochastic optimization and introduce OSL-MR as a learning-augmented method enforcing online/offline separation, with performance measured against recency baselines, Generative Agents heuristics, and dynamic-programming optima on small instances. No equations, fitted parameters, or predictions are shown that reduce by construction to the inputs (e.g., no evidence learner output defined as a function of itself or Mixed-Score prior). No self-citations appear as load-bearing premises, no uniqueness theorems are imported from prior author work, and no ansatz or renaming of known results is invoked. The derivation therefore remains self-contained against the external checks provided.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The abstract introduces the OSL-MR framework and the constrained optimization model but does not detail numerical parameters or external proofs; the NP-hardness claim and the online/offline separation are taken as given.

free parameters (1)

weights on miss, reacquisition, and stale penalties
Abstract lists these delayed costs as part of the objective but does not indicate whether they are fixed or tuned to data.

axioms (2)

domain assumption: Memory retention under budget and observability constraints is NP-hard
Stated directly in the abstract as shown.
ad hoc to paper: A strict separation between online-observable features and offline supervision is feasible and sufficient for learning deployable policies
Central modeling choice for OSL-MR.

invented entities (1)

OSL-MR framework (no independent evidence)
purpose: Learning-augmented solver for the memory retention optimization
Newly proposed method combining evidence learner and Mixed-Score heuristic.

[1] [1]

Information Processing Letters , volume=

The budgeted maximum coverage problem , author=. Information Processing Letters , volume=. 1999 , publisher=

1999

[2] [2]

and Wolsey, Laurence A

Nemhauser, George L. and Wolsey, Laurence A. and Fisher, Marshall L. , journal=. An analysis of approximations for maximizing submodular set functions---. 1978 , publisher=

1978

[3] [3]

Journal of the ACM , volume=

A threshold of n for approximating set cover , author=. Journal of the ACM , volume=. 1998 , publisher=

1998

[4] [4]

and Lin, Kevin and Wooders, Sarah and Gonzalez, Joseph E

Packer, Charles and Fang, Vivian and Patil, Shishir G. and Lin, Kevin and Wooders, Sarah and Gonzalez, Joseph E. , journal=

[5] [5]

Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST) , year=

Generative Agents: Interactive Simulacra of Human Behavior , author=. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST) , year=

[6] [6]

Zhong, Wanjun and Guo, Lianghong and Gao, Qiqi and Ye, He and Wang, Yanlin , booktitle=

[7] [7]

Jiang, Huiqiang and Wu, Qianhui and Lin, Chin-Yew and Yang, Yuqing and Qiu, Lili , booktitle=

[8] [10]

Wang, Ziting and Yuan, Haitao and Dong, Wei and Cong, Gao and Li, Feifei , journal=

[9] [11]

2025 , eprint=

Mem- : Learning Memory Construction via Reinforcement Learning , author=. 2025 , eprint=

2025

[10] [13]

Chao, Hanxiang and others , journal=

[11] [14]

2026 , eprint=

Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey , author=. 2026 , eprint=

2026

[12] [15]

Memory in the Age of

Hu, Yuyang and Liu, Shichun and Yue, Yanwei and Zhang, Guibin and Liu, Boyang and Zhu, Fangyi and Lin, Jiahang and Guo, Honglin and Dou, Shihan and Xi, Zhiheng and others , journal=. Memory in the Age of

[13] [19]

2025 , eprint=

BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models , author=. 2025 , eprint=

2025

[14] [20]

2026 , eprint=

Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory , author=. 2026 , eprint=

2026

[15] [21]

2026 , eprint=

Mem-T: Densifying Rewards for Long-Horizon Memory Agents , author=. 2026 , eprint=

2026

[16] [22]

2025 , eprint=

Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks , author=. 2025 , eprint=

2025

[17] [23]

2024 , eprint=

Evaluating Very Long-Term Conversational Memory of LLM Agents , author=. 2024 , eprint=

2024

[18] [24]

2025 , eprint=

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory , author=. 2025 , eprint=

2025

[19] [25]

2024 , eprint=

A Survey on the Memory Mechanism of Large Language Model based Agents , author=. 2024 , eprint=

2024

[20] [26]

2026 , eprint=

Memory for Autonomous LLM Agents:Mechanisms, Evaluation, and Emerging Frontiers , author=. 2026 , eprint=

2026

[21] [27]

2026 , eprint=

Anatomy of Agentic Memory: Taxonomy and Empirical Analysis of Evaluation and System Limitations , author=. 2026 , eprint=

2026

[22] [28]

2025 , eprint=

A Comprehensive Survey on Long Context Language Modeling , author=. 2025 , eprint=

2025

[23] [29]

://arxiv.org/abs/2511.04919

Alla CVK, Gaddam HN, Kommi M (2025) Budgetmem: Learning selective memory policies for cost-efficient long-context processing in language models. ://arxiv.org/abs/2511.04919

arXiv 2025

[24] [30]

(2026) STALE : Can LLM agents know when their memories are no longer valid? arXiv preprint arXiv:2605.06527

Chao H, et al. (2026) STALE : Can LLM agents know when their memories are no longer valid? arXiv preprint arXiv:2605.06527

Pith/arXiv arXiv 2026

[25] [31]

arXiv preprint arXiv:2504.19413

Chhikara P, Khant D, Aryan S, Singh T, Yadav D (2025) Mem0: Building production-ready ai agents with scalable long-term memory. arXiv preprint arXiv:2504.19413

Pith/arXiv arXiv 2025

[26] [32]

://arxiv.org/abs/2603.07670

Du P (2026) Memory for autonomous llm agents:mechanisms, evaluation, and emerging frontiers. ://arxiv.org/abs/2603.07670

arXiv 2026

[27] [33]

Journal of the ACM 45(4):634--652

Feige U (1998) A threshold of n for approximating set cover. Journal of the ACM 45(4):634--652

1998

[28] [34]

arXiv preprint arXiv:2604.02280 Submitted on 2 Apr 2026

Fofadiya P, Tiwari S (2026) Novel memory forgetting techniques for autonomous ai agents: Balancing relevance and efficiency. arXiv preprint arXiv:2604.02280 Submitted on 2 Apr 2026

arXiv 2026

[29] [35]

(2025) Memory in the age of AI agents

Hu Y, Liu S, Yue Y, Zhang G, Liu B, Zhu F, Lin J, Guo H, Dou S, Xi Z, et al. (2025) Memory in the age of AI agents. arXiv preprint arXiv:2512.13564

Pith/arXiv arXiv 2025

[30] [36]

://arxiv.org/abs/2602.06052

Huang WC, Zhang W, Liang Y, Bei Y, Chen Y, Feng T, Pan X, Tan Z, Wang Y, Wei T, Wu S, Xu R, Yang L, Yang R, Yang W, Yeh CY, Zhang H, Zhang H, Zhu S, Zou HP, Zhao W, Wang S, Xu W, Ke Z, Hui Z, Li D, Wu Y, He L, Wang C, Xu X, Huang B, Tan J, Heinecke S, Wang H, Xiong C, Metwally AA, Yan J, Lee CY, Zeng H, Xia Y, Wei X, Payani A, Wang Y, Ma H, Wang W, Wang C...

arXiv 2026

[31] [37]

://arxiv.org/abs/2602.19320

Jiang D, Li Y, Wei S, Yang J, Kishore A, Zhao A, Kang D, Hu X, Chen F, Li Q, Li B (2026) Anatomy of agentic memory: Taxonomy and empirical analysis of evaluation and system limitations. ://arxiv.org/abs/2602.19320

Pith/arXiv arXiv 2026

[32] [38]

Proceedings of EMNLP

Jiang H, Wu Q, Lin CY, Yang Y, Qiu L (2023) LLMLingua : Compressing prompts for accelerated inference of large language models. Proceedings of EMNLP

2023

[33] [39]

://arxiv.org/abs/2510.00615

Kang M, Chen WN, Han D, Inan HA, Wutschitz L, Chen Y, Sim R, Rajmohan S (2025) Acon : Optimizing context compression for long-horizon LLM agents. ://arxiv.org/abs/2510.00615

Pith/arXiv arXiv 2025

[34] [40]

Information Processing Letters 70(1):39--45

Khuller S, Moss A, Naor JS (1999) The budgeted maximum coverage problem. Information Processing Letters 70(1):39--45

1999

[35] [41]

://arxiv.org/abs/2503.17407

Liu J, Zhu D, Bai Z, He Y, Liao H, Que H, Wang Z, Zhang C, Zhang G, Zhang J, Zhang Y, Chen Z, Guo H, Li S, Liu Z, Shan Y, Song Y, Tian J, Wu W, Zhou Z, Zhu R, Feng J, Gao Y, He S, Li Z, Liu T, Meng F, Su W, Tan Y, Wang Z, Yang J, Ye W, Zheng B, Zhou W, Huang W, Li S, Zhang Z (2025) A comprehensive survey on long context language modeling. ://arxiv.org/abs...

arXiv 2025

[36] [42]

://arxiv.org/abs/2402.17753

Maharana A, Lee DH, Tulyakov S, Bansal M, Barbieri F, Fang Y (2024) Evaluating very long-term conversational memory of llm agents. ://arxiv.org/abs/2402.17753

Pith/arXiv arXiv 2024

[37] [43]

Mathematical Programming 14(1):265--294

Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions--- I . Mathematical Programming 14(1):265--294

1978

[38] [44]

arXiv preprint arXiv:2310.08560

Packer C, Fang V, Patil SG, Lin K, Wooders S, Gonzalez JE (2023) MemGPT : Towards LLMs as operating systems. arXiv preprint arXiv:2310.08560

Pith/arXiv arXiv 2023

[39] [45]

Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST)

Park JS, O'Brien J, Cai CJ, Morris MR, Liang P, Bernstein MS (2023) Generative agents: Interactive simulacra of human behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST)

2023

[40] [46]

://arxiv.org/abs/2512.25052

Peng C, Wang B, Long Z, Sheng J (2025) AdaGReS : Adaptive greedy context selection via redundancy-aware scoring for token-budgeted RAG . ://arxiv.org/abs/2512.25052

arXiv 2025

[41] [47]

://arxiv.org/abs/2509.25911

Wang Y, Takanobu R, Liang Z, Mao Y, Hu Y, McAuley J, Wu X (2025) Mem- : Learning memory construction via reinforcement learning. ://arxiv.org/abs/2509.25911

Pith/arXiv arXiv 2025

[42] [48]

arXiv preprint arXiv:2411.00744

Wang Z, Yuan H, Dong W, Cong G, Li F (2024) CORAG : A cost-constrained retrieval optimization system for retrieval-augmented generation. arXiv preprint arXiv:2411.00744

arXiv 2024

[43] [49]

://arxiv.org/abs/2410.10813

Wu D, Wang H, Yu W, Zhang Y, Chang KW, Yu D (2025) Longmemeval: Benchmarking chat assistants on long-term interactive memory. ://arxiv.org/abs/2410.10813

Pith/arXiv arXiv 2025

[44] [50]

://arxiv.org/abs/2601.23014

Yue Y, Peng B, Fan X, Guo J, Li Q, Zhang Y (2026) Mem-t: Densifying rewards for long-horizon memory agents. ://arxiv.org/abs/2601.23014

arXiv 2026

[45] [51]

://arxiv.org/abs/2602.06025

Zhang H, Yue H, Feng T, Long Q, Bao J, Jin B, Zhang W, Li X, You J, Qin C, Wang W (2026 a ) Learning query-aware budget-tier routing for runtime agent memory. ://arxiv.org/abs/2602.06025

Pith/arXiv arXiv 2026

[46] [52]

(2026 b ) Memrl: Self-evolving agents via runtime reinforcement learning on episodic memory

Zhang S, Wang J, Zhou R, Liao J, Feng Y, Li Z, Zheng Y, Zhang W, Wen Y, Li Z, et al. (2026 b ) Memrl: Self-evolving agents via runtime reinforcement learning on episodic memory. arXiv preprint arXiv:2601.03192

Pith/arXiv arXiv 2026

[47] [53]

://arxiv.org/abs/2510.12635

Zhang Y, Shu J, Ma Y, Lin X, Wu S, Sang J (2025) Memory as action: Autonomous context curation for long-horizon agentic tasks. ://arxiv.org/abs/2510.12635

Pith/arXiv arXiv 2025

[48] [54]

://arxiv.org/abs/2404.13501

Zhang Z, Bo X, Ma C, Li R, Chen X, Dai Q, Zhu J, Dong Z, Wen JR (2024) A survey on the memory mechanism of large language model based agents. ://arxiv.org/abs/2404.13501

Pith/arXiv arXiv 2024

[49] [55]

Proceedings of the AAAI Conference on Artificial Intelligence

Zhong W, Guo L, Gao Q, Ye H, Wang Y (2024) MemoryBank : Enhancing large language models with long-term memory. Proceedings of the AAAI Conference on Artificial Intelligence

2024

[50] [56]

arXiv preprint arXiv:2506.15841

Zhou Z, Qu A, Wu Z, Kim S, Prakash A, Rus D, Zhao J, Low BKH, Liang PP (2025) Mem1: Learning to synergize memory and reasoning for efficient long-horizon agents. arXiv preprint arXiv:2506.15841

Pith/arXiv arXiv 2025
