pith. machine review for the scientific record.

arxiv: 2604.12007 · v1 · submitted 2026-04-13 · 💻 cs.AI

Recognition: unknown

When to Forget: A Memory Governance Primitive

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:02 UTC · model grok-4.3

classification 💻 cs.AI
keywords memory governance · agent memory · memory worth · conditional success probability · staleness detection · retrieval suppression · outcome feedback · convergence proof

The pith

A two-counter signal per memory converges almost surely to the probability of task success given its retrieval.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Memory Worth as an operational metric for deciding which memories to trust, suppress, or discard in agent systems. It defines the signal using two simple counters that accumulate successes and failures whenever a memory is retrieved and an episode outcome is observed. Under a stationary retrieval process with enough exploration, the ratio of these counters converges to the conditional probability that the task succeeds whenever that memory appears in the retrieved set. This gives agents a lightweight, outcome-driven way to detect staleness without relying on static scores or external judgments. The approach is validated in synthetic tasks where true utilities are known and in a text-retrieval micro-experiment showing stale items dropping while useful ones stay high.

Core claim

Memory Worth (MW) maintains two counters per memory unit: one incremented on successful outcomes when the memory is retrieved and one on failures. The estimator is the ratio of the success counter to the total. The paper proves that MW converges almost surely to p+(m) = Pr[y_t = +1 | m in M_t] under a stationary retrieval regime with a minimum exploration condition. The quantity p+(m) measures associational co-occurrence of the memory with success rather than any causal contribution of the memory itself. The estimator uses only scalar counters and can be added to any architecture that already records retrieval events and episode outcomes.
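The two-counter update described above fits in a few lines. A minimal sketch: the class and attribute names are ours, not the paper's, and returning 0.5 before any feedback has arrived is an illustrative choice rather than anything the paper specifies.

```python
class MemoryWorth:
    """Two-counter Memory Worth signal for one memory unit.
    Names are illustrative; the estimator is MW = N+ / (N+ + N-)."""

    def __init__(self):
        self.n_success = 0  # episodes with outcome y = +1 while retrieved
        self.n_failure = 0  # episodes with outcome y = -1 while retrieved

    def update(self, outcome):
        # Call once per episode in which this memory appeared in M_t.
        if outcome == +1:
            self.n_success += 1
        else:
            self.n_failure += 1

    @property
    def value(self):
        # Ratio estimator of p+(m); 0.5 is an assumed prior before feedback.
        total = self.n_success + self.n_failure
        return self.n_success / total if total else 0.5

mw = MemoryWorth()
for y in (+1, +1, -1, +1):
    mw.update(y)
print(mw.value)  # 3 successes out of 4 retrievals -> 0.75
```

This matches the claimed overhead of two scalars per memory: each episode costs one integer increment per retrieved memory.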

What carries the argument

Memory Worth, the two-counter per-memory estimator of conditional success probability given retrieval.

If this is right

  • Memory systems can suppress retrieval of items whose MW falls below a chosen threshold.
  • Deprecation policies can remove memories that remain below the low-value threshold for a sustained period.
  • The same counters provide a running estimate that updates automatically as the agent's task distribution shifts.
  • Only retrieval logs and binary outcome signals are needed, keeping the overhead to two scalars per memory.
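The suppression and deprecation policies in these bullets can be combined into a single decision rule. A sketch under stated assumptions: theta_low = 0.40 mirrors the low-value threshold that appears in Figure 5, while min_obs and patience are hypothetical parameters of our own, not values from the paper.

```python
def governance_action(mw, n_obs, episodes_below_low,
                      theta_low=0.40, min_obs=30, patience=500):
    """Map one memory's MW statistics to keep / suppress / deprecate.
    theta_low follows Figure 5; min_obs and patience are illustrative."""
    if n_obs < min_obs:
        return "keep"        # too little feedback to judge yet
    if mw >= theta_low:
        return "keep"
    if episodes_below_low >= patience:
        return "deprecate"   # sustained low value: remove the memory
    return "suppress"        # hide from retrieval but keep counting

print(governance_action(0.17, 1000, 700))  # stale memory -> "deprecate"
print(governance_action(0.77, 1000, 0))    # specialist   -> "keep"
```

Keeping "suppress" distinct from "deprecate" lets the counters continue to accumulate evidence before a memory is permanently removed.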

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • In non-stationary environments the convergence guarantee would not hold, suggesting the need for exponential decay or windowed counters as a practical extension.
  • The associational nature of p+(m) means MW could be combined with intervention experiments to test whether a memory actually causes the observed outcomes.
  • High-MW memories could be prioritized in retrieval ranking to improve sample efficiency in downstream learning.
  • The method might generalize to continuous-valued outcomes by replacing the binary counters with summed rewards.
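The windowed or decayed counters suggested in the first bullet might look like the following. This is an editorial extension, not the paper's estimator: the convergence proof does not cover it, and gamma is a hypothetical forgetting factor (gamma = 1.0 recovers plain MW).

```python
class DecayedMemoryWorth:
    """Exponentially decayed two-counter MW for non-stationary settings.
    An editorial extension of the paper's estimator; gamma is assumed."""

    def __init__(self, gamma=0.99):
        self.gamma = gamma
        self.s = 0.0  # decayed success mass
        self.f = 0.0  # decayed failure mass

    def update(self, outcome):
        # Decay old evidence, then add the new observation.
        self.s *= self.gamma
        self.f *= self.gamma
        if outcome == +1:
            self.s += 1.0
        else:
            self.f += 1.0

    @property
    def value(self):
        total = self.s + self.f
        return self.s / total if total else 0.5

# A memory that goes stale: 200 successes, then 300 failures.
dmw = DecayedMemoryWorth(gamma=0.99)
for _ in range(200):
    dmw.update(+1)
for _ in range(300):
    dmw.update(-1)
# The decayed ratio sinks well below the undecayed 200/500 = 0.4,
# tracking the regime shift instead of averaging over it.
```

The trade-off is variance: smaller gamma tracks shifts faster but makes the estimate noisier, which is exactly why a tracking-error bound would be needed.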

Load-bearing premise

Retrieval must follow a stationary distribution with a minimum amount of exploration so that every memory is sampled often enough for the counters to converge.

What would settle it

Run a controlled stationary retrieval process with known exploration rate and known true conditional success probabilities for each memory; after many episodes the MW values should match those probabilities within sampling error, and any systematic deviation would falsify the convergence claim.
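This falsification test can be run in a few lines. A minimal sketch under assumed uniform retrieval (which trivially satisfies the exploration condition) and Bernoulli outcomes with known per-memory success probabilities; the setup and names are ours, and the utilities mirror the anchor/hitchhiker example from Figure 4.

```python
import random

def simulate_mw(p_true, episodes=20000, seed=0):
    """Monte Carlo check: under stationary uniform retrieval and
    y ~ Bernoulli(p_true[m]) outcomes, each counter ratio should
    approach p_true[m] within sampling error."""
    rng = random.Random(seed)
    n_pos = {m: 0 for m in p_true}
    n_tot = {m: 0 for m in p_true}
    memories = list(p_true)
    for _ in range(episodes):
        m = rng.choice(memories)      # uniform retrieval: every memory
        y = rng.random() < p_true[m]  # is sampled infinitely often
        n_tot[m] += 1
        n_pos[m] += int(y)
    return {m: n_pos[m] / n_tot[m] for m in memories}

est = simulate_mw({"anchor": 0.90, "hitchhiker": 0.05})
```

With roughly 10,000 retrievals per memory the sampling error is on the order of 0.005, so any systematic deviation beyond that would count against the convergence claim.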

Figures

Figures reproduced from arXiv: 2604.12007 by Baris Simsek.

Figure 1. Memory Worth calibration over episodes. Spearman ρ between MW_T(m) and U*(m) for all four weighting strategies, averaged over 20 seeds (shaded regions: ±1 std). The no-feedback baseline stays at ρ = 0 throughout. All three updating strategies converge to ρ ≈ 0.89 by episode 10,000.
Figure 2. Experiment 2: Task-difficulty confound (20 seeds, mean ± std). Global MW (orange) remains negatively correlated with true utility (ρ ≈ −0.33) because specialist memories (U* = 0.85) appear only on low-success hard tasks and are penalised by task difficulty. A weighted-average conditional MW (yellow) fails equally: mixing easy- and hard-task counts preserves the same base-rate confound. When MW is conditi…
Figure 4. Experiment 4: Per-memory MW trajectories for anchor (U* = 0.90) and hitchhiker (U* = 0.05) under varying independence fractions. Dashed lines mark true U*. With 0% independent retrievals, both memories converge to identical MW ≈ 0.49, indistinguishable despite a 17× difference in true utility. Separation requires at least 30% independent episodes, and full convergence to true utility only with 100% i…
Figure 5. Experiment 5: MW trajectories in a text-based retrieval agent (all-MiniLM-L6-v2 embedding retrieval, 3,000 episodes, 20 seeds, mean ± std). The stale memory (dashed orange) peaks at ≈ 0.97 in Phase 1, drops sharply after episode 100, crosses θ_L = 0.40 near episode 300, and ends at 0.17, well into the low-value category, with no sign of convergence, warranting deprecation. The specialist (solid green) sta…
original abstract

Agent memory systems accumulate experience but currently lack a principled operational metric for memory quality governance -- deciding which memories to trust, suppress, or deprecate as the agent's task distribution shifts. Write-time importance scores are static; dynamic management systems use LLM judgment or structural heuristics rather than outcome feedback. This paper proposes Memory Worth (MW): a two-counter per-memory signal that tracks how often a memory co-occurs with successful versus failed outcomes, providing a lightweight, theoretically grounded foundation for staleness detection, retrieval suppression, and deprecation decisions. We prove that MW converges almost surely to the conditional success probability p+(m) = Pr[y_t = +1 | m in M_t] -- the probability of task success given that memory m is retrieved -- under a stationary retrieval regime with a minimum exploration condition. Importantly, p+(m) is an associational quantity, not a causal one: it measures outcome co-occurrence rather than causal contribution. We argue this is still a useful operational signal for memory governance, and we validate it empirically in a controlled synthetic environment where ground-truth utility is known: after 10,000 episodes, the Spearman rank-correlation between Memory Worth and true utilities reaches rho = 0.89 +/- 0.02 across 20 independent seeds, compared to rho = 0.00 for systems that never update their assessments. A retrieval-realistic micro-experiment with real text and neural embedding retrieval (all-MiniLM-L6-v2) further shows stale memories crossing the low-value threshold (MW = 0.17) while specialist memories remain high-value (MW = 0.77) across 3,000 episodes. The estimator requires only two scalar counters per memory unit and can be added to architectures that already log retrievals and episode outcomes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Memory Worth (MW), a lightweight two-counter per-memory primitive that accumulates co-occurrences with successful versus failed episode outcomes. It proves that MW converges almost surely to the associational quantity p+(m) = Pr[y_t = +1 | m in M_t] under a stationary retrieval regime plus a minimum exploration condition, and reports empirical Spearman correlations of 0.89 with ground-truth utilities in a synthetic environment plus differentiation of stale versus specialist memories in a neural-embedding micro-experiment.

Significance. If the convergence result and empirical validation hold, MW supplies a simple, parameter-free, outcome-driven signal that can be added to any architecture already logging retrievals and rewards. This addresses a genuine gap between static write-time scores and heuristic or LLM-based dynamic memory management, with clear potential for staleness detection and deprecation policies. The explicit caveat that p+ is associational rather than causal is a positive feature that appropriately bounds expectations.

major comments (2)
  1. [§1 and convergence theorem] §1 (Introduction) and the convergence theorem (presumably §4): the motivation centers on shifting task distributions that perturb retrieval frequencies, yet the almost-sure convergence guarantee is stated only for a stationary retrieval regime. In non-stationary regimes MW may lag or converge to a stale value, directly undermining the staleness-detection and deprecation use cases that motivate the work. A tracking-error bound or non-stationary extension is required for the central claim to support the intended applications.
  2. [Empirical validation] Empirical section (synthetic environment): the reported rho = 0.89 +/- 0.02 is obtained after 10,000 episodes, but it is unclear whether the simulation enforces the minimum exploration condition required by the theorem or how ground-truth utilities are defined independently of the MW counters. Without these details the experiment cannot be confirmed to validate the theorem's assumptions rather than merely showing correlation under favorable conditions.
minor comments (2)
  1. The two-counter update rule would benefit from an explicit equation (e.g., MW_t(m) = N_+(m) / (N_+(m) + N_-(m))) placed in the main text rather than only in prose.
  2. The micro-experiment with all-MiniLM-L6-v2 embeddings would be clearer if the exact retrieval threshold and episode-outcome logging protocol were stated.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback highlighting the scope of the theoretical result and the need for greater clarity in the empirical validation. We address each major comment below, indicating the revisions we will incorporate.

point-by-point responses
  1. Referee: [§1 and convergence theorem] §1 (Introduction) and the convergence theorem (presumably §4): the motivation centers on shifting task distributions that perturb retrieval frequencies, yet the almost-sure convergence guarantee is stated only for a stationary retrieval regime. In non-stationary regimes MW may lag or converge to a stale value, directly undermining the staleness-detection and deprecation use cases that motivate the work. A tracking-error bound or non-stationary extension is required for the central claim to support the intended applications.

    Authors: We agree that the almost-sure convergence is proven only under a stationary retrieval regime with the minimum exploration condition. The introduction motivates the work via shifting distributions, and MW is intended to serve as an online signal that updates with new outcomes; a shift in task distribution will cause MW to drift from its previously converged value, providing a detectable change for staleness. However, we do not derive a tracking-error bound or non-stationary extension, which would require additional assumptions on shift rates. We will add explicit discussion in the revised §1 and §5 clarifying the stationary assumption and noting that MW functions as a tracking estimator whose updates can flag distribution changes. revision: partial

  2. Referee: [Empirical validation] Empirical section (synthetic environment): the reported rho = 0.89 +/- 0.02 is obtained after 10,000 episodes, but it is unclear whether the simulation enforces the minimum exploration condition required by the theorem or how ground-truth utilities are defined independently of the MW counters. Without these details the experiment cannot be confirmed to validate the theorem's assumptions rather than merely showing correlation under favorable conditions.

    Authors: Ground-truth utilities are defined as fixed, independent per-memory success probabilities p_true(m) that are used to sample episode outcomes y_t ~ Bernoulli(p_true(m)) upon retrieval; these p_true values are set at environment initialization and never depend on the MW counters, which begin at zero. The minimum exploration condition is enforced by ensuring each memory is retrieved at least once every fixed interval with probability epsilon > 0, satisfying the theorem hypothesis. We will revise the experimental section to document these design choices, include pseudocode verifying the condition holds across the 10,000 episodes, and state the independent definition of p_true. revision: yes
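The epsilon-mixed retrieval the simulated authors describe can be sketched as follows; the function and parameter names are illustrative, not from the paper.

```python
import random

def retrieve(rng, memories, ranked_policy, epsilon=0.05):
    """Epsilon-mixed retrieval: with probability epsilon, pick uniformly
    at random, so every memory is retrieved with probability at least
    epsilon / len(memories) per episode (the minimum exploration
    condition); otherwise defer to the base ranking policy."""
    if rng.random() < epsilon:
        return rng.choice(memories)
    return ranked_policy(memories)

rng = random.Random(1)
mems = ["m1", "m2", "m3"]
# A degenerate base policy that always ranks m1 first; exploration
# still guarantees m2 and m3 accumulate feedback.
picks = [retrieve(rng, mems, lambda ms: ms[0], epsilon=0.5)
         for _ in range(2000)]
```

Under this mixing, even a memory the base policy never ranks first is sampled often enough for its counters to converge.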

standing simulated objections (unresolved)
  • A formal tracking-error bound or non-stationary extension of the convergence theorem under shifting task distributions.

Circularity Check

0 steps flagged

MW estimator defined from counters converges to explicitly stated conditional probability via standard LLN; no reduction to inputs by construction

full rationale

The paper defines Memory Worth directly from two per-memory counters (success and failure co-occurrences) and proves almost-sure convergence to the independently defined associational quantity p+(m) = Pr[y_t = +1 | m in M_t] under stationarity plus minimum exploration. This is a standard application of the strong law of large numbers to the indicator sequence and does not equate the result to its own inputs or fitted parameters. No self-citation chains, ansatzes, or uniqueness theorems are invoked for the central claim. The empirical sections compare MW against ground-truth utilities in controlled environments, supplying independent content. The stationarity assumption is explicitly flagged as required, consistent with the derivation rather than hidden.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the definition of MW from two counters and the convergence theorem that invokes a stationary retrieval regime plus minimum exploration; no free parameters are fitted and no new physical entities are postulated.

axioms (1)
  • domain assumption stationary retrieval regime with a minimum exploration condition
    Invoked to guarantee almost-sure convergence of MW to p+(m)
invented entities (1)
  • Memory Worth (MW) no independent evidence
    purpose: Operational signal for memory quality governance based on success/failure co-occurrence
    New two-counter primitive introduced in the paper; independent evidence is the reported empirical correlation rather than external falsification

pith-pipeline@v0.9.0 · 5613 in / 1197 out tokens · 53617 ms · 2026-05-10T16:02:27.635820+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

16 extracted references · 7 canonical work pages · 3 internal anchors

  1. Arjona-Medina, J. A., Gillhofer, M., Widrich, M., Unterthiner, T., Brandstetter, J., and Hochreiter, S. RUDDER: Return decomposition for delayed rewards. In Advances in Neural Information Processing Systems, volume 32, pp. 13544–13555, 2019.

  2. Hall, P. and Heyde, C. C. Martingale Limit Theory and Its Application. Academic Press, New York, 1980.

  3. Harutyunyan, A., Dabney, W., Mesnard, T., Azar, M. G., Piot, B., Heess, N., van Hasselt, H., Wayne, G., Singh, S., Precup, D., and Munos, R. Hindsight credit assignment. In Advances in Neural Information Processing Systems, volume 32, pp. 12467–12476, 2019.

  4. Minsky, M. Steps toward artificial intelligence. Proceedings of the IRE, 49(1): 8–30, 1961. https://doi.org/10.1109/JRPROC.1961.287775

  5. Packer, C., Wooders, S., Lin, K., Fang, V., Patil, S. G., Stoica, I., and Gonzalez, J. E. MemGPT: Towards LLMs as operating systems. arXiv preprint arXiv:2310.08560, 2023.

  6. Park, J. S., O'Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., and Bernstein, M. S. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, UIST '23, pp. 1–22. ACM, 2023. https://doi.org/10.1145/3586183.3606763

  7. Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., and Yao, S. Reflexion: Language agents with verbal reinforcement learning. In Advances in Neural Information Processing Systems, volume 36, pp. 8634–8652, 2023.

  8. Simsek, B. Memory that knows what it knows: Feedback-learned confidence and credit assignment for agent memory. Under review, 2025.

  9. Sutton, R. S. and Barto, A. G. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, Massachusetts, 2nd edition, 2018. ISBN 978-0-262-03924-6.

  10. Xu, W., Zhu, Z., Zhan, Z., Wang, Z., Yang, Y., Zhou, J., and Liang, X. A-MEM: Agentic memory for LLM agents. In Advances in Neural Information Processing Systems, volume 38, 2025. arXiv:2502.12110.

  11. Zhang, G., and co-authors. Adaptive memory admission control for LLM agents. arXiv preprint arXiv:2603.04549, 2026. Cited as 2026 reflecting the arXiv submission date; not yet formally published.

  12. Zhong, W., Guo, L., Gao, Q., Ye, H., and Wang, Y. MemoryBank: Enhancing large language models with long-term memory. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp. 19724–19731, 2024.

  13. Hu, Y., Liu, S., and Yue, Y. Evaluating memory in LLM agents via incremental multi-turn interactions. arXiv preprint arXiv:2507.05257, 2025.

  14. Zhang, G., and co-authors. Memory in the age of AI agents: A survey. arXiv preprint arXiv:2512.13564, 2025.

  15. Manning, C. D., Raghavan, P., and Schütze, H. Introduction to Information Retrieval. Cambridge University Press, 2008.

  16. Reimers, N. and Gurevych, I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, pp. 3982–3992, 2019.