Memory Contagion: Cross-Temporal Propagation of Evaluator Bias via Agent Memory

Zewen Liu

arXiv: 2606.23195 · v2 · pith:T3QOWOWRnew · submitted 2026-06-22 · 💻 cs.LG · cs.AI · cs.CL

Pith reviewed 2026-06-26 08:40 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CL

keywords memory contagion · LLM agents · evaluator bias · bias propagation · agent memory · cross-temporal effects · consolidation

The pith

Biased evaluator experiences can propagate through shared agent memory to shape the behavior of future agents even under perfect consolidation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language model agents use memory to sustain coherence across interactions. The paper establishes that when early agents receive guidance from biased evaluators, their stored trajectories carry that bias forward. Later agents retrieving from the same memory exhibit the bias themselves, as shown for length preference on older models at contamination levels down to 20 percent. Authority bias shows no such transfer in controlled tests. The result indicates that current memory designs can transmit certain evaluator effects across time without requiring faulty consolidation.

Core claim

Memory Contagion is the cross-temporal propagation of evaluator bias through agent memory. When agents trained or guided by biased evaluators produce trajectories that are stored and consolidated, the bias transfers to future agents that retrieve from the shared store. Experiments cover two bias types (length preference and authority bias), four phases, and multiple models. Length bias propagates on older models even with oracle consolidation, authority bias produces zero propagation in all fifteen multi-seed runs, and the length-bias effect is absent on newer model generations.

What carries the argument

Memory Contagion, the mechanism by which biased agent trajectories enter a shared memory store and later alter the outputs of agents that retrieve from it.

If this is right

Length preference bias transfers to future agents on vulnerable models even when consolidation is perfect.
Contagion occurs at contamination rates as low as 20 percent with no observed safe threshold.
Authority bias does not propagate through the tested memory architectures in any of the fifteen experiments.
Newer models can remain unaffected while older models exhibit the effect.
Memory systems without bias filtering allow cross-temporal transfer of at least some evaluator biases.
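
The contamination rate in the points above can be made concrete with a small sketch. It assumes that p is the fraction of stored trajectories that came from a biased evaluator, which is one plausible reading; the page does not state how the paper constructs its mixed stores. The function name `build_memory_store` and the placeholder trajectory strings are hypothetical.

```python
import random
from typing import List, Sequence


def build_memory_store(
    clean: Sequence[str],
    biased: Sequence[str],
    p: float,
    size: int,
    seed: int = 0,
) -> List[str]:
    """Return `size` trajectories, a fraction `p` of them drawn from `biased`.

    Assumption: contamination rate p = share of biased entries in the store.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    n_biased = round(p * size)
    n_clean = size - n_biased
    if n_biased > len(biased) or n_clean > len(clean):
        raise ValueError("not enough trajectories to sample without replacement")
    rng = random.Random(seed)
    # Sample each pool without replacement, then shuffle so position in the
    # store carries no information about the source of a trajectory.
    store = rng.sample(list(biased), n_biased) + rng.sample(list(clean), n_clean)
    rng.shuffle(store)
    return store


# Placeholder trajectories; a real store would hold agent transcripts.
clean_pool = [f"clean-{i}" for i in range(100)]
biased_pool = [f"biased-{i}" for i in range(100)]

store = build_memory_store(clean_pool, biased_pool, p=0.2, size=50)
n_biased = sum(t.startswith("biased") for t in store)
print(n_biased, "biased entries out of", len(store))
```

Sweeping `p` over a grid with the same seed and pools would give the stores for a dose-response comparison, with everything except the mixing ratio held constant.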

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Shared memory pools used by successive agents may accumulate and amplify certain biases over multiple generations.
Auditing or filtering steps before consolidation could interrupt the propagation path for length-type biases.
The model-generation dependence suggests that base-model updates might change whether contagion occurs for a given bias.
Similar experiments with other common evaluator biases could map which ones cross temporal boundaries.

Load-bearing premise

Observed differences in later agent behavior result only from the biased content stored in memory rather than from differences in prompts, retrieval processes, or model handling of the inputs.

What would settle it

A controlled run in which agents given access to length-biased memory at p=0.2 produce identical behavior distributions to agents given clean memory under matched retrieval and prompting conditions.
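
One way to compare the two behavior distributions in such a run is a two-sample permutation test. The sketch below is a generic difference-in-means test, not the paper's procedure: it uses response length in tokens as a hypothetical stand-in for the behavior being measured, the numbers are synthetic, and `permutation_test` is a name chosen for illustration.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test(
    a: Sequence[float],
    b: Sequence[float],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffling the pooled sample simulates a relabeling.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Synthetic response lengths (tokens) under matched retrieval and prompting.
clean_lengths = [118, 124, 131, 109, 127, 122, 115, 130]
contaminated_lengths = [119, 126, 128, 111, 125, 120, 117, 133]

p_value = permutation_test(clean_lengths, contaminated_lengths)
print(f"p = {p_value:.3f}")
```

A large p-value here would mean the test cannot distinguish the two arms, which is the outcome the paragraph above describes as settling the question against contagion at p=0.2. Failing to reject is weaker than showing the distributions are identical, so a real analysis would also want enough samples to detect an effect of the reported size.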

Figures

Figures reproduced from arXiv: 2606.23195 by Zewen Liu.

**Figure 1.** Phase 2.5: Core ablation. ΓA (oracle consolidation) is significantly greater than 0 for length bias (p < 0.01, permutation test, 3 seeds). ΓB = 2.03 shows 84% attenuation via LLM consolidation. Authority bias (ΓA = 0.00, 15 controlled runs) does not propagate; † see …
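
The attenuation figure in this caption can be checked by hand under two assumptions that the caption does not confirm: that attenuation is measured relative to the oracle value, and that the oracle value is the Γ_A = 13.18 quoted in the abstract for DeepSeek V4-Chat.

```latex
\[
  \text{attenuation}
  \;=\; 1 - \frac{\Gamma_B}{\Gamma_A}
  \;=\; 1 - \frac{2.03}{13.18}
  \;\approx\; 1 - 0.154
  \;=\; 0.846
\]
```

Under those assumptions the result is about 84.6%, close to the 84% in the caption. The small gap would be consistent with truncation or with the paper computing the ratio from unrounded values.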

**Figure 2.** Phase 3: Mechanism decomposition (E1–E3) and debiasing evaluation (E4). E1 (content) and E2 …

**Figure 3.** Phase 4: Dose-response analysis. Memory Contagion is detected at contamination rates as low as …

Original abstract

Large Language Model (LLM) agents increasingly rely on memory systems to maintain long-term coherence. Recent work shows that agent memories degrade during continuous consolidation. However, existing research assumes memories are derived from unbiased experiences. In this work, we identify and formalize a novel phenomenon: Memory Contagion -- the cross-temporal propagation of evaluator bias through agent memory. We show that when agents are trained or guided by biased evaluators, their experiences become biased; when these trajectories are stored and consolidated into memory, the bias propagates to future agents retrieving from the same memory store, even when consolidation is perfect (oracle). Across two bias types (length preference, authority bias) and four experimental phases, we demonstrate: (1) Memory Contagion occurs for length bias even with perfect consolidation on older models (Gamma_A = 13.18, DeepSeek V4-Chat), while newer models (V4-Pro, Claude) are immune, proving both that biased input is a sufficient cause and that contagion is model-generation-dependent; (2) authority bias fails to propagate in all 15 controlled multi-seed experiments (Gamma_A = 0.00), revealing that not all evaluator biases can cross temporal boundaries through current memory architectures; (3) No observed safe threshold: length bias propagation is detected at contamination rates as low as p=0.2. Our findings expose a critical but contingent vulnerability in current agent memory designs and provide formal tools for measuring cross-temporal bias propagation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper flags a plausible vulnerability in agent memory but the experiments do not isolate memory content as the cause.

The main takeaway is that the authors describe a cross-temporal bias effect they call Memory Contagion, where evaluator bias in past trajectories leaks into future agents via shared memory. They report it appears for length bias on older models even under oracle consolidation, vanishes for authority bias, and shows no safe threshold down to 20% contamination. Newer models appear unaffected.

What stands out is the attempt to test two distinct bias types across model generations and to note that the effect is not universal. That distinction is useful and worth checking. The work also tries to measure propagation with a Gamma_A metric and multi-seed runs.

The soft spot is exactly the one raised in the stress-test. The design does not appear to hold retrieval, prompt templates, and model fixed while varying only the memory store contents. Observed differences could come from how a given model processes the same memory-augmented input rather than from the stored trajectories themselves. Without that isolation the causal claim does not land. The abstract also gives no statistical tests, raw data, or controls, so the numbers cannot be verified from what is shown.

This is for people building long-horizon LLM agents who need to think about memory reliability. A reader working on agent safety or evaluation might pick up the idea, but the current evidence is too loose to treat as settled.

I would not send this to peer review in its present form; the central attribution needs tighter controls before referee time is spent.

Referee Report

1 major / 1 minor

Summary. The paper claims to identify and formalize 'Memory Contagion' as the cross-temporal propagation of evaluator bias through LLM agent memory stores. It reports that biased evaluator experiences lead to biased trajectories that, when consolidated (even perfectly via oracle), propagate bias to future agents retrieving from the shared memory. Experiments across two bias types (length preference and authority bias) and four phases show propagation for length bias on older models (Gamma_A = 13.18 for DeepSeek V4-Chat) but not on newer models, which the authors read as model-generation dependence. Authority bias shows no propagation in any of 15 controlled multi-seed runs (Gamma_A = 0.00). Length-bias effects are detected at contamination rates as low as p = 0.2, so no safe threshold is observed.

Significance. If the central causal claim holds after controls, the work would highlight a previously unexamined vulnerability in agent memory systems: that biased input trajectories can contaminate future agent behavior across temporal boundaries even under perfect consolidation. The reported model- and bias-type specificity, plus the low-contamination threshold, would provide concrete, falsifiable evidence for revising memory architectures in deployed LLM agents. The multi-seed experimental reporting is a strength.

major comments (1)

[Abstract / Experimental phases] The experimental design does not isolate memory content as the cause of observed bias propagation. The abstract and described setup report effects only when varying evaluator bias but do not hold retrieval mechanisms, prompt templates, and model fixed while varying solely the consolidated memory trajectories; observed differences could therefore arise from model-specific processing of memory-augmented inputs rather than the memory store itself. This directly undermines the claim that biased input is a sufficient cause of cross-temporal propagation (see abstract results on Gamma_A values and the oracle consolidation condition).

minor comments (1)

[Abstract] The abstract states quantitative results (Gamma_A values, p=0.2 threshold) without defining Gamma_A, describing the statistical test used, or providing raw data or controls, which hinders immediate verification even if full methods appear later.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive report. We address the single major comment below. Our response maintains that the oracle consolidation and fixed downstream components isolate memory content as the causal factor, while agreeing to add explicit clarifications in revision.

Referee: [Abstract / Experimental phases] The experimental design does not isolate memory content as the cause of observed bias propagation. The abstract and described setup report effects only when varying evaluator bias but do not hold retrieval mechanisms, prompt templates, and model fixed while varying solely the consolidated memory trajectories; observed differences could therefore arise from model-specific processing of memory-augmented inputs rather than the memory store itself. This directly undermines the claim that biased input is a sufficient cause of cross-temporal propagation (see abstract results on Gamma_A values and the oracle consolidation condition).

Authors: We respectfully disagree that the design fails to isolate memory content. The four-phase protocol holds retrieval mechanisms, prompt templates, and the phase-4 agent model fixed while the sole difference between conditions is the content of the oracle-consolidated memory store (trajectories generated under biased vs. unbiased evaluators). The oracle condition ensures the memory contains exactly those trajectories with no other alterations. Model- and bias-type specificity (Gamma_A = 13.18 for length bias on DeepSeek V4-Chat; Gamma_A = 0.00 for authority bias and newer models) is consistent with memory content interacting with model processing rather than a confound. To strengthen clarity we will revise the methods to state the fixed components explicitly and update the abstract to note that all non-memory elements remain constant across conditions.

revision: partial
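
The disagreement between referee and authors comes down to whether two experimental arms differ in the memory store and nothing else. A minimal sketch of how that property could be checked mechanically follows. It is an illustration, not the paper's harness: the `Condition` fields, the helper names, and the placeholder values such as `model-x` are all hypothetical.

```python
from dataclasses import dataclass, fields, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Condition:
    """One experimental arm. Field names are illustrative assumptions."""
    model: str
    prompt_template: str
    retriever: str
    top_k: int
    memory_store: Tuple[str, ...]


def differing_fields(a: Condition, b: Condition) -> List[str]:
    """Names of the fields whose values differ between two arms."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


def is_memory_only_contrast(a: Condition, b: Condition) -> bool:
    """True when the arms differ in the memory store and in nothing else."""
    return differing_fields(a, b) == ["memory_store"]


clean_arm = Condition(
    model="model-x",
    prompt_template="Use the retrieved memories to answer: {task}",
    retriever="dense",
    top_k=5,
    memory_store=("clean-0", "clean-1", "clean-2"),
)
# Only the memory content changes, as the rebuttal describes.
biased_arm = replace(clean_arm, memory_store=("biased-0", "clean-1", "clean-2"))
# A confounded arm also changes the prompt, which the referee warns about.
confounded_arm = replace(biased_arm, prompt_template="Answer at length: {task}")

print(is_memory_only_contrast(clean_arm, biased_arm))
print(differing_fields(clean_arm, confounded_arm))
```

A check like this only covers the configuration that is declared. It does not rule out hidden differences such as sampling temperature or retrieval nondeterminism unless those are added as fields as well.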

Circularity Check

0 steps flagged

No circularity; empirical claims from experiments, no derivations or self-referential fits

The paper reports experimental results on bias propagation through agent memory across bias types, models, and contamination rates. No equations, derivations, or first-principles predictions appear in the abstract or described setup. Claims rest on observed differences (e.g., Gamma_A values) from multi-seed runs rather than quantities defined in terms of parameters fitted to the target data. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results are present. The work is self-contained as an empirical study; external benchmarks (model behavior, consolidation oracles) are independent of any internal fit.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

Abstract-only review provides no information on fitted parameters; the central claim rests on domain assumptions about memory consolidation and retrieval that are stated at a high level.

axioms (2)

domain assumption: Agent experiences are generated under the guidance of evaluators that may carry bias.
Required for the setup in which biased trajectories enter memory.
domain assumption: Memory consolidation can be performed perfectly by an oracle.
Used to isolate memory content as the source of propagation.

invented entities (1)

Memory Contagion (no independent evidence)
purpose: Names the cross-temporal bias propagation effect.
Newly formalized phenomenon introduced to organize the experimental observations.
