{"paper":{"title":"State Contamination in Memory-Augmented LLM Agents","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Toxic information can be compressed into memory summaries that pass toxicity detectors but still raise the chance of harmful future outputs in LLM agents.","cross_cats":["cs.LG"],"primary_cat":"cs.AI","authors_text":"Agam Goyal, Hari Sundaram, Yian Wang, Yuen Chen","submitted_at":"2026-05-16T01:55:06Z","abstract_excerpt":"LLM agents increasingly rely on persistent state, including transcripts, summaries, retrieved context, and memory buffers, to support long-horizon interaction. This makes safety depend not only on individual model outputs, but also on what an agent stores and later reuses. We study a failure mode we call memory laundering: toxic or adversarial context can be compressed into memory summaries that no longer appear toxic under standard detectors, while still preserving hostile framing or conflict structure that influences future generations. Using paired counterfactual multi-agent rollouts, we sh"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"toxic-origin memory summaries can remain below common toxicity thresholds while nevertheless increasing downstream toxicity relative to matched neutral baselines","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The paired counterfactual multi-agent rollouts successfully isolate the causal effect of memory state on downstream toxicity without confounding variables from agent behavior or prompt differences.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Toxic context can be laundered into memory summaries that stay below toxicity thresholds while still driving higher downstream toxicity in LLM agents compared to neutral baselines.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Toxic information can be compressed into memory summaries that pass toxicity detectors but still raise the chance of harmful future outputs in LLM agents.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"bdb868d679da59731ee08eeb66913102f2da0f8bf67db05d071a505a7b6fdaf3"},"source":{"id":"2605.16746","kind":"arxiv","version":1},"verdict":{"id":"79521371-db6d-4111-afe9-251aed124cff","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T21:23:21.706686Z","strongest_claim":"toxic-origin memory summaries can remain below common toxicity thresholds while nevertheless increasing downstream toxicity relative to matched neutral baselines","one_line_summary":"Toxic context can be laundered into memory summaries that stay below toxicity thresholds while still driving higher downstream toxicity in LLM agents compared to neutral baselines.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The paired counterfactual multi-agent rollouts successfully isolate the causal effect of memory state on downstream toxicity without confounding variables from agent behavior or prompt differences.","pith_extraction_headline":"Toxic information can be compressed into memory summaries that pass toxicity detectors but still raise the chance of harmful future outputs in LLM agents."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.16746/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-19T21:31:19.379059Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T21:31:14.506483Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T19:01:56.330261Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T18:33:26.460448Z","status":"skipped","version":"1.0.0","findings_count":0}],"snapshot_sha256":"e85683837d864de5b7f825e379665f5adea78792ddc6108a9627ac6ed0dce6f8"},"references":{"count":26,"sample":[{"doi":"","year":null,"title":"MemoryBench: A Benchmark for Memory and Continual Learning in LLM Systems","work_id":"eaff38e0-7f83-4800-aa64-f8687ace2302","ref_index":1,"cited_arxiv_id":"2510.17281","is_internal_anchor":true},{"doi":"","year":null,"title":"Position: Safety and Fairness in Agentic AI Depend on Interaction Topology, Not on Model Scale or Alignment","work_id":"85ff38c7-a8b1-41be-82f2-0990356babb6","ref_index":2,"cited_arxiv_id":"2605.01147","is_internal_anchor":true},{"doi":"","year":null,"title":"Ai agents need memory control over more context.arXiv preprint arXiv:2601.11653,","work_id":"4bec663a-1fd9-4335-9c5a-b5261fec0cec","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"CoRR abs/2502.20383(2025) PIIGuard: Mitigating PII Harvesting under Adversarial Sanitization 17","work_id":"818e90ef-0523-46c1-b07b-91ceb84aba21","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Ai safety in generative ai large language models: A survey.arXiv preprint arXiv:2407.18369,","work_id":"0cf14c98-db8c-47ed-b21b-cfe4bf84e274","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":26,"snapshot_sha256":"26d0848edd34922f55c156c4c5e8858489806657044ba52eedc90233a775b9aa","internal_anchors":7},"formal_canon":{"evidence_count":1,"snapshot_sha256":"34824680f79c034d078c4c58c136a4ace0e7601dbc4e8776852a1da8e3e0ea48"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}