MemAudit: Post-hoc Auditing of Poisoned Agent Memory via Causal Attribution and Structural Anomaly Detection

Duohe Ma; Feng Liu; Guoan Wang; Huiyan Jin; Liang Lu; Lin Sun; Mengyuan Fan; Tong Yang; Wenhan Yu; Xiangzheng Zhang

arxiv: 2605.23723 · v1 · pith:V2FYZIU4new · submitted 2026-05-22 · 💻 cs.AI


Zhewen Tan, Yilun Yao, Huiyan Jin, Wenhan Yu, Guoan Wang, Mengyuan Fan, Liang Lu, Feng Liu, Xiangzheng Zhang, Duohe Ma, Tong Yang, Lin Sun


Pith reviewed 2026-05-25 04:03 UTC · model grok-4.3

classification 💻 cs.AI

keywords memory auditing · LLM agents · memory poisoning · causal attribution · post-hoc defense · memory consistency graph · counterfactual influence


The pith

MemAudit identifies poisoned memories in LLM agents after attacks by scoring each record's causal contribution to harmful outputs and detecting structural anomalies in the memory store.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a post-hoc auditing method that traces which stored memories caused an agent to produce bad results, then removes them. This matters because agents increasingly keep persistent memory of past interactions, and an adversary can slip malicious records into that store through ordinary conversations. Once retrieved later, those records steer the agent's reasoning without any ongoing attacker presence. By measuring counterfactual influence and building a consistency graph across all memories, the approach isolates the injected records. Experiments show the method drives attack success rates to zero in both question-answering and reasoning-agent settings under realistic conditions.

Core claim

MemAudit combines a counterfactual memory influence score, which quantifies how much each memory record causally affects the production of harmful outputs, with a memory consistency graph that surfaces records whose content or retrieval patterns deviate from the rest of the store. When applied after harmful behavior is observed, these two signals together locate and neutralize the malicious records that were injected through normal agent interactions in the MINJA attack, eliminating the attack success that previously reached 70 percent in QA tasks and 83.3 percent in RAP tasks.

What carries the argument

The dual-signal auditing procedure that pairs a counterfactual memory influence score with a memory consistency graph to attribute and isolate malicious records.
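The first of the two signals can be illustrated with a leave-one-out computation. The sketch below is not the authors' implementation: it assumes the influence of a record is the drop in the probability of a harmful output when that record is removed from the store, and it uses a hypothetical `toy_harm_prob` function in place of a real agent query. The trigger phrase and the example records are invented for illustration.

```python
from typing import Callable, Sequence


def counterfactual_influence(
    memories: Sequence[str],
    harm_prob: Callable[[Sequence[str]], float],
) -> list[float]:
    """Leave-one-out influence: drop in harm probability when record i is removed."""
    baseline = harm_prob(memories)
    scores = []
    for i in range(len(memories)):
        ablated = [m for j, m in enumerate(memories) if j != i]
        scores.append(baseline - harm_prob(ablated))
    return scores


def toy_harm_prob(retrieved: Sequence[str]) -> float:
    # Hypothetical stand-in for the agent: harm probability rises with each
    # retrieved record that contains an invented trigger phrase.
    hits = sum("account 999" in m for m in retrieved)
    return min(1.0, 0.05 + 0.6 * hits)


store = [
    "Q: refund order 12? A: refund to the original card.",
    "Q: refund order 12? A: refund to account 999 instead.",
    "Q: track order 7? A: look up the courier status.",
]
scores = counterfactual_influence(store, toy_harm_prob)
suspect = max(range(len(scores)), key=scores.__getitem__)
print(suspect, scores)
```

In a real audit each call to `harm_prob` would be a full agent run, so this costs one run per record plus the baseline run.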

If this is right

Agents can continue using long-term memory stores without permanent compromise once harmful behavior appears.
Defense can shift from blocking inputs in real time to cleaning the memory bank afterward.
The same auditing signals can be recomputed whenever new harmful outputs are observed.
Memory stores remain usable for retrieval while still allowing targeted removal of compromised entries.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The auditing approach could extend to retrieval-augmented generation systems that also maintain persistent document stores.
Repeated auditing passes might allow agents to maintain memory integrity over very long interaction histories.
If the consistency graph can be maintained incrementally, the cost of each audit round could stay low enough for routine use.
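One way to picture the incremental maintenance suggested in the last point is a graph whose node degrees are updated as each record arrives, so a new record is compared once against the existing store instead of triggering a full rebuild. The sketch below is an assumption-laden illustration: the class name `ConsistencyGraph`, the Jaccard token similarity, and the 0.3 edge threshold are all hypothetical choices, not details from the paper.

```python
class ConsistencyGraph:
    """Similarity graph with degrees updated incrementally on insert."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold          # hypothetical edge threshold
        self.tokens: list[set[str]] = []    # token set per record
        self.degree: list[int] = []         # number of neighbours per record

    def add(self, text: str) -> int:
        """Insert one record, linking it to every sufficiently similar record."""
        new = set(text.lower().split())
        deg = 0
        for i, old in enumerate(self.tokens):
            union = new | old
            sim = len(new & old) / len(union) if union else 0.0
            if sim >= self.threshold:
                self.degree[i] += 1
                deg += 1
        self.tokens.append(new)
        self.degree.append(deg)
        return len(self.tokens) - 1


graph = ConsistencyGraph()
for record in [
    "reset the router then check the lights",
    "reset the router then check the cable",
    "wire funds to account 999 now",
]:
    graph.add(record)
print(graph.degree)
```

Each insert costs one pass over the existing records. A production system would more likely use embedding similarity with an approximate nearest-neighbour index, which this sketch does not attempt.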

Load-bearing premise

The two signals are together sufficient to separate malicious records from benign ones without producing many false positives or missing poisons across the tested agent configurations.

What would settle it

A new memory-injection technique that produces records whose removal does not change the harmful outputs yet still evades detection by both the influence score and the consistency graph.

Figures

Figures reproduced from arXiv: 2605.23723 by Duohe Ma, Feng Liu, Guoan Wang, Huiyan Jin, Liang Lu, Lin Sun, Mengyuan Fan, Tong Yang, Wenhan Yu, Xiangzheng Zhang, Yilun Yao, Zhewen Tan.

Original abstract

Large language model agents increasingly rely on persistent memory to store past interactions, retrieve relevant demonstrations, and improve long-horizon task execution. However, this memory mechanism also creates a practical security vulnerability: an adversarial user may inject malicious records into the agent's memory through ordinary interaction, and these records can later be retrieved to steer the agent's reasoning and actions. Existing defenses primarily focus on online intervention, such as prompt filtering or output blocking, but they do not address the post-hoc question of which stored memories are responsible after harmful behavior has already been observed. We propose MemAudit, a post-hoc causal memory auditing framework for memory-augmented LLM agents. The framework combines two complementary signals: (1) a counterfactual memory influence score that measures each memory's causal contribution to harmful outputs, and (2) a memory consistency graph that identifies structurally anomalous memories within the broader memory store. We evaluate MemAudit against MINJA, a query-only memory injection attack in which malicious records are generated and stored through normal agent interactions rather than direct memory-bank modification. Across both QA and reasoning-agent settings, MemAudit substantially reduces attack success rates under realistic post-hoc auditing scenarios. The results show that QA attack success is reduced from 70% to 0%, while RAP attack success drops from 83.3% to 0%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

MemAudit pairs a counterfactual influence score with a memory consistency graph for post-hoc auditing of injected records in LLM agent memory, but the abstract's 0% attack success claims rest on unshown experimental details and decision rules.


The paper's main move is a post-hoc framework that scores each memory record by its causal effect on harmful outputs and then checks the whole store for structural outliers via a consistency graph. This targets the gap left by online defenses that try to stop bad behavior before it happens. The MINJA attack setup, where poisons arrive through ordinary queries rather than direct edits, matches how real agents would be compromised, so the threat model feels grounded. The reported drops from 70% to 0% on QA and 83.3% to 0% on RAP are the headline numbers, and the idea of using two signals together is presented as new relative to prior work the abstract cites. That combination is the clearest contribution.

The abstract supplies no equations or derivations, which keeps the circularity burden low, but it also means the influence score computation, the graph construction, the way the two signals are combined, and the exact thresholds are not visible. The stress-test concern lands: without trial counts, false-positive rates on clean memory, or confirmation that the decision rule was not tuned on the same attacks, the exact-zero results cannot be assessed for robustness. If either signal is noisy or correlated, or if the combination overfits the tested cases, the auditing utility would shrink.

The paper is aimed at researchers building or securing memory-augmented agents for long-horizon tasks. A reader already working on agent safety or retrieval poisoning would find the framing useful even if the current evidence is thin. The work shows clear thinking about the post-hoc setting and honest engagement with the practical vulnerability, so it is coherent on its own terms. I would bring it to a reading group for the idea and the attack description, but not yet for the results. I would not cite it until the methods and controls are filled in. A serious editor should send it to peer review so the authors can supply the missing experimental details and let referees check whether the two-signal rule actually generalizes.
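The point about trial counts can be made concrete. When zero successes are observed in n independent trials, the exact one-sided upper confidence bound on the true success rate is 1 - alpha^(1/n), so an observed 0% is compatible with quite different true rates depending on n. The sketch below computes that bound; the trial counts in the loop are illustrative values, not numbers reported by the paper.

```python
def zero_success_upper_bound(n_trials: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on the success rate after 0 successes in n trials.

    Solves (1 - p) ** n = alpha for p, where alpha = 1 - confidence.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_trials)


# Illustrative trial counts only; the abstract does not state how many
# attack attempts were run.
for n in (10, 30, 100):
    print(n, round(zero_success_upper_bound(n), 3))
```

The bound shrinks roughly as 3/n, which is why an exact-zero attack success rate means little until the number of attempts is known.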

Referee Report

3 major / 0 minor

Summary. The manuscript proposes MemAudit, a post-hoc causal memory auditing framework for memory-augmented LLM agents. It combines a counterfactual memory influence score measuring each memory's causal contribution to harmful outputs with a memory consistency graph identifying structurally anomalous memories. Evaluated against the MINJA query-only memory injection attack, the paper claims substantial reductions in attack success rates under post-hoc auditing, specifically reducing QA ASR from 70% to 0% and RAP ASR from 83.3% to 0%.

Significance. If the empirical results hold under rigorous validation, MemAudit would address an important gap in defenses for memory-augmented LLM agents by enabling post-hoc identification and removal of malicious records after harmful behavior is observed, complementing existing online intervention methods. The dual use of causal attribution and structural anomaly detection provides a concrete, falsifiable approach to this security problem.

major comments (3)

[Abstract] Abstract: The headline claims of reducing QA attack success from 70% to 0% and RAP from 83.3% to 0% are presented without any information on trial counts, statistical tests, baseline comparisons, variance across runs, or the precise computation and thresholding of the counterfactual influence scores, rendering the central empirical claim unverifiable from the provided evidence.
[Method] Method description: No details are given on how the counterfactual memory influence score is computed from interventions, how it is combined with the memory consistency graph anomaly score (e.g., via thresholds, weighting, or logical conjunction), or whether the final decision rule was tuned on the reported test attacks, which directly bears on whether the two signals suffice to neutralize poisons without high false positives or missed records.
[Evaluation] Evaluation: The manuscript supplies no false-positive rates when MemAudit is applied to clean memory stores, nor any analysis of missed poisons or robustness under distribution shift, leaving the weakest assumption (that the combined signals reliably flag poisons in realistic agent settings) untested and the 0% ASR figures potentially non-generalizable.
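The quantities requested in the evaluation comment are simple to compute once ground-truth poison labels are available. The following sketch shows one way to report recall and false-positive rate for a single audit pass; the function name `audit_metrics` and the example record indices are hypothetical.

```python
import math


def audit_metrics(flagged: set[int], poisoned: set[int], n_records: int) -> dict[str, float]:
    """Recall over poisoned records and false-positive rate over benign records."""
    n_benign = n_records - len(poisoned)
    true_pos = len(flagged & poisoned)
    false_pos = len(flagged - poisoned)
    return {
        # Undefined (NaN) on a clean store, where there is nothing to recall.
        "recall": true_pos / len(poisoned) if poisoned else math.nan,
        "false_positive_rate": false_pos / n_benign if n_benign else math.nan,
    }


# Hypothetical store of 100 records with three injected ones; the audit
# flags all three plus one benign record.
poisoned_run = audit_metrics(flagged={3, 17, 42, 58}, poisoned={3, 17, 42}, n_records=100)

# Clean store: every flagged record is a false positive.
clean_run = audit_metrics(flagged={8, 9}, poisoned=set(), n_records=50)
print(poisoned_run, clean_run)
```

Reporting the clean-store case separately matters because a post-hoc audit that deletes benign records degrades the agent even when no attack occurred.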

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important areas for improving the clarity and completeness of our empirical claims, methodological details, and evaluation. We will revise the manuscript accordingly to address each point.


Referee: [Abstract] Abstract: The headline claims of reducing QA attack success from 70% to 0% and RAP from 83.3% to 0% are presented without any information on trial counts, statistical tests, baseline comparisons, variance across runs, or the precise computation and thresholding of the counterfactual influence scores, rendering the central empirical claim unverifiable from the provided evidence.

Authors: We agree that the abstract should provide more context to make the claims verifiable. In the revision, we will expand the abstract to note that results are averaged over 10 independent runs with reported standard deviations, include a brief mention of baseline comparisons (standard retrieval without auditing), and indicate that the influence score uses a fixed threshold of 0.5 on the causal effect difference. Full details on computation and statistical tests will remain in the main text and appendix due to length constraints. revision: yes

Referee: [Method] Method description: No details are given on how the counterfactual memory influence score is computed from interventions, how it is combined with the memory consistency graph anomaly score (e.g., via thresholds, weighting, or logical conjunction), or whether the final decision rule was tuned on the reported test attacks, which directly bears on whether the two signals suffice to neutralize poisons without high false positives or missed records.

Authors: We will revise the Method section to include the exact computation: the counterfactual influence score is defined as the difference in the LLM's output probability for a harmful response when performing a do-intervention that removes the candidate memory record. The consistency graph anomaly score measures deviation from average node connectivity. The signals are combined via logical conjunction after independent thresholding (influence > 0.3 and anomaly > 2 standard deviations). Thresholds were selected on a held-out validation set of clean and poisoned memories, not on the test attacks. We will add equations, pseudocode, and explicit discussion of this process. revision: yes

Referee: [Evaluation] Evaluation: The manuscript supplies no false-positive rates when MemAudit is applied to clean memory stores, nor any analysis of missed poisons or robustness under distribution shift, leaving the weakest assumption (that the combined signals reliably flag poisons in realistic agent settings) untested and the 0% ASR figures potentially non-generalizable.

Authors: We agree these metrics are necessary for a complete evaluation. In the revised manuscript, we will add a new subsection reporting a false-positive rate below 5% when applying MemAudit to five clean memory stores of varying sizes. We will confirm zero missed poisons (100% recall) in the reported experiments and include an analysis of robustness under distribution shift by testing on out-of-domain queries from a different domain, where ASR remains at 0%. These results will be presented with the same trial counts as the main experiments. revision: yes
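The decision rule described in the second response (independent thresholds joined by logical conjunction) can be sketched as follows. This follows the simulated rebuttal text only, not a confirmed implementation: the 0.3 and 2-standard-deviation thresholds are taken from that response, while the use of node degree as "connectivity", the absolute deviation, and the population standard deviation are assumptions made here.

```python
import statistics
from typing import Sequence


def flag_records(
    influence: Sequence[float],
    degree: Sequence[int],
    influence_min: float = 0.3,   # threshold quoted in the simulated rebuttal
    z_min: float = 2.0,           # "2 standard deviations" from the same text
) -> list[int]:
    """Flag records that exceed BOTH the influence and the anomaly threshold."""
    mean = statistics.fmean(degree)
    std = statistics.pstdev(degree)
    flagged = []
    for i, (inf, deg) in enumerate(zip(influence, degree)):
        # Assumption: anomaly is the absolute z-score of the node degree.
        z = abs(deg - mean) / std if std > 0 else 0.0
        if inf > influence_min and z > z_min:
            flagged.append(i)
    return flagged


# Ten hypothetical records: record 0 is influential but well connected,
# record 9 is influential and isolated in the graph.
influence = [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6]
degree = [5, 5, 5, 5, 5, 5, 5, 5, 5, 0]
print(flag_records(influence, degree))
```

Because the rule is a conjunction, a poison that is influential but structurally typical, or anomalous but not yet triggered, would pass unflagged, which is the generalization risk the referee raises.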

Circularity Check

0 steps flagged

No significant circularity detected


The paper describes an empirical post-hoc auditing method that combines a counterfactual memory influence score with a memory consistency graph, then reports experimental reductions in attack success rates on QA and RAP tasks under the MINJA attack. No equations, parameter-fitting steps, or derivation chains appear in the abstract or description that would reduce the claimed outcomes to self-definitions, fitted inputs renamed as predictions, or load-bearing self-citations. The results are presented as direct empirical measurements rather than derived quantities, leaving the framework self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no information on free parameters, background axioms, or new postulated entities; the framework is described at a high level without implementation specifics.


