arxiv: 2605.09033 · v3 · pith:CPWBRHATnew · submitted 2026-05-09 · 💻 cs.CR · cs.AI

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

Pith reviewed 2026-05-19 17:34 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords poisoning attack · graph-based memory · LLM agents · relation channel conflict · agent memory retrieval · AIR pipeline · memory poisoning

The pith

ShadowMerge poisons graph-based agent memory by injecting relations that share the same query-activated anchor and channel as legitimate evidence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that graph-based memory systems in LLM agents, which store structured relations for long-term recall and multi-hop reasoning, can be poisoned when an attacker crafts a relation that carries a conflicting value yet matches the benign evidence on the activated anchor and canonicalized channel. This match lets the system treat the poison as ordinary data that gets extracted, merged into the target neighborhood, and later retrieved for the victim query. The AIR pipeline makes this possible by turning the channel conflict into a standard interaction the memory system already handles. A reader would care because successful poisoning would let attackers steer agent behavior on chosen tasks, such as medical question answering or shopping simulations, while leaving unrelated queries untouched.

Core claim

ShadowMerge is a poisoning attack against graph-based agent memory that exploits relation-channel conflicts. Its key insight is that a poisoned relation can share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. The AIR pipeline converts this conflict into an ordinary interaction that the graph-memory system extracts, merges into the target anchor neighborhood, and retrieves for the victim query. On Mem0, evaluated with the PubMedQA, WebShop, and ToolEmu datasets, the attack reaches a 93.8% average attack success rate, 50.3 absolute points above the best baseline, with negligible effect on unrelated benign tasks.
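The headline numbers can be sanity-checked with simple arithmetic. The sketch below derives the best baseline's average success rate implied by the two figures quoted in the abstract, assuming both are averages over the same tasks. The derived value is not quoted in the source, and the helper `attack_success_rate` is a hypothetical illustration of how such a rate is usually computed, not the paper's evaluation code.

```python
# Figures quoted in the paper's abstract.
shadowmerge_asr = 93.8   # average attack success rate, in percent
gain_over_best = 50.3    # absolute percentage points over the best baseline

# Implied average ASR of the best baseline (derived here, not quoted in the source).
best_baseline_asr = round(shadowmerge_asr - gain_over_best, 1)


def attack_success_rate(outcomes):
    """Percentage of trials whose outcome is True (attack succeeded).

    Hypothetical helper: `outcomes` is a non-empty list of booleans.
    """
    return 100.0 * sum(outcomes) / len(outcomes)


print(best_baseline_asr)
print(attack_success_rate([True, True, False, True]))
```

Under that assumption, the best prior attack would succeed on fewer than half of the trials on average.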

What carries the argument

The AIR pipeline, whose stages Figure 3 names as Anchor, Inscribe, and Render. It converts a relation-channel conflict into an ordinary interaction that shares the query-activated anchor and canonicalized relation channel with benign evidence, so the graph-memory system extracts, merges, and retrieves the poisoned relation.
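To make the anchor-and-channel idea concrete, here is a minimal toy model, assuming a memory that indexes edges by (anchor, canonicalized channel). `ToyGraphMemory` and `canonical_channel` are hypothetical names invented for this illustration; they are not Mem0's API, and the string-based canonicalizer is an assumption rather than the system's documented behavior. The entity and values are placeholders.

```python
import re
from collections import defaultdict


def canonical_channel(relation):
    """Toy string canonicalizer: lowercase, collapse non-alphanumerics to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", relation.lower()).strip("_")


class ToyGraphMemory:
    """Hypothetical stand-in for a graph memory; not a real library API."""

    def __init__(self):
        # (anchor, canonical channel) -> list of (value, source)
        self.edges = defaultdict(list)

    def write(self, anchor, relation, value, source):
        key = (anchor.lower(), canonical_channel(relation))
        self.edges[key].append((value, source))

    def retrieve(self, query):
        """Return every edge bucket whose anchor appears in the query text."""
        q = query.lower()
        return {k: v for k, v in self.edges.items() if k[0] in q}


memory = ToyGraphMemory()
memory.write("WidgetCo", "return window", "30 days", source="session-1")
# A later write with a differently spelled relation lands in the same bucket.
memory.write("widgetco", "Return-Window", "7 days", source="session-2")

hits = memory.retrieve("What is the return window at WidgetCo?")
print(hits)
```

In this toy model both values sit under one key, so any query that activates the anchor returns them together. Whether a real system behaves this way depends on its extraction and merge policy, which is exactly what the referee report asks the authors to verify.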

If this is right

  • The attack overcomes the extraction, merge, and retrieval failures that limit prior poisoning methods on flat textual records.
  • Input-side defenses applied before memory ingestion do not prevent the poisoned relation from being merged and later retrieved.
  • The attack alters agent behavior on the targeted task while leaving performance on unrelated benign tasks essentially unchanged.
  • The same channel-sharing approach succeeds across Mem0 and multiple real-world datasets including question answering and tool-use scenarios.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Systems that canonicalize relation channels more strictly or add provenance checks on merges could limit this form of conflict-based poisoning.
  • The attack surface may extend to other structured memory formats that rely on anchor-based retrieval and channel normalization for long-term agent recall.
  • Persistent deployment of graph memory in agents would benefit from monitoring for anomalous relations that activate on the same queries as established evidence.
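The provenance and monitoring ideas above could be prototyped as a write-time check. This is a minimal sketch under assumptions: the `Relation` record and `conflicts_on_write` function are hypothetical, and the rule "same anchor and channel, different value, different source" is one plausible heuristic, not a defense evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    anchor: str
    channel: str   # assumed to be already canonicalized
    value: str
    source: str    # provenance tag, e.g. a session or user id


def conflicts_on_write(existing, incoming):
    """Return stored relations that share anchor and channel with `incoming`
    but disagree on value and come from a different source."""
    return [
        r for r in existing
        if r.anchor == incoming.anchor
        and r.channel == incoming.channel
        and r.value != incoming.value
        and r.source != incoming.source
    ]


stored = [
    Relation("widgetco", "return_window", "30 days", "session-1"),
    Relation("widgetco", "ships_to", "EU", "session-1"),
]
incoming = Relation("widgetco", "return_window", "7 days", "session-2")

flagged = conflicts_on_write(stored, incoming)
print(flagged)
```

A flagged write could be held for review or stored with lower trust instead of being merged directly. Legitimate updates also change values, so a check like this would need a false-positive budget before deployment.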

Load-bearing premise

When a poisoned relation crafted by the AIR pipeline shares the same query-activated anchor and canonicalized relation channel as benign evidence, the graph-memory system will extract it, merge it into the target anchor neighborhood, and retrieve it for the victim query.

What would settle it

Running the victim query after injection and finding that the poisoned relation is never retrieved or used, even though it shares the anchor and canonicalized channel with the benign evidence, would show the channel-conflict mechanism does not produce retrieval.
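This falsification test can be phrased as a small check over retrieval logs. The sketch below assumes each trial yields a ranked list of retrieved relation ids and that the poisoned ids are known. Both function names are hypothetical, and the precondition that the poison really shares the anchor and canonicalized channel has to be verified separately.

```python
def best_poisoned_rank(ranked_ids, poisoned_ids):
    """1-based rank of the highest-ranked poisoned item, or None if absent."""
    for rank, item in enumerate(ranked_ids, start=1):
        if item in poisoned_ids:
            return rank
    return None


def mechanism_refuted(trials, poisoned_ids):
    """True if no trial ever retrieves a poisoned relation."""
    return all(best_poisoned_rank(r, poisoned_ids) is None for r in trials)


poisoned = {"p1"}
trials = [["b1", "p1", "b2"], ["b3", "b4"]]

print([best_poisoned_rank(t, poisoned) for t in trials])
print(mechanism_refuted(trials, poisoned))
```

Collecting the best rank per trial also gives the raw data for a rank distribution of the kind the Figure 5 caption describes.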

Figures

Figures reproduced from arXiv: 2605.09033 by Lingyun Peng, Shuyu Li, Tiantian Ji, Xinran Liu, Yang Luo, Yong Liu, Zifeng Kang.

Figure 1: Conventional flat memory versus graph-based agent memory. Flat memory appends independent chunks and retrieves them by similarity. Graph-based …
Figure 2: A motivating example: why text-only poisoning is unreliable in graph …
Figure 2: A motivating example for graph-native memory poisoning. A direct …
Figure 3: SHADOWMERGE workflow. The attacker first fixes $(q^*, y^+, y^-)$ under the threat model, using public knowledge for $y^+$ when needed. Anchor selects a high-reach entity from $q^*$, Inscribe creates a channel-aligned conflicting relation $\pi^-$, and Render produces a natural-language payload $P^*$. After an ordinary interaction writes $P^*$ into the shared memory graph, later victim queries can retrieve both benign eviden…
Figure 4: [RQ2] Graph-evidence construction across task suites. Segment width …
Figure 5: [RQ2] CDF of the best poisoned-evidence rank in the target-query …
Original abstract

Graph-based agent memory is increasingly used in LLM agents to support structured long-term recall and multi-hop reasoning, but it also creates a new poisoning surface: an attacker can inject a crafted relation into graph memory so that it is later retrieved and influences agent behavior. Existing agent-memory poisoning attacks mainly target flat textual records and are ineffective in graph-based memory because malicious relations often fail to be extracted, merged into the target anchor neighborhood, or retrieved for the victim query. We present SHADOWMERGE, a poisoning attack against graph-based agent memory that exploits relation-channel conflicts. Its key insight is that a poisoned relation can share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. To realize this, we design AIR, a pipeline that converts the conflict into an ordinary interaction that can be extracted, merged, and retrieved by the graph-memory system. We evaluate SHADOWMERGE on Mem0 and three public real-world datasets: PubMedQA, WebShop, and ToolEmu. SHADOWMERGE achieves 93.8% average attack success rate, improving the best baseline by 50.3 absolute points, while having negligible impact on unrelated benign tasks. Mechanism studies show that SHADOWMERGE overcomes the three key limitations of existing agent-memory poisoning attacks, and defense analysis shows that representative input-side defenses are insufficient to mitigate it. We have responsibly disclosed our findings to affected graph-memory vendors and open sourced SHADOWMERGE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents SHADOWMERGE, a poisoning attack on graph-based agent memory that exploits relation-channel conflicts. The core idea is an AIR pipeline that converts a conflicting poisoned relation into an ordinary interaction sharing the same query-activated anchor and canonicalized relation channel as benign evidence, enabling extraction, merging into the target neighborhood, and retrieval by systems such as Mem0. Evaluations on Mem0 with PubMedQA, WebShop, and ToolEmu report 93.8% average attack success rate (50.3 points above the best baseline) with negligible impact on unrelated benign tasks; mechanism studies claim the attack overcomes prior limitations, and defense analysis finds input-side defenses insufficient. The implementation is open-sourced after responsible disclosure.

Significance. If the central empirical claims hold, the work identifies a previously under-explored poisoning surface in structured graph memory for LLM agents, which is increasingly deployed for long-term recall and multi-hop reasoning. The reported ASR improvement and explicit comparison to baselines that fail on extraction/merging/retrieval steps provide concrete evidence of a practical threat. Credit is due for open-sourcing the code and for the responsible disclosure to vendors, both of which support reproducibility and follow-up work.

major comments (2)
  1. §5.2 (Mechanism Studies): the claim that SHADOWMERGE overcomes the three key limitations of prior attacks rests on the AIR pipeline successfully forcing extraction, merge, and retrieval of the poisoned relation. No independent audit of the graph state (pre- and post-injection snapshots or neighborhood inspection) is described to confirm that the conflicting value is merged into the exact target anchor neighborhood rather than discarded or isolated by Mem0's merge policy.
  2. §4 (Evaluation): the headline 93.8% ASR and the assertion that the attack works because the poisoned relation shares the canonicalized relation channel require explicit validation that Mem0's relation canonicalizer uses the same string-based canonicalization as AIR rather than semantic embedding distance. Without this or an ablation on canonicalization variants, it is unclear whether the success rates are robust or specific to the tested Mem0 configuration.
minor comments (2)
  1. Abstract: the three datasets are named only later in the text; listing PubMedQA, WebShop, and ToolEmu already in the abstract would improve immediate clarity.
  2. §6 (Defense Analysis): quantitative overhead or false-positive rates for any suggested mitigations would strengthen the practical takeaway beyond the qualitative statement that input-side defenses are insufficient.
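The graph-state audit requested in the first major comment could look roughly like the following. This is a sketch under assumptions: snapshots are modeled as plain sets of (anchor, channel, value) triples, and `neighborhood` and `audit_injection` are hypothetical helpers. How to export such a snapshot from a real graph store is left open here.

```python
def neighborhood(snapshot, anchor):
    """Edges (anchor, channel, value) incident to `anchor` in a snapshot."""
    return {edge for edge in snapshot if edge[0] == anchor}


def audit_injection(pre, post, anchor, channel, poison_value):
    """Diff the target anchor's neighborhood before and after injection."""
    added = neighborhood(post, anchor) - neighborhood(pre, anchor)
    removed = neighborhood(pre, anchor) - neighborhood(post, anchor)
    return {
        "merged_into_target": (anchor, channel, poison_value) in added,
        "added": added,
        "removed": removed,
    }


pre = {("widgetco", "return_window", "30 days")}
post = pre | {
    ("widgetco", "return_window", "7 days"),
    ("otherco", "ships_to", "EU"),
}

report = audit_injection(pre, post, "widgetco", "return_window", "7 days")
print(report)
```

A non-empty `removed` set would indicate that the merge policy overwrote benign evidence rather than storing the conflict alongside it, which is a different failure mode from the one the paper describes.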

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the changes we will incorporate in the revised manuscript.

point-by-point responses
  1. Referee: §5.2 (Mechanism Studies): the claim that SHADOWMERGE overcomes the three key limitations of prior attacks rests on the AIR pipeline successfully forcing extraction, merge, and retrieval of the poisoned relation. No independent audit of the graph state (pre- and post-injection snapshots or neighborhood inspection) is described to confirm that the conflicting value is merged into the exact target anchor neighborhood rather than discarded or isolated by Mem0's merge policy.

    Authors: We agree that an explicit audit of the graph state would strengthen the mechanistic claims. The current §5.2 relies on attack success rates and the AIR pipeline design to infer successful extraction, merge, and retrieval. In the revised version we will add pre- and post-injection graph snapshots together with neighborhood inspection results to directly verify that the conflicting poisoned relation is merged into the target anchor neighborhood rather than discarded or isolated. revision: yes

  2. Referee: §4 (Evaluation): the headline 93.8% ASR and the assertion that the attack works because the poisoned relation shares the canonicalized relation channel require explicit validation that Mem0's relation canonicalizer uses the same string-based canonicalization as AIR rather than semantic embedding distance. Without this or an ablation on canonicalization variants, it is unclear whether the success rates are robust or specific to the tested Mem0 configuration.

    Authors: The AIR pipeline employs string-based canonicalization after normalization to ensure the poisoned relation shares the same channel as benign evidence. The uniformly high ASR across the three datasets is consistent with this design. To address the concern directly, the revised §4 will include an explicit check of Mem0's canonicalizer behavior (via code inspection and logging) and an ablation across string-based versus embedding-based canonicalization variants to demonstrate robustness. revision: yes
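The ablation promised in the second response turns on whether two relation labels count as the same channel. The sketch below contrasts an exact string canonicalizer with a similarity threshold. Token-level Jaccard overlap is used here only as a crude, dependency-free stand-in for embedding distance; all function names and the 0.5 threshold are assumptions made for illustration.

```python
import re


def string_canonical(relation):
    """Lowercase and collapse non-alphanumeric runs to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", relation.lower()).strip("_")


def token_jaccard(a, b):
    """Token-set overlap of two relation labels (stand-in for embedding similarity)."""
    ta = set(string_canonical(a).split("_"))
    tb = set(string_canonical(b).split("_"))
    return len(ta & tb) / len(ta | tb)


def same_channel(a, b, mode, threshold=0.5):
    """Decide channel equality under a given canonicalization variant."""
    if mode == "string":
        return string_canonical(a) == string_canonical(b)
    if mode == "similarity":
        return token_jaccard(a, b) >= threshold
    raise ValueError(f"unknown mode: {mode}")


pairs = [
    ("return window", "Return-Window"),
    ("return window", "return window period"),
    ("return window", "ships to"),
]
for a, b in pairs:
    print(a, "|", b, "|", same_channel(a, b, "string"), same_channel(a, b, "similarity"))
```

The second pair is the interesting one: the two variants disagree on it, so an attack tuned to one canonicalizer may or may not land in the target channel under the other. An ablation would measure how often that happens on real relation labels.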

Circularity Check

0 steps flagged

No significant circularity in empirical attack evaluation

full rationale

This paper presents an empirical security attack (SHADOWMERGE) and its evaluation on external public datasets (PubMedQA, WebShop, ToolEmu) plus the Mem0 system. Attack success rates and mechanism studies are measured experimental outcomes, not quantities derived by construction from fitted parameters, self-definitions, or prior self-citations. The AIR pipeline is introduced as a design artifact whose behavior is validated through direct testing rather than assumed via internal equations or uniqueness theorems. No load-bearing step reduces to a self-referential fit or citation chain; the central 93.8% ASR claim rests on observable retrieval behavior in the target systems.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the domain assumption that graph memory systems process relations via query-activated anchors and canonicalized channels in a way that allows conflicting values to be merged and later retrieved.

axioms (1)
  • Domain assumption: graph-based memory systems extract, merge, and retrieve relations based on query-activated anchors and canonicalized relation channels.
    This premise is required for the poisoned relation to be treated as ordinary evidence and retrieved for the victim query.

pith-pipeline@v0.9.0 · 5816 in / 1335 out tokens · 61029 ms · 2026-05-19T17:34:08.512884+00:00 · methodology


Appendix A: Baseline Adaptation Details

The baselines are adapted to the same ordinary-interaction threat model as SHADOWMERGE. The origin...