pith. machine review for the scientific record.

arxiv: 2605.09033 · v2 · submitted 2026-05-09 · 💻 cs.CR · cs.AI


ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts


Pith reviewed 2026-05-15 06:06 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords poisoning attack · graph memory · LLM agents · relation channel conflicts · agent memory poisoning · Mem0 · attack success rate

The pith

ShadowMerge poisons graph-based agent memory by injecting relations that share anchors and channels but carry conflicting values.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ShadowMerge as a poisoning attack on the graph-based memory systems used by LLM agents. It exploits relation-channel conflicts, in which a malicious relation matches the anchor and channel of legitimate evidence but supplies a different value. Through the AIR pipeline, this conflict is turned into an ordinary interaction that the memory system extracts, merges, and retrieves for the victim query. Tests on Mem0 with datasets such as PubMedQA, WebShop, and ToolEmu demonstrate a 93.8% average attack success rate, a 50.3-point improvement over the best baseline, with minimal disruption to benign tasks. The approach overcomes the key limitations that prior attacks on flat text records face when moved to graph memory.

Core claim

ShadowMerge is a poisoning attack against graph-based agent memory that exploits relation-channel conflicts. Its key insight is that a poisoned relation can share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. To realize this, the AIR pipeline converts the conflict into an ordinary interaction that can be extracted, merged, and retrieved by the graph-memory system. Evaluations show it achieves a 93.8% average attack success rate on real-world datasets while having negligible impact on unrelated benign tasks.
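The conflict condition at the heart of this claim can be sketched with a toy data model. The field names, the `canonical_channel` function, and the drug-dose example are illustrative assumptions, not Mem0's actual schema:

```python
from dataclasses import dataclass

# Hypothetical illustration of a relation-channel conflict; field names
# and the canonicalization rule are assumptions for this sketch only.

@dataclass(frozen=True)
class Relation:
    anchor: str    # query-activated entity, e.g. "DrugX"
    channel: str   # canonicalized relation type, e.g. "recommended_dose"
    value: str     # the payload the agent will later read back

def canonical_channel(raw: str) -> str:
    # Toy canonicalization: lowercase and underscore-join, so surface
    # variants of the same relation collapse onto one channel.
    return "_".join(raw.lower().split())

benign = Relation("DrugX", canonical_channel("Recommended Dose"), "10 mg")
poison = Relation("DrugX", canonical_channel("recommended dose"), "100 mg")

# The conflict condition: same anchor, same channel, different value.
conflict = (benign.anchor == poison.anchor
            and benign.channel == poison.channel
            and benign.value != poison.value)
print(conflict)  # True
```

Because canonicalization collapses both phrasings onto one channel, the poisoned relation lands in exactly the slot the victim query will activate.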

What carries the argument

The relation-channel conflict realized via the AIR pipeline: a poisoned relation shares the query-activated anchor and canonicalized relation channel with benign evidence while delivering a conflicting value, which lets it be extracted, merged into the target neighborhood, and retrieved.

Load-bearing premise

A poisoned relation sharing the same query-activated anchor and canonicalized relation channel as benign evidence will be extracted, merged into the target neighborhood, and retrieved for the victim query by the graph-memory system.
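This premise can be made concrete with a toy graph memory. The last-writer-wins merge policy below is an assumption chosen for illustration; the paper does not specify Mem0's actual merge logic, which is exactly what the referee report flags:

```python
# Toy graph memory illustrating the load-bearing premise. The
# last-writer-wins merge keyed on (anchor, channel) is an assumed
# policy, not Mem0's documented behavior.

class GraphMemory:
    def __init__(self):
        self.edges = {}  # (anchor, channel) -> value

    def merge(self, anchor: str, channel: str, value: str) -> None:
        # A merge keyed only on (anchor, channel) silently replaces the
        # benign value with the conflicting one.
        self.edges[(anchor, channel)] = value

    def retrieve(self, anchor: str) -> dict:
        # Return every relation in the anchor's neighborhood, as a
        # retriever serving the victim query would.
        return {ch: v for (a, ch), v in self.edges.items() if a == anchor}

mem = GraphMemory()
mem.merge("DrugX", "recommended_dose", "10 mg")   # benign evidence
mem.merge("DrugX", "recommended_dose", "100 mg")  # poisoned relation
print(mem.retrieve("DrugX"))  # {'recommended_dose': '100 mg'}
```

Under this assumed policy the poisoned value is the only one left to retrieve; a value-consistency check at merge time would break the premise.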

What would settle it

Observe whether injecting a poisoned relation with matching anchor and channel results in its retrieval for the victim query and successful influence on agent behavior; if retrieval or influence fails consistently, the attack would not succeed.
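The test above can be phrased as a minimal harness: inject a channel-matching relation, issue the victim query, and score a trial as a success only if the poisoned value is what comes back. The dict-backed memory and success criterion are illustrative assumptions, and the toy setup is deterministic, so it only demonstrates the scoring, not a real attack rate:

```python
# Hedged sketch of the falsification test: does a channel-matching
# poisoned relation get retrieved for the victim query?

def run_trial(memory: dict, poison: tuple, victim_anchor: str, target: str) -> bool:
    anchor, channel, value = poison
    memory[(anchor, channel)] = value              # injection via "ordinary interaction"
    retrieved = memory.get((victim_anchor, channel))
    return retrieved == target                     # success = poisoned value retrieved

trials = [run_trial({("DrugX", "dose"): "10 mg"},
                    ("DrugX", "dose", "100 mg"),
                    "DrugX", "100 mg")
          for _ in range(10)]
asr = sum(trials) / len(trials)  # attack success rate over the trials
print(asr)  # 1.0 in this deterministic toy
```

If retrieval or influence failed, `run_trial` would return False and the measured rate would collapse, which is the falsification condition stated above.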

Figures

Figures reproduced from arXiv: 2605.09033 by Lingyun Peng, Shuyu Li, Tiantian Ji, Xinran Liu, Yang Luo, Yong Liu, Zifeng Kang.

Figure 1
Figure 1: Conventional flat memory versus graph-based agent memory. Flat memory appends independent chunks and retrieves them by similarity. Graph-based [PITH_FULL_IMAGE:figures/full_fig_p004_1.png]
Figure 2
Figure 2: A motivating example: why text-only poisoning is unreliable in graph [PITH_FULL_IMAGE:figures/full_fig_p005_2.png]
Figure 2
Figure 2: A motivating example for graph-native memory poisoning. A direct [PITH_FULL_IMAGE:figures/full_fig_p003_2.png]
Figure 3
Figure 3: SHADOWMERGE workflow. The attacker first fixes (q∗, y+, y−) under the threat model, using public knowledge for y+ when needed. Anchor selects a high-reach entity from q∗, Inscribe creates a channel-aligned conflicting relation π−, and Render produces a natural-language payload P∗. After an ordinary interaction writes P∗ into the shared memory graph, later victim queries can retrieve both benign eviden…
Figure 4
Figure 4: [RQ2] Graph-evidence construction across task suites. Segment width [PITH_FULL_IMAGE:figures/full_fig_p010_4.png]
Figure 5
Figure 5: [RQ2] CDF of the best poisoned-evidence rank in the target-query [PITH_FULL_IMAGE:figures/full_fig_p011_5.png]
original abstract

Graph-based agent memory is increasingly used in LLM agents to support structured long-term recall and multi-hop reasoning, but it also creates a new poisoning surface: an attacker can inject a crafted relation into graph memory so that it is later retrieved and influences agent behavior. Existing agent-memory poisoning attacks mainly target flat textual records and are ineffective in graph-based memory because malicious relations often fail to be extracted, merged into the target anchor neighborhood, or retrieved for the victim query. We present SHADOWMERGE, a poisoning attack against graph-based agent memory that exploits relation-channel conflicts. Its key insight is that a poisoned relation can share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. To realize this, we design AIR, a pipeline that converts the conflict into an ordinary interaction that can be extracted, merged, and retrieved by the graph-memory system. We evaluate SHADOWMERGE on Mem0 and three public real-world datasets: PubMedQA, WebShop, and ToolEmu. SHADOWMERGE achieves 93.8% average attack success rate, improving the best baseline by 50.3 absolute points, while having negligible impact on unrelated benign tasks. Mechanism studies show that SHADOWMERGE overcomes the three key limitations of existing agent-memory poisoning attacks, and defense analysis shows that representative input-side defenses are insufficient to mitigate it. We have responsibly disclosed our findings to affected graph-memory vendors and open sourced SHADOWMERGE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SHADOWMERGE, a poisoning attack on graph-based agent memory systems that exploits relation-channel conflicts: a poisoned relation is crafted to share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. This is realized via the AIR pipeline that converts the conflict into an extractable, mergeable interaction. The authors evaluate on Mem0 using PubMedQA, WebShop, and ToolEmu, reporting 93.8% average attack success rate (50.3 points above the best baseline) with negligible impact on benign tasks, plus mechanism studies and defense analysis; the work is open-sourced after responsible disclosure.

Significance. If the central mechanism holds, the result is significant because it demonstrates a previously unexploited poisoning vector in structured graph memories that defeats prior flat-text attacks by surviving extraction, merge, and retrieval. The quantitative improvement, mechanism ablation, and open-sourcing of code provide concrete, reproducible evidence that could guide both attack research and the design of conflict-aware merge policies in production agent-memory systems.

major comments (2)
  1. [Abstract / mechanism studies] Abstract and mechanism-studies section: the headline claim that SHADOWMERGE overcomes the three key limitations of prior attacks rests on the assumption that a conflicting-value relation sharing anchor+channel will be extracted, merged into the target neighborhood, and retrieved without rejection by canonicalization or value-consistency logic. No formal characterization or pseudocode of the canonicalization function or merge policy is supplied, leaving steps (3) and (4) of the attack pipeline unverified.
  2. [Evaluation] Evaluation section: the reported 93.8% ASR and 50.3-point improvement are presented without the full experimental protocol, data-exclusion rules, exact Mem0 configuration parameters, or release of the evaluation harness, so the quantitative support for the central claim cannot be independently reproduced from the manuscript alone.
minor comments (2)
  1. [§3] Notation for the AIR pipeline stages is introduced without an accompanying diagram or pseudocode listing, making the conversion of conflict into ordinary interaction harder to follow.
  2. [Evaluation tables] Table captions for the ASR results should explicitly state the number of trials and any statistical significance tests performed.
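The statistical reporting asked for in minor comment 2 could be as simple as a confidence interval on the success proportion. A sketch using the Wilson score interval follows; the trial counts are illustrative placeholders, not figures taken from the paper:

```python
import math

# Wilson score interval for a binomial proportion, the kind of interval
# an ASR table could report alongside the number of trials. The counts
# below are hypothetical, not from the paper.

def wilson_interval(successes: int, n: int, z: float = 1.96):
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

lo, hi = wilson_interval(successes=94, n=100)
print(round(lo, 3), round(hi, 3))  # roughly 0.875 to 0.972
```

With only 100 hypothetical trials the interval around a 94% rate is still about ten points wide, which is why the trial count belongs in the caption.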

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will strengthen the manuscript with additional formal details and experimental specifications to improve clarity and reproducibility.

point-by-point responses
  1. Referee: [Abstract / mechanism studies] Abstract and mechanism-studies section: the headline claim that SHADOWMERGE overcomes the three key limitations of prior attacks rests on the assumption that a conflicting-value relation sharing anchor+channel will be extracted, merged into the target neighborhood, and retrieved without rejection by canonicalization or value-consistency logic. No formal characterization or pseudocode of the canonicalization function or merge policy is supplied, leaving steps (3) and (4) of the attack pipeline unverified.

    Authors: We appreciate this observation. Section 3.2 of the manuscript describes the AIR pipeline and the relation-channel conflict design, explaining how the poisoned relation is constructed to share the query-activated anchor and canonicalized channel so that it is treated as a standard extractable interaction. To directly address the request for verification, we will add formal pseudocode for the canonicalization function and merge policy (including value-consistency checks) to the mechanism studies section in the revision, along with a precise characterization of the conflict condition that ensures the poisoned value is merged without rejection. revision: yes

  2. Referee: [Evaluation] Evaluation section: the reported 93.8% ASR and 50.3-point improvement are presented without the full experimental protocol, data-exclusion rules, exact Mem0 configuration parameters, or release of the evaluation harness, so the quantitative support for the central claim cannot be independently reproduced from the manuscript alone.

    Authors: The full evaluation harness, including code, exact Mem0 configurations, data splits, and processing scripts, has been released in the open-source repository following responsible disclosure. To make the manuscript self-contained, we will expand the evaluation section and add a dedicated appendix detailing the complete experimental protocol, data-exclusion rules, Mem0 parameter settings, and reproduction steps. This will allow independent verification directly from the revised paper while retaining the link to the public artifacts. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation or claims

full rationale

The paper is an empirical attack evaluation on public datasets (PubMedQA, WebShop, ToolEmu) and the Mem0 memory backend. Attack success rates are measured directly via experiments on open-sourced code rather than derived from any fitted parameters, self-referential definitions, or load-bearing self-citations. No equations, uniqueness theorems, or first-principles derivations are presented that reduce to inputs by construction; the central 93.8% ASR claim is an observed experimental outcome, not a renamed or fitted prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The paper introduces the ShadowMerge attack and AIR pipeline as new constructs to realize the conflict exploitation; these are the primary additions beyond prior literature on agent memory poisoning.

invented entities (2)
  • ShadowMerge attack no independent evidence
    purpose: To exploit relation-channel conflicts for poisoning graph memory
    Newly proposed technique that converts conflicts into extractable interactions.
  • AIR pipeline no independent evidence
    purpose: To convert the conflict into an ordinary interaction extractable by the graph-memory system
    Introduced as the realization mechanism for the attack.

pith-pipeline@v0.9.0 · 5585 in / 1179 out tokens · 37206 ms · 2026-05-15T06:06:01.446789+00:00 · methodology

