pith. machine review for the scientific record.

arxiv: 2605.09033 · v2 · submitted 2026-05-09 · 💻 cs.CR · cs.AI


ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts


Pith reviewed 2026-05-15 06:06 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords poisoning attack · graph memory · LLM agents · relation channel conflicts · agent memory poisoning · Mem0 · attack success rate

The pith

ShadowMerge poisons graph-based agent memory by injecting relations that share anchors and channels but carry conflicting values.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ShadowMerge as a poisoning attack on the graph-based memory systems used by LLM agents. It exploits relation-channel conflicts, in which a malicious relation matches the anchor and channel of legitimate evidence but supplies a different value. Through the AIR pipeline, this conflict is turned into an ordinary interaction that the memory system extracts, merges, and retrieves for the victim query. Tests on Mem0 with datasets such as PubMedQA, WebShop, and ToolEmu demonstrate a 93.8% average attack success rate, a 50.3-point improvement over the best baseline, with minimal disruption to benign tasks. The approach overcomes the key limitations that prior attacks on flat text records face when moved to graph memory.

Core claim

ShadowMerge is a poisoning attack against graph-based agent memory that exploits relation-channel conflicts. Its key insight is that a poisoned relation can share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. To realize this, the AIR pipeline converts the conflict into an ordinary interaction that can be extracted, merged, and retrieved by the graph-memory system. Evaluations show it achieves a 93.8% average attack success rate on real-world datasets while having negligible impact on unrelated benign tasks.
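The conflict condition at the heart of this claim can be sketched with a toy data model. The field names, the `canonical_channel` function, and the drug-dose example are illustrative assumptions, not Mem0's actual schema:

```python
from dataclasses import dataclass

# Hypothetical illustration of a relation-channel conflict; field names
# and the canonicalization rule are assumptions for this sketch only.

@dataclass(frozen=True)
class Relation:
    anchor: str    # query-activated entity, e.g. "DrugX"
    channel: str   # canonicalized relation type, e.g. "recommended_dose"
    value: str     # the payload the agent will later read back

def canonical_channel(raw: str) -> str:
    # Toy canonicalization: lowercase and underscore-join, so surface
    # variants of the same relation collapse onto one channel.
    return "_".join(raw.lower().split())

benign = Relation("DrugX", canonical_channel("Recommended Dose"), "10 mg")
poison = Relation("DrugX", canonical_channel("recommended dose"), "100 mg")

# The conflict condition: same anchor, same channel, different value.
conflict = (benign.anchor == poison.anchor
            and benign.channel == poison.channel
            and benign.value != poison.value)
print(conflict)  # True
```

Because canonicalization collapses both phrasings onto one channel, the poisoned relation lands in exactly the slot the victim query will activate.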

What carries the argument

The relation-channel conflict realized via the AIR pipeline: a poisoned relation shares the query-activated anchor and canonicalized relation channel with benign evidence while delivering a conflicting value, which lets it be extracted, merged into the target neighborhood, and retrieved.

Load-bearing premise

A poisoned relation sharing the same query-activated anchor and canonicalized relation channel as benign evidence will be extracted, merged into the target neighborhood, and retrieved for the victim query by the graph-memory system.
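This premise can be made concrete with a toy graph memory. The last-writer-wins merge policy below is an assumption chosen for illustration; the paper does not specify Mem0's actual merge logic, which is exactly what the referee report flags:

```python
# Toy graph memory illustrating the load-bearing premise. The
# last-writer-wins merge keyed on (anchor, channel) is an assumed
# policy, not Mem0's documented behavior.

class GraphMemory:
    def __init__(self):
        self.edges = {}  # (anchor, channel) -> value

    def merge(self, anchor: str, channel: str, value: str) -> None:
        # A merge keyed only on (anchor, channel) silently replaces the
        # benign value with the conflicting one.
        self.edges[(anchor, channel)] = value

    def retrieve(self, anchor: str) -> dict:
        # Return every relation in the anchor's neighborhood, as a
        # retriever serving the victim query would.
        return {ch: v for (a, ch), v in self.edges.items() if a == anchor}

mem = GraphMemory()
mem.merge("DrugX", "recommended_dose", "10 mg")   # benign evidence
mem.merge("DrugX", "recommended_dose", "100 mg")  # poisoned relation
print(mem.retrieve("DrugX"))  # {'recommended_dose': '100 mg'}
```

Under this assumed policy the poisoned value is the only one left to retrieve; a value-consistency check at merge time would break the premise.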

What would settle it

Observe whether injecting a poisoned relation with matching anchor and channel results in its retrieval for the victim query and successful influence on agent behavior; if retrieval or influence fails consistently, the attack would not succeed.
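The test above can be phrased as a minimal harness: inject a channel-matching relation, issue the victim query, and score a trial as a success only if the poisoned value is what comes back. The dict-backed memory and success criterion are illustrative assumptions, and the toy setup is deterministic, so it only demonstrates the scoring, not a real attack rate:

```python
# Hedged sketch of the falsification test: does a channel-matching
# poisoned relation get retrieved for the victim query?

def run_trial(memory: dict, poison: tuple, victim_anchor: str, target: str) -> bool:
    anchor, channel, value = poison
    memory[(anchor, channel)] = value              # injection via "ordinary interaction"
    retrieved = memory.get((victim_anchor, channel))
    return retrieved == target                     # success = poisoned value retrieved

trials = [run_trial({("DrugX", "dose"): "10 mg"},
                    ("DrugX", "dose", "100 mg"),
                    "DrugX", "100 mg")
          for _ in range(10)]
asr = sum(trials) / len(trials)  # attack success rate over the trials
print(asr)  # 1.0 in this deterministic toy
```

If retrieval or influence failed, `run_trial` would return False and the measured rate would collapse, which is the falsification condition stated above.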

Figures

Figures reproduced from arXiv: 2605.09033 by Lingyun Peng, Shuyu Li, Tiantian Ji, Xinran Liu, Yang Luo, Yong Liu, Zifeng Kang.

Figure 1
Figure 1: Conventional flat memory versus graph-based agent memory. Flat memory appends independent chunks and retrieves them by similarity. Graph-based [PITH_FULL_IMAGE:figures/full_fig_p004_1.png]
Figure 2
Figure 2: A motivating example: why text-only poisoning is unreliable in graph [PITH_FULL_IMAGE:figures/full_fig_p005_2.png]
Figure 2
Figure 2: A motivating example for graph-native memory poisoning. A direct [PITH_FULL_IMAGE:figures/full_fig_p003_2.png]
Figure 3
Figure 3: SHADOWMERGE workflow. The attacker first fixes (q∗, y+, y−) under the threat model, using public knowledge for y+ when needed. Anchor selects a high-reach entity from q∗, Inscribe creates a channel-aligned conflicting relation π−, and Render produces a natural-language payload P∗. After an ordinary interaction writes P∗ into the shared memory graph, later victim queries can retrieve both benign eviden…
Figure 4
Figure 4: [RQ2] Graph-evidence construction across task suites. Segment width [PITH_FULL_IMAGE:figures/full_fig_p010_4.png]
Figure 5
Figure 5: [RQ2] CDF of the best poisoned-evidence rank in the target-query [PITH_FULL_IMAGE:figures/full_fig_p011_5.png]
original abstract

Graph-based agent memory is increasingly used in LLM agents to support structured long-term recall and multi-hop reasoning, but it also creates a new poisoning surface: an attacker can inject a crafted relation into graph memory so that it is later retrieved and influences agent behavior. Existing agent-memory poisoning attacks mainly target flat textual records and are ineffective in graph-based memory because malicious relations often fail to be extracted, merged into the target anchor neighborhood, or retrieved for the victim query. We present SHADOWMERGE, a poisoning attack against graph-based agent memory that exploits relation-channel conflicts. Its key insight is that a poisoned relation can share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. To realize this, we design AIR, a pipeline that converts the conflict into an ordinary interaction that can be extracted, merged, and retrieved by the graph-memory system. We evaluate SHADOWMERGE on Mem0 and three public real-world datasets: PubMedQA, WebShop, and ToolEmu. SHADOWMERGE achieves 93.8% average attack success rate, improving the best baseline by 50.3 absolute points, while having negligible impact on unrelated benign tasks. Mechanism studies show that SHADOWMERGE overcomes the three key limitations of existing agent-memory poisoning attacks, and defense analysis shows that representative input-side defenses are insufficient to mitigate it. We have responsibly disclosed our findings to affected graph-memory vendors and open sourced SHADOWMERGE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SHADOWMERGE, a poisoning attack on graph-based agent memory systems that exploits relation-channel conflicts: a poisoned relation is crafted to share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. This is realized via the AIR pipeline that converts the conflict into an extractable, mergeable interaction. The authors evaluate on Mem0 using PubMedQA, WebShop, and ToolEmu, reporting 93.8% average attack success rate (50.3 points above the best baseline) with negligible impact on benign tasks, plus mechanism studies and defense analysis; the work is open-sourced after responsible disclosure.

Significance. If the central mechanism holds, the result is significant because it demonstrates a previously unexploited poisoning vector in structured graph memories that defeats prior flat-text attacks by surviving extraction, merge, and retrieval. The quantitative improvement, mechanism ablation, and open-sourcing of code provide concrete, reproducible evidence that could guide both attack research and the design of conflict-aware merge policies in production agent-memory systems.

major comments (2)
  1. [Abstract / mechanism studies] Abstract and mechanism-studies section: the headline claim that SHADOWMERGE overcomes the three key limitations of prior attacks rests on the assumption that a conflicting-value relation sharing anchor+channel will be extracted, merged into the target neighborhood, and retrieved without rejection by canonicalization or value-consistency logic. No formal characterization or pseudocode of the canonicalization function or merge policy is supplied, leaving steps (3) and (4) of the attack pipeline unverified.
  2. [Evaluation] Evaluation section: the reported 93.8% ASR and 50.3-point improvement are presented without the full experimental protocol, data-exclusion rules, exact Mem0 configuration parameters, or release of the evaluation harness, so the quantitative support for the central claim cannot be independently reproduced from the manuscript alone.
minor comments (2)
  1. [§3] Notation for the AIR pipeline stages is introduced without an accompanying diagram or pseudocode listing, making the conversion of conflict into ordinary interaction harder to follow.
  2. [Evaluation tables] Table captions for the ASR results should explicitly state the number of trials and any statistical significance tests performed.
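The statistical reporting asked for in minor comment 2 could be as simple as a confidence interval on the success proportion. A sketch using the Wilson score interval follows; the trial counts are illustrative placeholders, not figures taken from the paper:

```python
import math

# Wilson score interval for a binomial proportion, the kind of interval
# an ASR table could report alongside the number of trials. The counts
# below are hypothetical, not from the paper.

def wilson_interval(successes: int, n: int, z: float = 1.96):
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

lo, hi = wilson_interval(successes=94, n=100)
print(round(lo, 3), round(hi, 3))  # roughly 0.875 to 0.972
```

With only 100 hypothetical trials the interval around a 94% rate is still about ten points wide, which is why the trial count belongs in the caption.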

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will strengthen the manuscript with additional formal details and experimental specifications to improve clarity and reproducibility.

point-by-point responses
  1. Referee: [Abstract / mechanism studies] Abstract and mechanism-studies section: the headline claim that SHADOWMERGE overcomes the three key limitations of prior attacks rests on the assumption that a conflicting-value relation sharing anchor+channel will be extracted, merged into the target neighborhood, and retrieved without rejection by canonicalization or value-consistency logic. No formal characterization or pseudocode of the canonicalization function or merge policy is supplied, leaving steps (3) and (4) of the attack pipeline unverified.

    Authors: We appreciate this observation. Section 3.2 of the manuscript describes the AIR pipeline and the relation-channel conflict design, explaining how the poisoned relation is constructed to share the query-activated anchor and canonicalized channel so that it is treated as a standard extractable interaction. To directly address the request for verification, we will add formal pseudocode for the canonicalization function and merge policy (including value-consistency checks) to the mechanism studies section in the revision, along with a precise characterization of the conflict condition that ensures the poisoned value is merged without rejection. revision: yes

  2. Referee: [Evaluation] Evaluation section: the reported 93.8% ASR and 50.3-point improvement are presented without the full experimental protocol, data-exclusion rules, exact Mem0 configuration parameters, or release of the evaluation harness, so the quantitative support for the central claim cannot be independently reproduced from the manuscript alone.

    Authors: The full evaluation harness, including code, exact Mem0 configurations, data splits, and processing scripts, has been released in the open-source repository following responsible disclosure. To make the manuscript self-contained, we will expand the evaluation section and add a dedicated appendix detailing the complete experimental protocol, data-exclusion rules, Mem0 parameter settings, and reproduction steps. This will allow independent verification directly from the revised paper while retaining the link to the public artifacts. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation or claims

full rationale

The paper is an empirical attack evaluation on public datasets (PubMedQA, WebShop, ToolEmu) and the Mem0 memory backend. Attack success rates are measured directly via experiments on open-sourced code rather than derived from any fitted parameters, self-referential definitions, or load-bearing self-citations. No equations, uniqueness theorems, or first-principles derivations are presented that reduce to inputs by construction; the central 93.8% ASR claim is an observed experimental outcome, not a renamed or fitted prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The paper introduces the ShadowMerge attack and AIR pipeline as new constructs to realize the conflict exploitation; these are the primary additions beyond prior literature on agent memory poisoning.

invented entities (2)
  • ShadowMerge attack no independent evidence
    purpose: To exploit relation-channel conflicts for poisoning graph memory
    Newly proposed technique that converts conflicts into extractable interactions.
  • AIR pipeline no independent evidence
    purpose: To convert the conflict into an ordinary interaction extractable by the graph-memory system
    Introduced as the realization mechanism for the attack.

pith-pipeline@v0.9.0 · 5585 in / 1179 out tokens · 37206 ms · 2026-05-15T06:06:01.446789+00:00 · methodology

