ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts
Pith reviewed 2026-05-19 17:34 UTC · model grok-4.3
The pith
ShadowMerge poisons graph-based agent memory by injecting relations that share the same query-activated anchor and channel as legitimate evidence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ShadowMerge is a poisoning attack against graph-based agent memory that exploits relation-channel conflicts. Its key insight is that a poisoned relation can share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. The AIR pipeline converts this conflict into an ordinary interaction that the graph-memory system extracts, merges into the target anchor neighborhood, and retrieves for the victim query. On Mem0, across the PubMedQA, WebShop, and ToolEmu datasets, the attack reaches a 93.8% average attack success rate, 50.3 absolute points above the best baseline, with negligible effect on unrelated benign tasks.
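To make the idea of a relation-channel conflict concrete, here is a minimal toy sketch. It is not Mem0's data model and not the paper's AIR pipeline: the `ToyGraphMemory` class, the `canonicalize` helper, and the example triples are hypothetical, and canonicalization is assumed to be plain string normalization.

```python
from collections import defaultdict


def canonicalize(relation: str) -> str:
    """Hypothetical string-based canonicalizer: lowercase, collapse whitespace."""
    return "_".join(relation.strip().lower().split())


class ToyGraphMemory:
    """Toy store mapping (anchor, canonical channel) -> list of values."""

    def __init__(self) -> None:
        self.edges = defaultdict(list)

    def merge(self, anchor: str, relation: str, value: str) -> None:
        key = (anchor.lower(), canonicalize(relation))
        if value not in self.edges[key]:
            self.edges[key].append(value)

    def retrieve(self, anchor: str, relation: str) -> list:
        key = (anchor.lower(), canonicalize(relation))
        return list(self.edges.get(key, []))

    def conflicts(self) -> dict:
        """Return every channel that holds more than one distinct value."""
        return {k: v for k, v in self.edges.items() if len(v) > 1}


memory = ToyGraphMemory()
# Benign evidence, followed by a conflicting value on the same anchor and channel.
memory.merge("Acme Laptop", "warranty period", "24 months")
memory.merge("acme laptop", "Warranty  Period", "6 months")

retrieved = memory.retrieve("Acme Laptop", "warranty period")
conflict_keys = list(memory.conflicts())
```

In this toy, both values sit under one `(anchor, channel)` key, so a query that activates the anchor returns the benign and the conflicting value together. That co-location is the condition the core claim depends on.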
What carries the argument
The AIR pipeline, which converts a relation-channel conflict into an ordinary interaction that shares the query-activated anchor and canonicalized relation channel with benign evidence so the graph-memory system extracts, merges, and retrieves it.
If this is right
- The attack overcomes the extraction, merge, and retrieval failures that limit prior poisoning methods on flat textual records.
- Input-side defenses applied before memory ingestion do not prevent the poisoned relation from being merged and later retrieved.
- The attack alters agent behavior on the targeted task while leaving performance on unrelated benign tasks essentially unchanged.
- The same channel-sharing approach succeeds across Mem0 and multiple real-world datasets including question answering and tool-use scenarios.
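The first bullet above names three stages at which prior attacks fail. The sketch below shows one way per-stage outcomes could be tallied, assuming each trial logs three booleans. The `Trial` record, the `stage_funnel` helper, and the sample outcomes are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    extracted: bool  # relation was extracted from the interaction
    merged: bool     # relation landed in the target anchor neighborhood
    retrieved: bool  # relation was returned for the victim query


def stage_funnel(trials):
    """Fraction of all trials that survive each successive stage."""
    n = len(trials)
    if n == 0:
        raise ValueError("need at least one trial")
    extracted = [t for t in trials if t.extracted]
    merged = [t for t in extracted if t.merged]
    retrieved = [t for t in merged if t.retrieved]
    return {
        "extracted": len(extracted) / n,
        "merged": len(merged) / n,
        "retrieved": len(retrieved) / n,
    }


# Made-up outcomes for illustration only.
sample = [
    Trial(True, True, True),
    Trial(True, True, False),
    Trial(True, False, False),
    Trial(False, False, False),
]
funnel = stage_funnel(sample)
```

Reporting the funnel rather than a single success rate shows which stage an attack or a defense actually changes.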
Where Pith is reading between the lines
- Systems that canonicalize relation channels more strictly or add provenance checks on merges could limit this form of conflict-based poisoning.
- The attack surface may extend to other structured memory formats that rely on anchor-based retrieval and channel normalization for long-term agent recall.
- Persistent deployment of graph memory in agents would benefit from monitoring for anomalous relations that activate on the same queries as established evidence.
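The provenance-check idea in the first bullet above can be sketched as a merge guard. This is an illustrative assumption about what such a check might look like, not a mitigation proposed or evaluated in the paper. The `Relation` record, the `source` labels, and the `TRUSTED_SOURCES` policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    anchor: str
    channel: str  # assumed to be already canonicalized
    value: str
    source: str   # hypothetical provenance label, e.g. "curated" or "chat"


TRUSTED_SOURCES = {"curated"}  # assumption: sources allowed to add conflicting values


def guarded_merge(store: dict, incoming: Relation) -> str:
    """Merge `incoming` unless it conflicts with existing evidence and is untrusted.

    Returns "added", "duplicate", or "quarantined".
    """
    existing = store.setdefault((incoming.anchor, incoming.channel), [])
    if any(r.value == incoming.value for r in existing):
        return "duplicate"
    if existing and incoming.source not in TRUSTED_SOURCES:
        return "quarantined"  # conflicting value from an untrusted source
    existing.append(incoming)
    return "added"


store = {}
first = guarded_merge(
    store, Relation("acme laptop", "warranty_period", "24 months", "curated"))
second = guarded_merge(
    store, Relation("acme laptop", "warranty_period", "6 months", "chat"))
third = guarded_merge(
    store, Relation("acme laptop", "warranty_period", "24 months", "chat"))
```

The guard only acts when a channel already holds evidence. An untrusted write to an empty channel is still accepted, so a check like this would limit conflict-based poisoning without addressing first-write poisoning.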
Load-bearing premise
When the AIR pipeline gives a poisoned relation the same query-activated anchor and canonicalized relation channel as benign evidence, the graph-memory system will extract that relation, merge it into the target anchor neighborhood, and retrieve it for the victim query.
What would settle it
Running the victim query after injection and finding that the poisoned relation is never retrieved or used, even though it shares the anchor and canonicalized channel with the benign evidence, would show the channel-conflict mechanism does not produce retrieval.
Original abstract
Graph-based agent memory is increasingly used in LLM agents to support structured long-term recall and multi-hop reasoning, but it also creates a new poisoning surface: an attacker can inject a crafted relation into graph memory so that it is later retrieved and influences agent behavior. Existing agent-memory poisoning attacks mainly target flat textual records and are ineffective in graph-based memory because malicious relations often fail to be extracted, merged into the target anchor neighborhood, or retrieved for the victim query. We present SHADOWMERGE, a poisoning attack against graph-based agent memory that exploits relation-channel conflicts. Its key insight is that a poisoned relation can share the same query-activated anchor and canonicalized relation channel as benign evidence while carrying a conflicting value. To realize this, we design AIR, a pipeline that converts the conflict into an ordinary interaction that can be extracted, merged, and retrieved by the graph-memory system. We evaluate SHADOWMERGE on Mem0 and three public real-world datasets: PubMedQA, WebShop, and ToolEmu. SHADOWMERGE achieves 93.8% average attack success rate, improving the best baseline by 50.3 absolute points, while having negligible impact on unrelated benign tasks. Mechanism studies show that SHADOWMERGE overcomes the three key limitations of existing agent-memory poisoning attacks, and defense analysis shows that representative input-side defenses are insufficient to mitigate it. We have responsibly disclosed our findings to affected graph-memory vendors and open sourced SHADOWMERGE.
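The abstract reports the headline rate and the absolute gain but not the baseline figure itself. The arithmetic below recovers the implied value, assuming the 50.3-point gain is measured against the best baseline's average attack success rate over the same evaluation. The derived numbers are implied by those two figures and are not quoted from the paper.

```python
shadowmerge_asr = 93.8  # average attack success rate reported in the abstract (%)
gain_points = 50.3      # reported absolute improvement over the best baseline

# Implied best-baseline average ASR, and the relative improvement factor.
best_baseline_asr = round(shadowmerge_asr - gain_points, 1)
relative_factor = round(shadowmerge_asr / best_baseline_asr, 2)
```

Under that assumption the best baseline sits at about 43.5%, so ShadowMerge succeeds a little more than twice as often.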
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents SHADOWMERGE, a poisoning attack on graph-based agent memory that exploits relation-channel conflicts. The core idea is an AIR pipeline that converts a conflicting poisoned relation into an ordinary interaction sharing the same query-activated anchor and canonicalized relation channel as benign evidence, enabling extraction, merging into the target neighborhood, and retrieval by systems such as Mem0. Evaluations on Mem0 with PubMedQA, WebShop, and ToolEmu report 93.8% average attack success rate (50.3 points above the best baseline) with negligible impact on unrelated benign tasks; mechanism studies claim the attack overcomes prior limitations, and defense analysis finds input-side defenses insufficient. The implementation is open-sourced after responsible disclosure.
Significance. If the central empirical claims hold, the work identifies a previously under-explored poisoning surface in structured graph memory for LLM agents, which is increasingly deployed for long-term recall and multi-hop reasoning. The reported ASR improvement and explicit comparison to baselines that fail on extraction/merging/retrieval steps provide concrete evidence of a practical threat. Credit is due for open-sourcing the code and for the responsible disclosure to vendors, both of which support reproducibility and follow-up work.
major comments (2)
- §5.2 (Mechanism Studies): the claim that SHADOWMERGE overcomes the three key limitations of prior attacks rests on the AIR pipeline successfully forcing extraction, merge, and retrieval of the poisoned relation. No independent audit of the graph state (pre- and post-injection snapshots or neighborhood inspection) is described to confirm that the conflicting value is merged into the exact target anchor neighborhood rather than discarded or isolated by Mem0's merge policy.
- §4 (Evaluation): the headline 93.8% ASR and the assertion that the attack works because the poisoned relation shares the canonicalized relation channel require explicit validation that Mem0's relation canonicalizer uses the same string-based canonicalization as AIR rather than semantic embedding distance. Without this or an ablation on canonicalization variants, it is unclear whether the success rates are robust or specific to the tested Mem0 configuration.
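The graph-state audit requested in the §5.2 comment could take roughly the following shape. This is a sketch under stated assumptions: a snapshot is modeled as a set of `(anchor, channel, value)` triples, and `audit_injection` is a hypothetical helper. It does not use Mem0's API, and a real audit would first need a way to export the graph in this form.

```python
def audit_injection(before, after, anchor, channel, value):
    """Classify the fate of an injected (channel, value) pair.

    `before` and `after` are sets of (anchor, channel, value) triples taken
    before and after the injection attempt.
    """
    new_edges = set(after) - set(before)
    hits = {e for e in new_edges if e[1] == channel and e[2] == value}
    if not hits:
        return "discarded"           # never written to the graph
    if any(e[0] == anchor for e in hits):
        return "merged_into_target"  # attached to the intended anchor
    return "isolated"                # written, but under a different node


before = {("acme laptop", "warranty_period", "24 months")}
after_merged = before | {("acme laptop", "warranty_period", "6 months")}
after_isolated = before | {("acme laptop v2", "warranty_period", "6 months")}

target = ("acme laptop", "warranty_period", "6 months")
outcome_merged = audit_injection(before, after_merged, *target)
outcome_isolated = audit_injection(before, after_isolated, *target)
outcome_discarded = audit_injection(before, before, *target)
```

The three outcomes map onto the alternatives the referee lists: merged into the exact target neighborhood, isolated elsewhere in the graph, or discarded by the merge policy.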
minor comments (2)
- Abstract: the three datasets are named only later in the text; listing PubMedQA, WebShop, and ToolEmu already in the abstract would improve immediate clarity.
- §6 (Defense Analysis): quantitative overhead or false-positive rates for any suggested mitigations would strengthen the practical takeaway beyond the qualitative statement that input-side defenses are insufficient.
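The false-positive rates requested in the §6 comment can be computed from labeled memory writes. The sketch below assumes each write has a ground-truth label and a detector decision. The `detector_rates` helper and the sample data are hypothetical and do not come from the paper.

```python
def detector_rates(labels, flags):
    """Compute detection and false-positive rates for a write-time defense.

    labels[i] is True when write i is poisoned; flags[i] is True when the
    defense flagged write i.
    """
    if len(labels) != len(flags):
        raise ValueError("labels and flags must have the same length")
    tp = sum(1 for l, f in zip(labels, flags) if l and f)
    fn = sum(1 for l, f in zip(labels, flags) if l and not f)
    fp = sum(1 for l, f in zip(labels, flags) if not l and f)
    tn = sum(1 for l, f in zip(labels, flags) if not l and not f)
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Made-up example: two poisoned writes and four benign writes.
labels = [True, True, False, False, False, False]
flags = [True, False, True, False, False, False]
detection_rate, false_positive_rate = detector_rates(labels, flags)
```

Overhead could be reported in the same table, for example as added latency per memory write, which this sketch does not measure.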
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the changes we will incorporate in the revised manuscript.
- Referee: §5.2 (Mechanism Studies): the claim that SHADOWMERGE overcomes the three key limitations of prior attacks rests on the AIR pipeline successfully forcing extraction, merge, and retrieval of the poisoned relation. No independent audit of the graph state (pre- and post-injection snapshots or neighborhood inspection) is described to confirm that the conflicting value is merged into the exact target anchor neighborhood rather than discarded or isolated by Mem0's merge policy.
  Authors: We agree that an explicit audit of the graph state would strengthen the mechanistic claims. The current §5.2 relies on attack success rates and the AIR pipeline design to infer successful extraction, merge, and retrieval. In the revised version we will add pre- and post-injection graph snapshots together with neighborhood inspection results to directly verify that the conflicting poisoned relation is merged into the target anchor neighborhood rather than discarded or isolated.
  Revision: yes
- Referee: §4 (Evaluation): the headline 93.8% ASR and the assertion that the attack works because the poisoned relation shares the canonicalized relation channel require explicit validation that Mem0's relation canonicalizer uses the same string-based canonicalization as AIR rather than semantic embedding distance. Without this or an ablation on canonicalization variants, it is unclear whether the success rates are robust or specific to the tested Mem0 configuration.
  Authors: The AIR pipeline employs string-based canonicalization after normalization to ensure the poisoned relation shares the same channel as benign evidence. The high ASR across all three datasets is consistent with this design. To address the concern directly, the revised §4 will include an explicit check of Mem0's canonicalizer behavior (via code inspection and logging) and an ablation across string-based versus embedding-based canonicalization variants to demonstrate robustness.
  Revision: yes
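The ablation promised for §4 contrasts two ways of deciding whether two relation strings fall into the same channel. The sketch below shows why the choice matters. It makes strong simplifying assumptions: `same_channel_string` is a hypothetical normalizer, and `same_channel_similarity` uses token-overlap (Jaccard) similarity with an arbitrary threshold as a crude stand-in for embedding distance. Neither function reflects Mem0's actual canonicalizer.

```python
def string_channel(relation: str) -> str:
    """Hypothetical string canonicalizer: lowercase, collapse whitespace."""
    return "_".join(relation.strip().lower().split())


def same_channel_string(a: str, b: str) -> bool:
    return string_channel(a) == string_channel(b)


def same_channel_similarity(a: str, b: str, threshold: float = 0.3) -> bool:
    """Stand-in for embedding matching: Jaccard overlap of lowercase tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return False
    overlap = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
    return overlap >= threshold


pairs = [
    ("warranty period", "Warranty  Period"),   # formatting variant
    ("warranty period", "warranty duration"),  # paraphrase
    ("warranty period", "shipping cost"),      # unrelated relation
]
ablation = [
    (same_channel_string(a, b), same_channel_similarity(a, b)) for a, b in pairs
]
```

The two matchers agree on the formatting variant and on the unrelated relation but disagree on the paraphrase. That disagreement is where success rates could differ between canonicalization variants.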
Circularity Check
No significant circularity in empirical attack evaluation
This paper presents an empirical security attack (SHADOWMERGE) and its evaluation on external public datasets (PubMedQA, WebShop, ToolEmu) plus the Mem0 system. Attack success rates and mechanism studies are measured experimental outcomes, not quantities derived by construction from fitted parameters, self-definitions, or prior self-citations. The AIR pipeline is introduced as a design artifact whose behavior is validated through direct testing rather than assumed via internal equations or uniqueness theorems. No load-bearing step reduces to a self-referential fit or citation chain; the central 93.8% ASR claim rests on observable retrieval behavior in the target systems.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Graph-based memory systems extract, merge, and retrieve relations based on query-activated anchors and canonicalized relation channels.
Appendix A: Baseline Adaptation Details
The baselines are adapted to the same ordinary-interaction threat model as SHADOWMERGE. The origin...