
arxiv: 2606.10742 · v1 · pith:NRG7PQ34new · submitted 2026-06-09 · 💻 cs.CR · cs.LG

MemVenom: Triggered Poisoning of Multimodal Memories in Web Agents

Pith reviewed 2026-06-27 12:56 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords multimodal memory poisoning · web agents · external memory · black-box attack · vision-language models · adversarial perturbations · OCR injection · graph-structured retrieval

The pith

Web agents using external memory can be persistently hijacked by coordinated text-image poisoning that overrides user goals on retrieval.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies multimodal memory poisoning as a vulnerability in web agents that store and retrieve past experiences in graph-structured external memory for long-horizon tasks. It introduces MemVenom, a black-box two-stage attack: first a trigger-conditioned retrieval step that uses matching text-image evidence to ensure high-probability recall of malicious entries, then post-retrieval induction via adversarial perturbations and stealthy OCR injection to make the agent pursue an attacker-chosen objective instead. This approach requires no model parameter changes and works across different user tasks. Experiments report up to 99.15 percent end-to-end success on GPT-5-family agents with effective transfer to other architectures and little degradation of normal performance. A sympathetic reader would care because external memory is becoming standard for capable agents, turning one-time injection into repeated behavioral influence.

Core claim

MemVenom shows that graph-structured external memory in web agents can be poisoned in a black-box setting through coordinated text-image evidence. A trigger-conditioned retrieval attack makes recall of the malicious entry highly probable, and post-retrieval induction with adversarial perturbations and stealthy OCR injection then overrides the original user objective. The paper reports end-to-end success rates of up to 99.15% on GPT-5-family web agents, with transfer across architectures and model scales and minimal impact on benign performance.

What carries the argument

A two-stage design: a trigger-conditioned retrieval attack, followed by post-retrieval induction that applies adversarial perturbations and stealthy OCR injection to multimodal evidence.

If this is right

  • Attacks remain effective without model fine-tuning or task-specific re-optimization.
  • Poisoned memories transfer across web-agent frameworks and vision-language model scales.
  • The same injected content can influence many unrelated future user interactions.
  • Benign task performance remains largely unchanged, reducing easy detection signals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Agent memory systems may need provenance checks or content verification steps before acting on retrieved entries.
  • The same poisoning pattern could affect other long-term memory designs used by multimodal agents beyond web tasks.
  • Security measures limited to input prompts would leave this retrieval-based vector unaddressed.
  • Live deployment logs could be examined for unexpected goal shifts after known memory insertions to measure real impact.
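
The first bullet above (provenance checks before acting on retrieved entries) could look roughly like the following. This is a minimal sketch under stated assumptions, not a defense proposed or evaluated in the paper: `MemoryEntry`, `sign`, `is_trusted`, `filter_retrieved`, the `source` labels, and the signing key are all hypothetical names introduced for illustration.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical signing key held by the trusted memory writer (illustration only).
SIGNING_KEY = b"example-key-not-for-production"


@dataclass(frozen=True)
class MemoryEntry:
    """Hypothetical memory record: stored content plus a provenance tag."""
    text: str
    image_bytes: bytes
    source: str  # who wrote the entry, e.g. "agent" or "web" (assumed labels)
    tag: str     # hex HMAC computed when the entry was written


def sign(text: str, image_bytes: bytes, source: str) -> str:
    """Compute an HMAC-SHA256 tag over the entry's source and content."""
    msg = source.encode() + b"\x00" + text.encode() + b"\x00" + image_bytes
    return hmac.new(SIGNING_KEY, msg, hashlib.sha256).hexdigest()


def is_trusted(entry: MemoryEntry, allowed_sources: frozenset) -> bool:
    """Accept an entry only if its source is allowed and its tag verifies."""
    if entry.source not in allowed_sources:
        return False
    expected = sign(entry.text, entry.image_bytes, entry.source)
    # Constant-time comparison avoids leaking tag prefixes through timing.
    return hmac.compare_digest(expected, entry.tag)


def filter_retrieved(entries, allowed_sources=frozenset({"agent"})):
    """Drop retrieved entries that fail the provenance check."""
    return [e for e in entries if is_trusted(e, allowed_sources)]
```

A check of this kind only rejects entries from disallowed sources or entries modified after they were written. It would not catch poisoned content that enters memory through a trusted write path, so it complements content verification rather than replacing it.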

Load-bearing premise

Coordinated text-image evidence will reliably trigger retrieval of the poisoned memory and the subsequent induction will make the agent follow the malicious objective over the user's original goal.

What would settle it

An experiment that inserts the poisoned memory entries but finds the agent either fails to retrieve them on trigger presentation or retrieves them yet still executes the user's stated task without adopting the injected malicious instructions.
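
The two failure modes named above correspond to two separately measurable rates. The sketch below shows one way to tally them from per-episode logs. It is an illustration only: the `Episode` record and its field names are hypothetical and do not come from the paper's evaluation harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """Hypothetical per-episode log record for a triggered run."""
    retrieved_poison: bool        # the poisoned entry appeared in the retrieved set
    followed_injected_goal: bool  # the agent executed the attacker's objective


def summarize(episodes):
    """Split end-to-end success into recall and override-given-recall."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    retrieved = [e for e in episodes if e.retrieved_poison]
    hijacked = [e for e in retrieved if e.followed_injected_goal]
    return {
        # Stage 1: how often the trigger surfaces the poisoned entry.
        "recall_rate": len(retrieved) / n,
        # Stage 2: how often retrieval leads to goal override.
        "override_given_recall": len(hijacked) / len(retrieved) if retrieved else 0.0,
        # Product of the two stages, measured directly.
        "end_to_end": len(hijacked) / n,
    }
```

A low `recall_rate` would point to the first failure mode (the entry is not retrieved), while a high `recall_rate` paired with a low `override_given_recall` would point to the second (the entry is retrieved but the agent still completes the user's task).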

Figures

Figures reproduced from arXiv: 2606.10742 by Bin Chen, Fan Mo, Hao Fang, Hao Sun, Kuofeng Gao, Shu-Tao Xia, Yaowei Wang, Yv Zhang.

Figure 1: Comparison between prior text-only memory poisoning and MemVenom. Prior work…
Figure 2: Overview of the MemVenom attack framework. Stage 1 builds a recall-oriented component…
Figure 3: Retriever transferability of MemVenom. … action manipulation. In contrast, Adapted CPA and Adapted BadChain show much lower overall success, mainly due to unstable malicious recall or weak post-recall induction. These results confirm the necessity of our two-stage design. By jointly optimizing trigger-conditioned recall and post-recall prioritization, MemVenom better transfers poisoned memory exposure into a…
Figure 4: Top-k malicious recall behavior
Figure 6: Representative localhost sandbox interfaces used in the controlled task-substitution eval…
Figure 7: Structure-preserving first-round VLM input example for phishing/redirection.
Figure 8: Structure-preserving first-round VLM input example for unauthorized financial operation.
Figure 9: Structure-preserving first-round VLM input example for controlled privacy leakage.
Figure 10: Structure-preserving first-round VLM input example for destructive data operation.
Figure 11: Execution-path comparison for a controlled privacy-leakage task on a Department of…
Figure 12: Execution-path comparison for a controlled privacy-leakage task on a WebMD-style…
Figure 13: Execution-path comparison for an unauthorized-financial-operation task. Without attack,…
Figure 14: Trigger-induced embedding shift during optimization. Gray points denote benign screen…
Original abstract

External memory has become a core component of modern web agents, enabling long-horizon reasoning through the retrieval of past experiences. However, this paradigm introduces a critical vulnerability: malicious content injected into memory can be persistently recalled and repeatedly influence agent behavior. In this work, we identify and systematically study multimodal memory poisoning, an overlooked yet practical attack surface in web-agent systems. We propose MemVenom, a unified black-box attack framework that poisons graph-structured external memory with coordinated text-image evidence. Our method consists of a two-stage design: (1) a trigger-conditioned retrieval attack that ensures high-probability recall of malicious memory, and (2) a post-retrieval attack induction that leverages adversarial perturbations and stealthy OCR injection to override the original user objective. Unlike prior attacks that operate on prompts or text-only memory, our approach enables persistent, reusable, and goal-agnostic attacks without modifying model parameters or re-optimizing malicious tasks. Experiments across multiple web-agent frameworks and vision-language models demonstrate that MemVenom achieves strong end-to-end attack success with minimal impact on benign performance, reaching up to 99.15% on GPT-5-family web agents, while transferring effectively across architectures and model scales.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces MemVenom, a black-box two-stage attack framework targeting graph-structured external memory in web agents. Stage 1 uses coordinated text-image triggers to achieve high-probability retrieval of poisoned memories; Stage 2 applies adversarial perturbations and stealthy OCR injection to override the agent's original objective after retrieval. Experiments across web-agent frameworks and VLMs report end-to-end attack success rates up to 99.15% on GPT-5-family agents, with minimal degradation of benign performance and effective transfer across architectures and scales.

Significance. If the empirical results are reproducible, the work identifies a previously overlooked persistent attack surface in multimodal memory-augmented agents. The emphasis on reusable, goal-agnostic poisoning without parameter modification or task re-optimization distinguishes it from prompt-injection literature and could inform defenses for long-horizon web agents.

major comments (3)
  1. [§3] §3 (Trigger-Conditioned Retrieval Attack): The manuscript provides no description of the graph schema, the similarity metric used for memory retrieval, or the mechanism by which multiple retrieved memories are ranked or selected. Without these details it is impossible to assess whether the claimed high-probability recall is a robust property of the attack or an artifact of the specific test environments.
  2. [§4] §4 (Post-Retrieval Attack Induction) and experimental results: The paper reports end-to-end success rates (e.g., 99.15%) but supplies neither ablation studies isolating the contribution of adversarial perturbations versus OCR injection, nor quantitative measurements of whether the poisoned memory actually overrides the benign goal when both are retrieved. The central claim therefore rests on aggregate success rates whose internal validity cannot be verified.
  3. [Experimental evaluation] Experimental evaluation: No statistical tests, confidence intervals, or baseline comparisons against prior memory-poisoning or prompt-injection methods are presented. The transferability claims across model scales likewise lack controls for retrieval frequency or memory size, making it difficult to determine whether the reported numbers generalize beyond the evaluated configurations.
minor comments (2)
  1. [§2] The abstract and introduction use the term 'graph-structured external memory' without defining the underlying data model or retrieval API; a short paragraph or figure in §2 would improve clarity.
  2. Figure captions and table headers should explicitly state the number of trials and the exact success metric (e.g., fraction of episodes in which the poisoned objective is executed).
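
Major comment 3 and minor comment 2 both turn on reporting uncertainty alongside trial counts. For a success rate measured over a fixed number of episodes, one standard choice is the Wilson score interval, sketched below. The function name is an assumption made for illustration, and the paper's actual trial counts are not available in the text reviewed here.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("require 0 <= successes <= trials and trials > 0")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the extremes.
    return max(0.0, center - half), min(1.0, center + half)
```

Unlike the simple normal approximation, this interval stays inside [0, 1] and remains informative when the observed rate is close to 100%, which is the regime the headline result sits in. A call such as `wilson_interval(50, 100)` uses made-up counts purely to show the interface.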

Simulated Authors' Rebuttal

3 responses · 0 unresolved

Thank you for the detailed and constructive review. We appreciate the recognition of the work's potential significance in identifying a persistent attack surface for multimodal memory-augmented web agents. We address each major comment below and will revise the manuscript accordingly to improve clarity, internal validity, and statistical rigor.

Point-by-point responses
  1. Referee: [§3] §3 (Trigger-Conditioned Retrieval Attack): The manuscript provides no description of the graph schema, the similarity metric used for memory retrieval, or the mechanism by which multiple retrieved memories are ranked or selected. Without these details it is impossible to assess whether the claimed high-probability recall is a robust property of the attack or an artifact of the specific test environments.

    Authors: We agree that explicit details on the memory graph structure and retrieval process are necessary for assessing robustness. In the revised manuscript we will expand §3 with a dedicated subsection describing the graph schema (nodes as multimodal entries containing text and image embeddings), the cosine similarity metric for retrieval, and the top-k selection mechanism with ranking by similarity score. These additions will clarify that the high-probability recall stems from the coordinated trigger design rather than environment-specific artifacts. revision: yes

  2. Referee: [§4] §4 (Post-Retrieval Attack Induction) and experimental results: The paper reports end-to-end success rates (e.g., 99.15%) but supplies neither ablation studies isolating the contribution of adversarial perturbations versus OCR injection, nor quantitative measurements of whether the poisoned memory actually overrides the benign goal when both are retrieved. The central claim therefore rests on aggregate success rates whose internal validity cannot be verified.

    Authors: We concur that isolating component contributions and measuring override behavior would strengthen the central claim. The revised version will include new ablation experiments separating adversarial perturbations from OCR injection, as well as targeted trials that retrieve both poisoned and benign memories simultaneously and report the frequency with which the poisoned memory overrides the original objective. revision: yes

  3. Referee: [Experimental evaluation] Experimental evaluation: No statistical tests, confidence intervals, or baseline comparisons against prior memory-poisoning or prompt-injection methods are presented. The transferability claims across model scales likewise lack controls for retrieval frequency or memory size, making it difficult to determine whether the reported numbers generalize beyond the evaluated configurations.

    Authors: We acknowledge the absence of statistical analysis and controlled baselines. The revision will add paired t-tests and 95% confidence intervals for all reported success rates, plus direct comparisons against representative prior memory-poisoning and prompt-injection baselines. For transferability, we will include additional experiments that systematically vary retrieval frequency and memory size while holding other factors constant. revision: yes
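
The first response describes retrieval as cosine similarity over multimodal entries with top-k selection ranked by score. A minimal sketch of that kind of retriever follows. It rests on assumptions: the simulated rebuttal does not say how text and image similarities are combined, so the weighted average, the `text_weight` parameter, and the names `MemoryNode` and `retrieve_top_k` are hypothetical, and graph edges are omitted entirely.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryNode:
    """Hypothetical memory node holding one text and one image embedding."""
    node_id: str
    text_emb: tuple
    image_emb: tuple


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_text, query_image, nodes, k=3, text_weight=0.5):
    """Rank nodes by a weighted mean of text and image cosine similarity.

    The weighting scheme is an assumption made for illustration.
    Returns a list of (score, node_id) pairs, best first.
    """
    scored = [
        (
            text_weight * cosine(query_text, n.text_emb)
            + (1.0 - text_weight) * cosine(query_image, n.image_emb),
            n.node_id,
        )
        for n in nodes
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Under a scoring rule like this, an entry that aligns with the query in both modalities outranks one that aligns in only one. That is one reason the referee's request for the exact metric and ranking mechanism matters when judging how robust the reported recall is.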

Circularity Check

0 steps flagged

No circularity: empirical attack rates rest on reported experiments, not derivation

full rationale

The paper describes an empirical attack framework (MemVenom) consisting of trigger-conditioned retrieval and post-retrieval induction, validated through experiments on multiple web-agent systems and VLMs. No equations, fitted parameters, or mathematical derivations are present in the provided text. Central claims of up to 99.15% success are presented as measured outcomes rather than predictions derived from self-referential inputs or self-citations. The method is introduced as a novel two-stage design without reducing to prior author work by construction or renaming known results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are described in the provided text.

pith-pipeline@v0.9.1-grok · 5766 in / 1269 out tokens · 25957 ms · 2026-06-27T12:56:02.681756+00:00 · methodology

