
arxiv: 2510.09093 · v2 · submitted 2025-10-10 · 💻 cs.CR · cs.CL

Exploiting Web Search Tools of AI Agents for Data Exfiltration

Pith reviewed 2026-05-18 08:33 UTC · model grok-4.3

classification 💻 cs.CR cs.CL
keywords indirect prompt injection · data exfiltration · AI agents · web search tools · LLM vulnerabilities · prompt attacks · model defenses · cybersecurity

The pith

Indirect prompt injection still lets attackers exfiltrate data from AI agents through their web search tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates how indirect prompt injection attacks exploit AI agents that use web search tools to retrieve and leak sensitive information. It tests multiple large language models to measure susceptibility based on size, manufacturer, and implementation details. Results show that familiar attack patterns continue to succeed, indicating that current defenses have not closed these gaps. A reader would care because autonomous agents now routinely handle corporate data through external tools, turning functional features into leakage paths. The work calls for better training, shared attack databases, and ongoing testing to make security part of LLM design rather than an afterthought.

Core claim

The paper claims that indirect prompt injection attacks succeed in exploiting the web search tools of AI agents to exfiltrate data, and that even well-known attack patterns continue to bypass defenses across different model sizes and manufacturers.

What carries the argument

Indirect prompt injection, which places malicious instructions in external web content that the agent retrieves and then follows to send out sensitive data.

If this is right

  • Strengthened training procedures would increase inherent resilience against these attacks.
  • A centralized database of known attack vectors would support proactive defense.
  • A unified testing framework would enable continuous security validation.
  • Developers must integrate security into the core design of LLMs rather than treating it as optional.
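
Continuous security validation of the kind listed above largely comes down to replaying injection attempts and tracking how often they succeed. The following is a minimal Python sketch of that bookkeeping only. The `Trial` record, the model names, and the template labels are hypothetical placeholders and do not come from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One injection attempt against one model (hypothetical record format)."""
    model: str
    template: str
    leaked: bool  # True if a planted canary value showed up in an outbound tool call


def attack_success_rate(trials):
    """Return {model: fraction of trials in which the canary leaked}."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for t in trials:
        attempts[t.model] += 1
        successes[t.model] += int(t.leaked)
    return {m: successes[m] / attempts[m] for m in attempts}


# Illustrative data only; these are not results from the paper.
trials = [
    Trial("model-a", "template-1", True),
    Trial("model-a", "template-2", False),
    Trial("model-b", "template-1", False),
    Trial("model-b", "template-2", False),
]
rates = attack_success_rate(trials)
```

Grouping by `template` instead of `model` would give a per-template view, which is one way to spot patterns that keep working across providers.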

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar exfiltration risks likely exist when agents use other external tools such as APIs or file systems.
  • Organizations running AI agents should add output monitoring or sanitization as an extra layer beyond model-level fixes.
  • The persistence of these attacks suggests that simply increasing model size will not remove the vulnerability.
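
The output-monitoring idea in the second bullet can be sketched as a check on every outbound tool-call URL before it is executed. This is an illustrative sketch under stated assumptions, not a defense evaluated in the paper: the function names, the allow-list of hosts, and the canary value are all hypothetical, and the check only covers plain, percent-encoded, and base64 forms of a known secret.

```python
import base64
from urllib.parse import unquote, urlsplit


def encoded_variants(secret: str) -> set[str]:
    """Plain and base64 forms of a secret that might be embedded in a URL."""
    raw = secret.encode("utf-8")
    return {
        secret,
        base64.b64encode(raw).decode("ascii"),
        base64.urlsafe_b64encode(raw).decode("ascii"),
    }


def is_suspicious(url: str, secrets: list[str], allowed_hosts: set[str]) -> bool:
    """Flag a URL that targets an unlisted host or carries a known secret."""
    parts = urlsplit(url)
    if parts.hostname not in allowed_hosts:
        return True
    decoded = unquote(url)  # undo percent-encoding before searching
    return any(v in decoded for s in secrets for v in encoded_variants(s))


# Hypothetical configuration for illustration.
SECRETS = ["CANARY-1234"]
ALLOWED = {"search.example.com"}
```

A check like this only catches secrets that are known in advance and encoded in an anticipated way, so it complements model-level fixes rather than replacing them.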

Load-bearing premise

The tested models, attack implementations, and web search tool integrations are representative of real-world AI agent deployments and current LLM capabilities.

What would settle it

A production AI agent that leaks no sensitive data through its web search tool after repeated indirect prompt injection attempts across multiple models.

Figures

Figures reproduced from arXiv: 2510.09093 by Bernhard Bauer, Dennis Rall, Mohit Mittal, Thomas Fraunholz.

Figure 1: Attack Scenario Illustrating Indirect Prompt Injection for Data …
Figure 2: Bar plot showing the attack success rates of the models. The blue …
Figure 3: Attack success rate of the different variations. The …
Figure 4: Attack success rate of the twenty most effective templates.
Figure 5: Scatter plot showing the attack success rate of the models compared …
Original abstract

Large language models (LLMs) are now routinely used to autonomously execute complex tasks, from natural language processing to dynamic workflows like web searches. The usage of tool-calling and Retrieval Augmented Generation (RAG) allows LLMs to process and retrieve sensitive corporate data, amplifying both their functionality and vulnerability to abuse. As LLMs increasingly interact with external data sources, indirect prompt injection emerges as a critical and evolving attack vector, enabling adversaries to exploit models through manipulated inputs. Through a systematic evaluation of indirect prompt injection attacks across diverse models, we analyze how susceptible current LLMs are to such attacks, which parameters, including model size and manufacturer, specific implementations, shape their vulnerability, and which attack methods remain most effective. Our results reveal that even well-known attack patterns continue to succeed, exposing persistent weaknesses in model defenses. To address these vulnerabilities, we emphasize the need for strengthened training procedures to enhance inherent resilience, a centralized database of known attack vectors to enable proactive defense, and a unified testing framework to ensure continuous security validation. These steps are essential to push developers toward integrating security into the core design of LLMs, as our findings show that current models still fail to mitigate long-standing threats.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper evaluates indirect prompt injection attacks that exploit web search tools in LLM-based AI agents to achieve data exfiltration. It conducts a systematic empirical study across multiple models, examining the influence of model size, manufacturer, and implementation details on vulnerability, and reports that established attack patterns remain effective. The work concludes by advocating strengthened training procedures, a centralized attack-vector database, and a unified testing framework.

Significance. If the reported success rates and attack templates hold, the results are significant for AI agent security: they supply concrete attack templates, quantitative success rates differentiated by model size and provider, and examples of tool-call outputs that embed exfiltrated context. These elements directly support the central claim of persistent weaknesses in current defenses and provide reproducible evidence that can inform future mitigation research.

minor comments (3)
  1. [§4.2] §4.2: the success-rate tables would benefit from explicit baseline comparisons against non-tool-augmented LLMs to isolate the contribution of the web-search integration.
  2. [Figure 3] Figure 3: axis labels and legend entries are too small for readability; enlarge or add a supplementary high-resolution version.
  3. [§6] The discussion of the proposed centralized attack database lacks a concrete schema or example entry format, which would aid reproducibility of the recommended defense.
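
To make the third comment concrete, one possible shape for a database entry is sketched below in Python. Every field name and value here is a hypothetical illustration; the paper does not specify a schema, and this is not a proposal attributed to the authors.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AttackVectorEntry:
    """Hypothetical record for a shared attack-vector database."""
    entry_id: str
    name: str
    category: str          # e.g. "indirect-prompt-injection"
    delivery_channel: str  # e.g. "web-search-result"
    summary: str           # prose description, not the payload itself
    first_reported: str    # ISO 8601 date
    references: list[str] = field(default_factory=list)
    observed_success: dict[str, float] = field(default_factory=dict)  # model -> rate

    def to_json(self) -> str:
        """Serialize the entry with stable key order for diffing and review."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values for illustration only.
example = AttackVectorEntry(
    entry_id="EXAMPLE-0001",
    name="Instruction embedded in retrieved page",
    category="indirect-prompt-injection",
    delivery_channel="web-search-result",
    summary="Retrieved content asks the agent to send context data to a URL.",
    first_reported="2025-01-01",
)
```

Storing a prose summary rather than the raw payload is a design choice made here so that entries can be shared without redistributing working attack strings.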

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of our work and for recommending minor revision. The referee's summary correctly captures the focus of our systematic evaluation of indirect prompt injection attacks that exploit web search tools in LLM agents to enable data exfiltration.

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical attack evaluations

full rationale

The paper presents a systematic empirical evaluation of indirect prompt injection attacks on LLMs equipped with web search tools, including concrete attack templates, success rates across multiple models and providers, and observed exfiltration outcomes. No equations, derivations, fitted parameters, or self-referential definitions appear in the provided sections. Central claims derive directly from experimental results rather than reducing to inputs by construction, self-citation chains, or renamed known patterns. The work is self-contained as an observational security study against external benchmarks of LLM behavior.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard domain assumptions about LLM tool-calling behavior without introducing new free parameters or invented entities.

axioms (1)
  • domain assumption: LLMs with tool-calling capabilities can process and act on data retrieved from external web sources.
    This underpins the setup of AI agents performing web searches and handling sensitive data.

pith-pipeline@v0.9.0 · 5743 in / 1044 out tokens · 37096 ms · 2026-05-18T08:33:26.294298+00:00 · methodology



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

    cs.CR 2026-05 unverdicted novelty 8.0

    Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying ...

  2. When Alignment Isn't Enough: Response-Path Attacks on LLM Agents

    cs.CR 2026-05 unverdicted novelty 7.0

    A malicious relay can strategically rewrite aligned LLM outputs in BYOK agent architectures to achieve up to 99.1% attack success on benchmarks like AgentDojo and ASB.

  3. Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

    cs.CR 2026-05 unverdicted novelty 6.0

    The paper defines and evaluates Trojan Hippo attacks on LLM agent memory, showing 85-100% success in data exfiltration across backends and reduced rates with defenses at varying utility costs.

Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages · cited by 2 Pith papers · 6 internal anchors

  1. [1]

    A Comprehensive Overview of Large Language Models

    H. Naveed, A. U. Khan, S. Qiu, M. Saqib, S. Anwar, M. Usman, N. Akhtar, N. Barnes, and A. Mian, “A Comprehensive Overview of Large Language Models,” Jul. 2023. [Online]. Available: https://arxiv.org/abs/2307.06435v10

  2. [2]

    Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

    A. Singh, A. Ehtesham, S. Kumar, and T. T. Khoei, “Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG,” Feb. 2025, arXiv:2501.09136 [cs]. [Online]. Available: http://arxiv.org/abs/2501.09136

  3. [3]

    LLM With Tools: A Survey

    Z. Shen, “LLM With Tools: A Survey,” Sep. 2024, arXiv:2409.18807 [cs]. [Online]. Available: http://arxiv.org/abs/2409.18807

  4. [4]

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    A. Zou, Z. Wang, N. Carlini, M. Nasr, J. Z. Kolter, and M. Fredrikson, “Universal and Transferable Adversarial Attacks on Aligned Language Models,” Dec. 2023, arXiv:2307.15043 [cs]. [Online]. Available: http://arxiv.org/abs/2307.15043

  5. [5]

    “Real Attackers Don’t Compute Gradients”: Bridging the Gap Between Adversarial ML Research and Practice

    G. Apruzzese, H. S. Anderson, S. Dambra, D. Freeman, F. Pierazzi, and K. Roundy, ““Real Attackers Don’t Compute Gradients”: Bridging the Gap Between Adversarial ML Research and Practice,” in 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). IEEE, 2023, pp. 339–364. [Online]. Available: https://ieeexplore.ieee.org/abstract/document/10136152/

  6. [6]

    “Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models

    X. Shen, Z. Chen, M. Backes, Y. Shen, and Y. Zhang, ““Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models,” in ACM SIGSAC Conference on Computer and Communications Security (CCS). ACM, 2024

  7. [7]

    AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models

    S. Zhu, R. Zhang, B. An, G. Wu, J. Barrow, Z. Wang, F. Huang, A. Nenkova, and T. Sun, “AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models,” Dec. 2023, arXiv:2310.15140 [cs]. [Online]. Available: http://arxiv.org/abs/2310.15140

  8. [8]

    GPT-4 is too smart to be safe: Stealthy chat with LLMs via cipher

    Y. Yuan, W. Jiao, W. Wang, J. tse Huang, P. He, S. Shi, and Z. Tu, “GPT-4 is too smart to be safe: Stealthy chat with LLMs via cipher,” in The Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=MbfAK4s61A

  9. [9]

    Bypassing Prompt Injection and Jailbreak Detection in LLM Guardrails

    W. Hackett, L. Birch, S. Trawicki, N. Suri, and P. Garraghan, “Bypassing Prompt Injection and Jailbreak Detection in LLM Guardrails,” Apr. 2025, arXiv:2504.11168 [cs] version: 1. [Online]. Available: http://arxiv.org/abs/2504.11168

  10. [10]

    A Survey of Attacks on Large Language Models

    W. Xu and K. K. Parhi, “A Survey of Attacks on Large Language Models,” May 2025, arXiv:2505.12567 [cs]. [Online]. Available: http://arxiv.org/abs/2505.12567

  11. [11]

    Vocabulary Attack to Hijack Large Language Model Applications

    P. Levi and C. P. Neumann, “Vocabulary Attack to Hijack Large Language Model Applications,” May 2024, arXiv:2404.02637 [cs]. [Online]. Available: http://arxiv.org/abs/2404.02637

  12. [12]

    Prompt Injection Attacks and Defenses in LLM-Integrated Applications

    Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and Benchmarking Prompt Injection Attacks and Defenses,” Nov. 2024, arXiv:2310.12815 [cs]. [Online]. Available: http://arxiv.org/abs/2310.12815

  13. [13]

    Optimization-based Prompt Injection Attack to LLM-as-a-Judge

    J. Shi, Z. Yuan, Y. Liu, Y. Huang, P. Zhou, L. Sun, and N. Z. Gong, “Optimization-based Prompt Injection Attack to LLM-as-a-Judge,” Mar. 2025, arXiv:2403.17710 [cs]. [Online]. Available: http://arxiv.org/abs/2403.17710

  14. [14]

    OWASP Gen AI Security Project

    “OWASP Gen AI Security Project.” [Online]. Available: https://genai.owasp.org/

  15. [15]

    Protecting against indirect prompt injection attacks in MCP

    S. Young, “Protecting against indirect prompt injection attacks in MCP,” Apr. 2025. [Online]. Available: https://developer.microsoft.com/blog/protecting-against-indirect-injection-attacks-mcp

  16. [16]

    New hack uses prompt injection to corrupt Gemini’s long-term memory

    D. Goodin, “New hack uses prompt injection to corrupt Gemini’s long-term memory,” Feb. 2025. [Online]. Available: https://arstechnica.com/security/2025/02/new-hack-uses-prompt-injection-to-corrupt-geminis-long-term-memory/

  17. [17]

    Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

    K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection,” May 2023, arXiv:2302.12173 [cs]. [Online]. Available: http://arxiv.org/abs/2302.12173

  18. [18]

    Lessons from Defending Gemini Against Indirect Prompt Injections

    C. Shi, S. Lin, S. Song, J. Hayes, I. Shumailov, I. Yona, J. Pluto, A. Pappu, C. A. Choquette-Choo, M. Nasr, C. Sitawarin, G. Gibson, A. Terzis, and J. F. Flynn, “Lessons from Defending Gemini Against Indirect Prompt Injections,” May 2025, arXiv:2505.14534 [cs]. [Online]. Available: http://arxiv.org/abs/2505.14534

  19. [19]

    Can Indirect Prompt Injection Attacks Be Detected and Removed?

    Y. Chen, H. Li, Y. Sui, Y. He, Y. Liu, Y. Song, and B. Hooi, “Can Indirect Prompt Injection Attacks Be Detected and Removed?” Aug. 2025, arXiv:2502.16580 [cs] version: 4. [Online]. Available: http://arxiv.org/abs/2502.16580

  20. [20]

    Security of AI Agents

    Y. He, E. Wang, Y. Rong, Z. Cheng, and H. Chen, “Security of AI Agents,” Dec. 2024, arXiv:2406.08689 [cs]. [Online]. Available: http://arxiv.org/abs/2406.08689

  21. [21]

    LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge

    S. Abdelnabi, A. Fay, A. Salem, E. Zverev, K.-C. Liao, C.-H. Liu, C.-C. Kuo, J. Weigend, D. Manlangit, A. Apostolov, H. Umair, J. Donato, M. Kawakita, A. Mahboob, T. H. Bach, T.-H. Chiang, M. Cho, H. Choi, B. Kim, H. Lee, B. Pannell, C. McCauley, M. Russinovich, A. Paverd, and G. Cherubin, “LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injecti...

  22. [22]

    PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI System

    G. D. L. Munoz, A. J. Minnich, R. Lutz, R. Lundeen, R. S. R. Dheekonda, N. Chikanov, B.-E. Jagdagdorj, M. Pouliot, S. Chawla, W. Maxwell, B. Bullwinkel, K. Pratt, J. d. Gruyter, C. Siska, P. Bryan, T. Westerhoff, C. Kawaguchi, C. Seifert, R. S. S. Kumar, and Y. Zunger, “PyRIT: A Framework for Security Risk Identification and Red Teaming in Generative AI ...

  23. [23]

    GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

    J. Yu, X. Lin, Z. Yu, and X. Xing, “GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts,” Jun. 2024, arXiv:2309.10253 [cs]. [Online]. Available: http://arxiv.org/abs/2309.10253

  24. [24]

    Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition

    E. Debenedetti, J. Rando, D. Paleka, S. F. Florin, D. Albastroiu, N. Cohen, Y. Lemberg, R. Ghosh, R. Wen, A. Salem, G. Cherubin, S. Zanella-Beguelin, R. Schmid, V. Klemm, T. Miki, C. Li, S. Kraft, M. Fritz, F. Tramèr, S. Abdelnabi, and L. Schönherr, “Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition,” Jun. 2024, arXiv:...

  25. [25]

    Plentiful jailbreaks with string compositions

    B. R. Y. Huang, “Plentiful jailbreaks with string compositions,” 2024. [Online]. Available: https://arxiv.org/abs/2411.01084

  26. [26]

    The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

    E. Wallace, K. Xiao, R. Leike, L. Weng, J. Heidecke, and A. Beutel, “The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions,” Apr. 2024, arXiv:2404.13208 [cs]. [Online]. Available: http://arxiv.org/abs/2404.13208