pith. machine review for the scientific record. sign in

arxiv: 2604.07536 · v1 · submitted 2026-04-08 · 💻 cs.CR

Recognition: unknown

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

Authors on Pith no claims yet

Pith reviewed 2026-05-10 17:12 UTC · model grok-4.3

classification 💻 cs.CR
keywords tool poisoning attacksLLM tool integrationtrusted descriptionsstatic analysisdynamic verificationprompt injection defenseLLM security
0
0 comments X

The pith

TRUSTDESC generates accurate tool descriptions from code implementations to block implicit tool poisoning attacks in LLM applications.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TRUSTDESC to address tool poisoning attacks where malicious actors alter tool descriptions to trick LLMs into harmful actions or wrong tool choices. Instead of relying on potentially fake descriptions, the system extracts the real behavior straight from the tool's code through static slicing to remove irrelevant parts, synthesis of a description that avoids misleading claims, and dynamic checks by running sample tasks. This approach targets implicit attacks that hide in normal-sounding descriptions rather than obvious bad instructions. If successful it lets LLMs use external tools more reliably while keeping task success rates high and adding only small extra time and cost.

Core claim

TRUSTDESC is a three-stage framework that produces implementation-faithful tool descriptions by first using reachability-aware static analysis and LLM-guided debloating in SliceMin to isolate minimal relevant code, then synthesizing descriptions in DescGen that mitigate adversarial artifacts, and finally refining them in DynVer through dynamic task execution and behavioral validation, thereby preventing implicit tool poisoning attacks at their source.

What carries the argument

The three-stage pipeline of SliceMin for reachability-aware static analysis and LLM-guided code debloating to extract minimal tool slices, DescGen for synthesizing descriptions from those slices, and DynVer for dynamic verification via task execution.

If this is right

  • LLMs achieve higher task completion rates when using tools described by TRUSTDESC.
  • Implicit tool poisoning attacks are mitigated directly at the description source rather than through later detection.
  • The framework applies across 52 real-world tools from multiple ecosystems with minimal added time and monetary cost.
  • Descriptions become more trustworthy because they derive from actual code execution paths instead of user-provided text.
  • Existing detection-based defenses can be supplemented or replaced by this generation method.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Tool ecosystems could adopt automatic description generation as a standard upload requirement to reduce reliance on manual or untrusted metadata.
  • The same slicing and verification steps might extend to other LLM agent components like memory stores or API wrappers.
  • If scaled, this shifts security focus from prompt-level filtering to code-level faithfulness in AI tool use.
  • Developers could integrate similar pipelines into IDEs to produce safe descriptions before tools are shared.

Load-bearing premise

That reachability-aware static analysis combined with debloating and dynamic task runs will always capture every relevant behavior and hidden artifact in the tool code without missing details that affect description accuracy.

What would settle it

A real-world tool where the TRUSTDESC-generated description omits or misstates a behavior, causing an LLM to select or misuse the tool in a way that matches an implicit poisoning attack.

Figures

Figures reproduced from arXiv: 2604.07536 by Hengkai Ye, Hong Hu, Jinyuan Jia, Zhechang Zhang.

Figure 2
Figure 2. Figure 2: Explicit tool poisoning attack. The malicious instruc￾tion in description induces the LLM-integrated application to silently leak the user’s private key. through natural language descriptions. MCP contains three components: the server, client, and host. The server imple￾ments and registers tools and exposes their descriptions, like functionality and input schemas. The client retrieves tool metadata, invoke… view at source ↗
Figure 3
Figure 3. Figure 3: Competition between Context7 and exa-mcp-server. Positive words in tool descriptions bias tool selection toward get_code_context_era, resulting in violation of the user’s request. guidance. The objective of an explicit TPA is to steer the LLM into performing unauthorized or harmful actions, such as leak￾ing private data, bypassing safety constraints, or escalating privileges. With the malicious intent in t… view at source ↗
Figure 4
Figure 4. Figure 4: Code slice for search_arxiv. This tool does not support year-based filtering, since its entry function provides no year when calling search_handler (line 5), rendering lines 13-15 unreachable. tool poisoning attacks remain one of the most effective and practical forms of prompt injection in LLM applications. Several studies [29,31,43,63,67,71] leverage security poli￾cies to mitigate prompt injection. For i… view at source ↗
Figure 5
Figure 5. Figure 5: TRUSTDESC workflow. Given the tool name and source code, SliceMin performs reachability analysis to construct a minimal code slice. DescGen processes the slice and generates an initial description. DynVer iteratively refines the description through dynamic verification. code [42, 64]. These models can extract high-level seman￾tics and infer functionality from complex implementations, and produce concise na… view at source ↗
Figure 6
Figure 6. Figure 6: Call graph debloating on create_chart. SliceMin finds argument style not used and rewrites the code by removing lines 18, 21-24, 27-30 and adding line 25. Lines 34-35 show the difference in generated description with and without debloating. sures robustness against non-standard or framework-specific tool registration patterns. 4.1.2 Call Graph Construction and Debloating Given the identified entry function… view at source ↗
Figure 7
Figure 7. Figure 7: Tool selection rate in adaptive attacks. [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗
read the original abstract

Large language models (LLMs) increasingly rely on external tools to perform time-sensitive tasks and real-world actions. While tool integration expands LLM capabilities, it also introduces a new prompt-injection attack surface: tool poisoning attacks (TPAs). Attackers manipulate tool descriptions by embedding malicious instructions (explicit TPAs) or misleading claims (implicit TPAs) to influence model behavior and tool selection. Existing defenses mainly detect anomalous instructions and remain ineffective against implicit TPAs. In this paper, we present TRUSTDESC, the first framework for preventing tool poisoning by automatically generating trusted tool descriptions from implementations. TRUSTDESC derives implementation-faithful descriptions through a three-stage pipeline. SliceMin performs reachability-aware static analysis and LLM-guided debloating to extract minimal tool-relevant code slices. DescGen synthesizes descriptions from these slices while mitigating misleading or adversarial code artifacts. DynVer refines descriptions through dynamic verification by executing synthesized tasks and validating behavioral claims. We evaluate TRUSTDESC on 52 real-world tools across multiple tool ecosystems. Results show that TRUSTDESC produces accurate tool descriptions that improve task completion rates while mitigating implicit TPAs at their root, with minimal time and monetary overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents TRUSTDESC, the first framework to prevent tool poisoning attacks (TPAs) in LLM tool-using applications by automatically generating trusted, implementation-faithful tool descriptions. The approach uses a three-stage pipeline: SliceMin applies reachability-aware static analysis and LLM-guided debloating to extract minimal relevant code; DescGen synthesizes descriptions while mitigating adversarial artifacts; and DynVer performs dynamic verification by synthesizing and executing tasks to validate behavioral claims. Evaluation on 52 real-world tools across ecosystems claims that the resulting descriptions improve task completion rates, mitigate implicit TPAs at their root, and incur minimal time/monetary overhead.

Significance. If the pipeline reliably produces accurate descriptions, the work would be significant for LLM security: it shifts defense from reactive detection of poisoned descriptions to proactive generation of trusted ones grounded in code, addressing a gap in handling implicit TPAs that current methods miss. Strengths include the end-to-end pipeline design and evaluation across multiple tool ecosystems; reproducible artifacts or machine-checked elements are not mentioned.

major comments (2)
  1. [Evaluation] Evaluation section: the abstract and results claim positive outcomes (improved task completion, TPA mitigation) on 52 tools, yet no quantitative metrics, baselines, error bars, statistical tests, or exclusion criteria are reported. This prevents verification that the data supports the central claims about accuracy and overhead.
  2. [§3] §3 (SliceMin): reachability-aware static analysis on the call graph combined with LLM-guided debloating can under-approximate behaviors involving reflection, dynamic dispatch, callbacks, or environment-dependent branches. Any such omitted behaviors are never checked by DynVer (which only validates synthesized tasks), risking incomplete descriptions that fail to fully block implicit TPAs.
minor comments (2)
  1. [Abstract] Abstract: the claim of 'minimal time and monetary overhead' is stated without concrete measured values, comparison to baselines, or breakdown by stage.
  2. [Introduction] Notation: the distinction between explicit and implicit TPAs is introduced but not formalized with precise definitions or examples tied to the pipeline stages.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We have carefully reviewed the major comments and provide point-by-point responses below, indicating where revisions will be made to address the concerns.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the abstract and results claim positive outcomes (improved task completion, TPA mitigation) on 52 tools, yet no quantitative metrics, baselines, error bars, statistical tests, or exclusion criteria are reported. This prevents verification that the data supports the central claims about accuracy and overhead.

    Authors: We agree that the evaluation section would benefit from more explicit quantitative details to support verification of our claims. In the revised manuscript, we will expand Section 5 to report concrete metrics including task completion rates (with and without TRUSTDESC), direct comparisons to baseline description-generation approaches, error bars from repeated experimental runs, statistical significance tests, and clear exclusion criteria for the 52 tools. We will also provide precise numerical values for time and monetary overhead. These additions will be reflected in the abstract and results discussion as appropriate. revision: yes

  2. Referee: [§3] §3 (SliceMin): reachability-aware static analysis on the call graph combined with LLM-guided debloating can under-approximate behaviors involving reflection, dynamic dispatch, callbacks, or environment-dependent branches. Any such omitted behaviors are never checked by DynVer (which only validates synthesized tasks), risking incomplete descriptions that fail to fully block implicit TPAs.

    Authors: This is a valid point regarding inherent limitations of static analysis in the presence of dynamic language features. Our reachability analysis and debloating are designed to capture core tool behaviors for the evaluated real-world tools, and DynVer's task synthesis plus manual checks confirmed description accuracy in practice. However, we acknowledge that complete coverage of reflection, callbacks, and environment-dependent paths is undecidable in general. In the revision, we will add an explicit discussion of these limitations in §3 and §6, clarifying how DynVer mitigates risks for typical usage patterns while noting that the approach prioritizes practical TPA prevention over theoretical completeness. revision: partial

Circularity Check

0 steps flagged

No circularity: pipeline derives descriptions from independent static/dynamic analysis

full rationale

The paper's core claim rests on a three-stage pipeline (SliceMin reachability analysis + debloating, DescGen synthesis, DynVer dynamic task execution) that extracts and validates tool behavior directly from implementations. No equations, parameters, or results are shown to reduce to their own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and no fitted quantities are relabeled as predictions. The evaluation on 52 tools is presented as external validation rather than tautological. This is the common case of a self-contained empirical pipeline with no detectable circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on domain assumptions about the soundness of static analysis and dynamic verification for security properties; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption Reachability-aware static analysis combined with LLM-guided debloating can extract minimal tool-relevant code slices without losing critical behavior
    Invoked in the SliceMin stage to produce faithful input for description generation.
  • domain assumption Dynamic execution of synthesized tasks can validate that generated descriptions accurately reflect tool behavior
    Core premise of the DynVer stage for refining descriptions.

pith-pipeline@v0.9.0 · 5505 in / 1302 out tokens · 63853 ms · 2026-05-10T17:12:00.515602+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

79 extracted references · 18 canonical work pages · 5 internal anchors

  1. [1]

    https://learnprompting.o rg/docs/prompt_hacking/defensive_measure s/instruction, 2023

    Instruction defense. https://learnprompting.o rg/docs/prompt_hacking/defensive_measure s/instruction, 2023

  2. [2]

    https://learnprompting.o rg/docs/prompt_hacking/defensive_measure s/sandwich_defense, 2023

    Sandwitch defense. https://learnprompting.o rg/docs/prompt_hacking/defensive_measure s/sandwich_defense, 2023

  3. [3]

    Cline.https://github.com/cline/cline, 2024

  4. [4]

    https: //github.com/langchain-ai/langchain, 2024

    LangChain: the platform for reliable agents. https: //github.com/langchain-ai/langchain, 2024

  5. [5]

    ht tps://menlovc.com/perspective/2025-the-s tate-of-generative-ai-in-the-enterprise/ , 2025

    2025: The State of Generative AI in the Enterprise. ht tps://menlovc.com/perspective/2025-the-s tate-of-generative-ai-in-the-enterprise/ , 2025

  6. [6]

    https://github.com/Tencent/A I-Infra-Guard, 2025

    AI-Infra-Guard. https://github.com/Tencent/A I-Infra-Guard, 2025

  7. [7]

    https://modelcontextprotocol.io/docs/dev elop/connect-remote-servers, 2025

    Best Practices for Using Remote MCP Servers. https://modelcontextprotocol.io/docs/dev elop/connect-remote-servers, 2025

  8. [8]

    https://github.com/invar iantlabs-ai/mcp-scan, 2025

    Constrain, log and scan your MCP connections for se- curity vulnerabilities. https://github.com/invar iantlabs-ai/mcp-scan, 2025

  9. [9]

    https://github.com/A syncFuncAI/deepwiki-open, 2025

    DeepWiki: AI-Powered Wiki Generator for GitHub/Git- lab/Bitbucket Repositories. https://github.com/A syncFuncAI/deepwiki-open, 2025

  10. [10]

    https://github.c om/github/github-mcp-server, 2025

    GitHub’s official MCP Server. https://github.c om/github/github-mcp-server, 2025

  11. [11]

    https://huggingface.co/h ub, 2025

    Hugging Face Hub. https://huggingface.co/h ub, 2025

  12. [12]

    https://openrouter.ai/rankin gs, 2025

    LLM Rankings. https://openrouter.ai/rankin gs, 2025

  13. [13]

    https://invariantlabs.ai/blog/mcp-secur ity-notification-tool-poisoning-attacks , 2025

    MCP Security Notification: Tool Poisoning Attacks. https://invariantlabs.ai/blog/mcp-secur ity-notification-tool-poisoning-attacks , 2025

  14. [14]

    https://blog.virtueai.com/2025/ 08/22/mcpguard-first-agent-based-mcp-sca nner-to-protect-ai-agents/, 2025

    MCPGuard: First Agent-based MCP Scanner to Protect AI Agents. https://blog.virtueai.com/2025/ 08/22/mcpguard-first-agent-based-mcp-sca nner-to-protect-ai-agents/, 2025

  15. [15]

    https://github.com/antgroup/MCPS can, 2025

    MCPScan. https://github.com/antgroup/MCPS can, 2025

  16. [16]

    https: //openrouter.ai/, 2025

    OpenRouter: The Unified Interface For LLMs. https: //openrouter.ai/, 2025

  17. [17]

    https://owasp.org/www-project-top -10-for-large-language-model-application s/, 2025

    OWASP Top 10 for Large Language Model Appli- cations. https://owasp.org/www-project-top -10-for-large-language-model-application s/, 2025

  18. [18]

    https://github.com/mic rosoft/playwright-mcp, 2025

    Playwright MCP server. https://github.com/mic rosoft/playwright-mcp, 2025

  19. [19]

    https://github.com/cisco-ai-defen se/mcp-scanner, 2025

    Scan MCP servers for potential threats and security findings. https://github.com/cisco-ai-defen se/mcp-scanner, 2025

  20. [20]

    https://tree-sitter.github.io/tr ee-sitter/, 2025

    Tree-sitter. https://tree-sitter.github.io/tr ee-sitter/, 2025

  21. [21]

    https://invariantlabs.ai/b log/whatsapp-mcp-exploited, 2025

    WhatsApp MCP Exploited: Exfiltrating your message history via MCP. https://invariantlabs.ai/b log/whatsapp-mcp-exploited, 2025

  22. [22]

    Get my drift? catching llm task drift with activation deltas

    Sahar Abdelnabi, Aideen Fay, Giovanni Cherubin, Ahmed Salem, Mario Fritz, and Andrew Paverd. Get my drift? catching llm task drift with activation deltas. InSaTML, 2025

  23. [23]

    A Survey of Malware Detection Using Deep Learning.Machine Learning with Applications, 16:100546, 2024

    Ahmed Bensaoud, Jugal Kalita, and Mahmoud Ben- saoud. A Survey of Malware Detection Using Deep Learning.Machine Learning with Applications, 16:100546, 2024

  24. [24]

    Automated Machine Learning for Deep Learn- ing Based Malware Detection.Computers & Security, 137:103582, 2024

    Austin Brown, Maanak Gupta, and Mahmoud Abdel- salam. Automated Machine Learning for Deep Learn- ing Based Malware Detection.Computers & Security, 137:103582, 2024

  25. [25]

    Yu, Qiang Yang, and Xing Xie

    Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxi- ang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, and Xing Xie. A survey on evaluation of large language models.ACM transactions on intelligent systems and technology, 15(3):1–45, 2024

  26. [26]

    StruQ: Defending against Prompt Injection with Structured Queries

    Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. StruQ: Defending against Prompt Injection with Structured Queries. InProceedings of the 34th USENIX Security Symposium (USENIX Security 25), pages 2383–2400, 2025

  27. [27]

    Secalign: Defending against prompt injection with preference optimization

    Sizhe Chen, Arman Zharmagambetov, Saeed Mahlou- jifar, Kamalika Chaudhuri, David Wagner, and Chuan Guo. Secalign: Defending against prompt injection with preference optimization. InProceedings of the 2025 ACM SIGSAC Conference on Computer and Communi- cations Security, pages 2833–2847, 2025

  28. [28]

    Meta secalign: A secure foundation llm against prompt injection attacks,

    Sizhe Chen, Arman Zharmagambetov, David Wagner, and Chuan Guo. Meta secalign: A secure foundation llm against prompt injection attacks.arXiv preprint arXiv:2507.02735, 2025. 14

  29. [29]

    Securing AI agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025

    Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Se- curing ai agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025

  30. [30]

    Mathsensei: a tool-augmented large language model for mathematical reasoning

    Debrup Das, Debopriyo Banerjee, Somak Aditya, and Ashish Kulkarni. Mathsensei: a tool-augmented large language model for mathematical reasoning.arXiv preprint arXiv:2402.17231, 2024

  31. [31]

    Defeating Prompt Injections by Design

    Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. Defeating prompt injections by design.arXiv preprint arXiv:2503.18813, 2025

  32. [32]

    Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.NeurIPS, 2024

    Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.NeurIPS, 2024

  33. [33]

    Awesome MCP Servers

    Frank Fiegel. Awesome MCP Servers. https://gith ub.com/punkpeye/awesome-mcp-servers, 2025

  34. [34]

    Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion. InProceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023

  35. [35]

    A survey on llm-as-a-judge

    Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xue- hao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, et al. A survey on llm-as-a-judge. The Innovation, 2024

  36. [36]

    A survey on hallucination in large language models: Prin- ciples, taxonomy, challenges, and open questions.ACM Transactions on Information Systems, 43(2):1–55, 2025

    Lei Huang, Weijiang Yu, Weitao Ma, Zhangyin Zhong, Weihong Feng, Haotian Wang, Qianglong Chen, Wei- hua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. A survey on hallucination in large language models: Prin- ciples, taxonomy, challenges, and open questions.ACM Transactions on Information Systems, 43(2):1–55, 2025

  37. [37]

    Attention tracker: Detecting prompt injection attacks in llms

    Kuo-Han Hung, Ching-Yun Ko, Ambrish Rawat, I-Hsin Chung, Winston H Hsu, and Pin-Yu Chen. Attention tracker: Detecting prompt injection attacks in llms. In NAACL, 2025

  38. [38]

    Llms can be easily confused by instructional distractions.arXiv preprint arXiv:2502.04362, 2025

    Yerin Hwang, Yongil Kim, Jahyun Koo, Taegwan Kang, Hyunkyung Bae, and Kyomin Jung. Llms can be easily confused by instructional distractions.arXiv preprint arXiv:2502.04362, 2025

  39. [39]

    MCP Tool Poisoning Experiments

    Invariant Labs. MCP Tool Poisoning Experiments. https://github.com/invariantlabs-ai/mcp-i njection-experiments/tree/main, 2025

  40. [40]

    Promptshield: Deployable detection for prompt injection attacks

    Dennis Jacob, Hend Alzahrani, Zhanhao Hu, Basel Alo- mair, and David Wagner. Promptshield: Deployable detection for prompt injection attacks. InProceedings of the Fifteenth ACM Conference on Data and Applica- tion Security and Privacy, pages 341–352, 2024

  41. [41]

    Survey of hallucination in natural language generation.ACM computing surveys, 55(12):1– 38, 2023

    Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation.ACM computing surveys, 55(12):1– 38, 2023

  42. [42]

    Automatic code documentation generation using gpt-3

    Junaed Younus Khan and Gias Uddin. Automatic code documentation generation using gpt-3. InProceedings of the 37th IEEE/ACM International Conference on Au- tomated Software Engineering, pages 1–6, 2022

  43. [43]

    Prompt flow integrity to prevent privilege escalation in LLM agents.arXiv preprint arXiv:2503.15547, 2025

    Juhee Kim, Woohyuk Choi, and Byoungyoung Lee. Prompt flow integrity to prevent privilege escalation in llm agents.arXiv preprint arXiv:2503.15547, 2025

  44. [44]

    Effective and Efficient Malware Detec- tion at the End Host

    Clemens Kolbitsch, Paolo Milani Comparetti, Christo- pher Kruegel, Engin Kirda, Xiao yong Zhou, and Xi- aoFeng Wang. Effective and Efficient Malware Detec- tion at the End Host. InProceedings of the USENIX Security Symposium (USENIX Security 2009), pages 351–366, 2009

  45. [45]

    arXiv preprint arXiv:2410.22770 , year=

    Hao Li and Xiaogeng Liu. Injecguard: Benchmark- ing and mitigating over-defense in prompt injection guardrail models.arXiv preprint arXiv:2410.22770, 2024

  46. [46]

    Piguard: Prompt injection guardrail via mitigating overdefense for free

    Hao Li, Xiaogeng Liu, Ning Zhang, and Chaowei Xiao. Piguard: Prompt injection guardrail via mitigating overdefense for free. InACL, 2025

  47. [47]

    Yi Liu, Zhihao Chen, Yanjun Zhang, Gelei Deng, Yuekang Li, Jianting Ning, Ying Zhang, and Leo Yu Zhang

    Hao Li, Yankai Yang, G Edward Suh, Ning Zhang, and Chaowei Xiao. Reasalign: Reasoning enhanced safety alignment against prompt injection attack.arXiv preprint arXiv:2601.10173, 2026

  48. [48]

    Les Dissonances: Cross-Tool Harvesting and Polluting in Multi-Tool Empowered LLM Agents

    Zichuan Li, Jian Cui, Xiaojing Liao, and Luyi Xing. Les Dissonances: Cross-Tool Harvesting and Polluting in Multi-Tool Empowered LLM Agents. InProceedings of the 33rd Annual Network and Distributed System Secu- rity Symposium (NDSS 2026), San Diego, CA, February 2026

  49. [49]

    Automatic and universal prompt injection attacks against large language models.arXiv, 2024

    Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, and Chaowei Xiao. Automatic and universal prompt injection attacks against large language models.arXiv, 2024. 15

  50. [50]

    Prompt Injection attack against LLM-integrated Applications

    Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, and Leo Yu Zhang. Prompt in- jection attack against llm-integrated applications.arXiv preprint arXiv:2306.05499, 2023

  51. [51]

    Formalizing and benchmark- ing prompt injection attacks and defenses

    Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and benchmark- ing prompt injection attacks and defenses. InUSENIX Security, 2024

  52. [52]

    Datasentinel: A game-theoretic detection of prompt injection attacks

    Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, and Neil Zhenqiang Gong. Datasentinel: A game-theoretic detection of prompt injection attacks. In2025 IEEE Symposium on Security and Privacy (SP), pages 2190–

  53. [53]

    PromptGuard Prompt Injection Guardrail

    Meta. PromptGuard Prompt Injection Guardrail. https://www.llama.com/docs/model-cards-a nd-prompt-formats/prompt-guard/, 2024

  54. [54]

    Schulhoff, Jamie Hayes, Michael Ilie, Juliette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shumailov, Abhradeep Thakurta, Kai Yuanqing Xiao, Andreas Terzis, and Florian Tramèr

    Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander V Schulhoff, Jamie Hayes, Michael Ilie, Juli- ette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shu- mailov, Abhradeep Thakurta, Kai Yuanqing Xiao, An- dreas Terzis, and Florian Tramer. The attacker moves second: Stronger adaptive attacks bypass defenses against llm jailbreaks and prompt injections.arXiv...

  55. [55]

    A comprehen- sive overview of large language models.ACM Transac- tions on Intelligent Systems and Technology, 16(5):1–72, 2025

    Humza Naveed, Asad Ullah Khan, Shi Qiu, Muham- mad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, and Ajmal Mian. A comprehen- sive overview of large language models.ACM Transac- tions on Intelligent Systems and Technology, 16(5):1–72, 2025

  56. [56]

    Neural exec: Learning (and learning from) exe- cution triggers for prompt injection attacks

    Dario Pasquini, Martin Strohmeier, and Carmela Tron- coso. Neural exec: Learning (and learning from) exe- cution triggers for prompt injection attacks. InAISec, 2024

  57. [57]

    Ignore previous prompt: Attack techniques for language models

    Fábio Perez and Ian Ribeiro. Ignore previous prompt: Attack techniques for language models. InNeurIPS ML Safety Workshop, 2022

  58. [58]

    Jatmo: Prompt injection defense by task-specific finetuning

    Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, and David Wagner. Jatmo: Prompt injection defense by task-specific finetuning. InESORICS, 2024

  59. [59]

    Fine-tuned deberta-v3-base for prompt injection detection, 2024

    ProtectAI.com. Fine-tuned deberta-v3-base for prompt injection detection, 2024

  60. [60]

    Tool learning with large language models: A survey

    Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, and Ji-Rong Wen. Tool learning with large language models: A survey. Frontiers of Computer Science, 19(8):198343, 2025

  61. [61]

    Large language models can be easily distracted by irrelevant context

    Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H Chi, Nathanael Schärli, and Denny Zhou. Large language models can be easily distracted by irrelevant context. InInternational Confer- ence on Machine Learning, pages 31210–31227. PMLR, 2023

  62. [62]

    Prompt Injec- tion Attack to Tool Selection in LLM Agents

    Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, and Lichao Sun. Prompt Injec- tion Attack to Tool Selection in LLM Agents. InPro- ceedings of the 33rd Annual Network and Distributed System Security Symposium (NDSS 2026), San Diego, CA, February 2026

  63. [63]

    Progent: Securing AI Agents with Privilege Control

    Tianneng Shi, Jingxuan He, Zhun Wang, Linyu Wu, Hongwei Li, Wenbo Guo, and Dawn Song. Progent: Programmable privilege control for llm agents.arXiv preprint arXiv:2504.11703, 2025

  64. [64]

    Source code summarization in the era of large language models

    Weisong Sun, Yun Miao, Yuekang Li, Hongyu Zhang, Chunrong Fang, Yi Liu, Gelei Deng, Yang Liu, and Zhenyu Chen. Source code summarization in the era of large language models. In2025 IEEE/ACM 47th Inter- national Conference on Software Engineering (ICSE), pages 1882–1894. IEEE, 2025

  65. [65]

    Gemini: A Family of Highly Capable Multimodal Models

    Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean- Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalk- wyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023

  66. [66]

    The instruction hier- archy: Training llms to prioritize privileged instructions

    Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Jo- hannes Heidecke, and Alex Beutel. The instruction hier- archy: Training llms to prioritize privileged instructions. arXiv, 2024

  67. [67]

    Agentarmor: Enforcing program analysis on agent runtime trace to defend against prompt injection.arXiv preprint arXiv:2508.01249, 2025

    Peiran Wang, Yang Liu, Yunfei Lu, Yifeng Cai, Hongbo Chen, Qingyou Yang, Jie Zhang, Jue Hong, and Ye Wu. Agentarmor: Enforcing program analysis on agent run- time trace to defend against prompt injection.arXiv preprint arXiv:2508.01249, 2025

  68. [68]

    Mcp-bench: Benchmarking tool-using llm agents with complex real-world tasks via mcp servers.arXiv preprint arXiv:2508.20453,

    Zhenting Wang, Qi Chang, Hemani Patel, Shashank Biju, Cheng-En Wu, Quan Liu, Aolin Ding, Alireza Reza- zadeh, Ankit Shah, Yujia Bao, et al. Mcp-bench: Bench- marking tool-using llm agents with complex real-world tasks via mcp servers.arXiv preprint arXiv:2508.20453, 2025

  69. [69]

    Delimiters won’t save you from prompt injection

    Simon Willison. Delimiters won’t save you from prompt injection. https://simonwillison.net/2023/Ma y/11/delimiters-wont-save-you, 2023

  70. [70]

    Gemini introduces Personal Intelli- gence

    Josh Woodward. Gemini introduces Personal Intelli- gence. https://blog.google/innovation-and -ai/products/gemini-app/personal-intelli gence/, 2025. 16

  71. [71]

    System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective

    Fangzhou Wu, Ethan Cecchetti, and Chaowei Xiao. System-level defense against indirect prompt injection attacks: An information flow control perspective.arXiv preprint arXiv:2409.19091, 2024

  72. [72]

    How easily do irrelevant inputs skew the responses of large language models? In First Conference on Language Modeling

    Siye Wu, Jian Xie, Jiangjie Chen, Tinghui Zhu, Kai Zhang, and Yanghua Xiao. How easily do irrelevant inputs skew the responses of large language models? In First Conference on Language Modeling

  73. [73]

    In- structional segment embedding: Improving llm safety with instruction hierarchy

    Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Reddy Indurthi, Chong Xiang, Prateek Mittal, and Wenxuan Zhou. In- structional segment embedding: Improving llm safety with instruction hierarchy. InThe Thirteenth Interna- tional Conference on Learning Representations, 2025

  74. [74]

    Isolategpt: An execution iso- lation architecture for llm-based agentic systems

    Yuhao Wu, Franziska Roesner, Tadayoshi Kohno, Ning Zhang, and Umar Iqbal. Isolategpt: An execution iso- lation architecture for llm-based agentic systems. In NDSS, 2025

  75. [75]

    Craft: Customizing llms by creating and retrieving from specialized toolsets.arXiv preprint arXiv:2309.17428, 2023

    Lifan Yuan, Yangyi Chen, Xingyao Wang, Yi R Fung, Hao Peng, and Heng Ji. Craft: Customizing llms by creating and retrieving from specialized toolsets.arXiv preprint arXiv:2309.17428, 2023

  76. [76]

    Browsesafe: Understanding and preventing prompt injection within AI browser agents.arXiv preprint arXiv:2511.20597, 2025

    Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley, Jerry Ma, Denis Yarats, and Ninghui Li. Browsesafe: Un- derstanding and preventing prompt injection within ai browser agents.arXiv preprint arXiv:2511.20597, 2025

  77. [77]

    A Survey of Large Language Models

    Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. A survey of large language mod- els.arXiv preprint arXiv:2303.18223, 1(2), 2023

  78. [78]

    Judging llm-as- a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595– 46623, 2023

    Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuo- han Li, Dacheng Li, Eric Xing, et al. Judging llm-as- a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595– 46623, 2023

  79. [79]

    Attention is all you need to defend against indirect prompt injection attacks in llms

    Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, and Wenyuan Xu. Attention is all you need to defend against indirect prompt injection attacks in llms. InNDSS, 2026. 17