arxiv: 2604.07536 · v1 · submitted 2026-04-08 · 💻 cs.CR

Recognition: unknown

TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation

Hengkai Ye , Zhechang Zhang , Jinyuan Jia , Hong Hu

Authors on Pith no claims yet

Pith reviewed 2026-05-10 17:12 UTC · model grok-4.3

classification 💻 cs.CR

keywords tool poisoning attacksLLM tool integrationtrusted descriptionsstatic analysisdynamic verificationprompt injection defenseLLM security

0 comments

The pith

TRUSTDESC generates accurate tool descriptions from code implementations to block implicit tool poisoning attacks in LLM applications.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TRUSTDESC to address tool poisoning attacks where malicious actors alter tool descriptions to trick LLMs into harmful actions or wrong tool choices. Instead of relying on potentially fake descriptions, the system extracts the real behavior straight from the tool's code through static slicing to remove irrelevant parts, synthesis of a description that avoids misleading claims, and dynamic checks by running sample tasks. This approach targets implicit attacks that hide in normal-sounding descriptions rather than obvious bad instructions. If successful it lets LLMs use external tools more reliably while keeping task success rates high and adding only small extra time and cost.

Core claim

TRUSTDESC is a three-stage framework that produces implementation-faithful tool descriptions by first using reachability-aware static analysis and LLM-guided debloating in SliceMin to isolate minimal relevant code, then synthesizing descriptions in DescGen that mitigate adversarial artifacts, and finally refining them in DynVer through dynamic task execution and behavioral validation, thereby preventing implicit tool poisoning attacks at their source.

What carries the argument

The three-stage pipeline of SliceMin for reachability-aware static analysis and LLM-guided code debloating to extract minimal tool slices, DescGen for synthesizing descriptions from those slices, and DynVer for dynamic verification via task execution.

If this is right

LLMs achieve higher task completion rates when using tools described by TRUSTDESC.
Implicit tool poisoning attacks are mitigated directly at the description source rather than through later detection.
The framework applies across 52 real-world tools from multiple ecosystems with minimal added time and monetary cost.
Descriptions become more trustworthy because they derive from actual code execution paths instead of user-provided text.
Existing detection-based defenses can be supplemented or replaced by this generation method.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Tool ecosystems could adopt automatic description generation as a standard upload requirement to reduce reliance on manual or untrusted metadata.
The same slicing and verification steps might extend to other LLM agent components like memory stores or API wrappers.
If scaled, this shifts security focus from prompt-level filtering to code-level faithfulness in AI tool use.
Developers could integrate similar pipelines into IDEs to produce safe descriptions before tools are shared.

Load-bearing premise

That reachability-aware static analysis combined with debloating and dynamic task runs will always capture every relevant behavior and hidden artifact in the tool code without missing details that affect description accuracy.

What would settle it

A real-world tool where the TRUSTDESC-generated description omits or misstates a behavior, causing an LLM to select or misuse the tool in a way that matches an implicit poisoning attack.

Figures

Figures reproduced from arXiv: 2604.07536 by Hengkai Ye, Hong Hu, Jinyuan Jia, Zhechang Zhang.

**Figure 2.** Figure 2: Explicit tool poisoning attack. The malicious instruction in description induces the LLM-integrated application to silently leak the user’s private key. through natural language descriptions. MCP contains three components: the server, client, and host. The server implements and registers tools and exposes their descriptions, like functionality and input schemas. The client retrieves tool metadata, invoke… view at source ↗

**Figure 3.** Figure 3: Competition between Context7 and exa-mcp-server. Positive words in tool descriptions bias tool selection toward get_code_context_era, resulting in violation of the user’s request. guidance. The objective of an explicit TPA is to steer the LLM into performing unauthorized or harmful actions, such as leaking private data, bypassing safety constraints, or escalating privileges. With the malicious intent in t… view at source ↗

**Figure 4.** Figure 4: Code slice for search_arxiv. This tool does not support year-based filtering, since its entry function provides no year when calling search_handler (line 5), rendering lines 13-15 unreachable. tool poisoning attacks remain one of the most effective and practical forms of prompt injection in LLM applications. Several studies [29,31,43,63,67,71] leverage security policies to mitigate prompt injection. For i… view at source ↗

**Figure 5.** Figure 5: TRUSTDESC workflow. Given the tool name and source code, SliceMin performs reachability analysis to construct a minimal code slice. DescGen processes the slice and generates an initial description. DynVer iteratively refines the description through dynamic verification. code [42, 64]. These models can extract high-level semantics and infer functionality from complex implementations, and produce concise na… view at source ↗

**Figure 6.** Figure 6: Call graph debloating on create_chart. SliceMin finds argument style not used and rewrites the code by removing lines 18, 21-24, 27-30 and adding line 25. Lines 34-35 show the difference in generated description with and without debloating. sures robustness against non-standard or framework-specific tool registration patterns. 4.1.2 Call Graph Construction and Debloating Given the identified entry function… view at source ↗

**Figure 7.** Figure 7: Tool selection rate in adaptive attacks. [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

read the original abstract

Large language models (LLMs) increasingly rely on external tools to perform time-sensitive tasks and real-world actions. While tool integration expands LLM capabilities, it also introduces a new prompt-injection attack surface: tool poisoning attacks (TPAs). Attackers manipulate tool descriptions by embedding malicious instructions (explicit TPAs) or misleading claims (implicit TPAs) to influence model behavior and tool selection. Existing defenses mainly detect anomalous instructions and remain ineffective against implicit TPAs. In this paper, we present TRUSTDESC, the first framework for preventing tool poisoning by automatically generating trusted tool descriptions from implementations. TRUSTDESC derives implementation-faithful descriptions through a three-stage pipeline. SliceMin performs reachability-aware static analysis and LLM-guided debloating to extract minimal tool-relevant code slices. DescGen synthesizes descriptions from these slices while mitigating misleading or adversarial code artifacts. DynVer refines descriptions through dynamic verification by executing synthesized tasks and validating behavioral claims. We evaluate TRUSTDESC on 52 real-world tools across multiple tool ecosystems. Results show that TRUSTDESC produces accurate tool descriptions that improve task completion rates while mitigating implicit TPAs at their root, with minimal time and monetary overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TRUSTDESC generates tool descriptions from code slices to block both explicit and implicit poisoning, but the evaluation is too light on metrics and edge cases to confirm it works reliably.

read the letter

The main thing here is that TRUSTDESC generates trusted tool descriptions straight from the tool implementation instead of trusting or patching the ones attackers can manipulate. It uses reachability-aware slicing plus LLM debloating to pull minimal relevant code, synthesizes a description from that slice, and then runs dynamic verification on synthesized tasks to check the claims. This is positioned as the first proactive generation method rather than post-hoc detection, which makes sense given how implicit TPAs can slip past anomaly checks. On 52 real-world tools it reports better task completion, attack mitigation, and low overhead, which would be useful if the numbers are solid. The pipeline itself is a reasonable combination of static analysis and runtime checks for this setting. The soft spots are in the evidence. The abstract claims positive results but gives no accuracy numbers for the generated descriptions, no baselines, no error bars, and no breakdown of how they measured mitigation success or task rates. Without those details it is hard to judge whether the descriptions are actually faithful or just good enough for the tested cases. The reachability analysis plus debloating step can easily miss runtime-dependent branches, reflection, callbacks, or environment-specific paths, and the dynamic verification only covers the tasks they chose to synthesize, so any missed behavior stays unchecked. That directly hits the central claim that this mitigates implicit attacks at the root. If even a few of the 52 tools have those patterns, the descriptions could still be incomplete. This is for people working on LLM agent security and tool integration. A reader who wants a concrete prevention idea rather than another detector would find it worth reading, even with the gaps. It deserves a serious referee to examine the implementation, the exact metrics, and whether the slicing really covers the behaviors that matter.

Referee Report

2 major / 2 minor

Summary. The paper presents TRUSTDESC, the first framework to prevent tool poisoning attacks (TPAs) in LLM tool-using applications by automatically generating trusted, implementation-faithful tool descriptions. The approach uses a three-stage pipeline: SliceMin applies reachability-aware static analysis and LLM-guided debloating to extract minimal relevant code; DescGen synthesizes descriptions while mitigating adversarial artifacts; and DynVer performs dynamic verification by synthesizing and executing tasks to validate behavioral claims. Evaluation on 52 real-world tools across ecosystems claims that the resulting descriptions improve task completion rates, mitigate implicit TPAs at their root, and incur minimal time/monetary overhead.

Significance. If the pipeline reliably produces accurate descriptions, the work would be significant for LLM security: it shifts defense from reactive detection of poisoned descriptions to proactive generation of trusted ones grounded in code, addressing a gap in handling implicit TPAs that current methods miss. Strengths include the end-to-end pipeline design and evaluation across multiple tool ecosystems; reproducible artifacts or machine-checked elements are not mentioned.

major comments (2)

[Evaluation] Evaluation section: the abstract and results claim positive outcomes (improved task completion, TPA mitigation) on 52 tools, yet no quantitative metrics, baselines, error bars, statistical tests, or exclusion criteria are reported. This prevents verification that the data supports the central claims about accuracy and overhead.
[§3] §3 (SliceMin): reachability-aware static analysis on the call graph combined with LLM-guided debloating can under-approximate behaviors involving reflection, dynamic dispatch, callbacks, or environment-dependent branches. Any such omitted behaviors are never checked by DynVer (which only validates synthesized tasks), risking incomplete descriptions that fail to fully block implicit TPAs.

minor comments (2)

[Abstract] Abstract: the claim of 'minimal time and monetary overhead' is stated without concrete measured values, comparison to baselines, or breakdown by stage.
[Introduction] Notation: the distinction between explicit and implicit TPAs is introduced but not formalized with precise definitions or examples tied to the pipeline stages.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We have carefully reviewed the major comments and provide point-by-point responses below, indicating where revisions will be made to address the concerns.

read point-by-point responses

Referee: [Evaluation] Evaluation section: the abstract and results claim positive outcomes (improved task completion, TPA mitigation) on 52 tools, yet no quantitative metrics, baselines, error bars, statistical tests, or exclusion criteria are reported. This prevents verification that the data supports the central claims about accuracy and overhead.

Authors: We agree that the evaluation section would benefit from more explicit quantitative details to support verification of our claims. In the revised manuscript, we will expand Section 5 to report concrete metrics including task completion rates (with and without TRUSTDESC), direct comparisons to baseline description-generation approaches, error bars from repeated experimental runs, statistical significance tests, and clear exclusion criteria for the 52 tools. We will also provide precise numerical values for time and monetary overhead. These additions will be reflected in the abstract and results discussion as appropriate. revision: yes
Referee: [§3] §3 (SliceMin): reachability-aware static analysis on the call graph combined with LLM-guided debloating can under-approximate behaviors involving reflection, dynamic dispatch, callbacks, or environment-dependent branches. Any such omitted behaviors are never checked by DynVer (which only validates synthesized tasks), risking incomplete descriptions that fail to fully block implicit TPAs.

Authors: This is a valid point regarding inherent limitations of static analysis in the presence of dynamic language features. Our reachability analysis and debloating are designed to capture core tool behaviors for the evaluated real-world tools, and DynVer's task synthesis plus manual checks confirmed description accuracy in practice. However, we acknowledge that complete coverage of reflection, callbacks, and environment-dependent paths is undecidable in general. In the revision, we will add an explicit discussion of these limitations in §3 and §6, clarifying how DynVer mitigates risks for typical usage patterns while noting that the approach prioritizes practical TPA prevention over theoretical completeness. revision: partial

Circularity Check

0 steps flagged

No circularity: pipeline derives descriptions from independent static/dynamic analysis

full rationale

The paper's core claim rests on a three-stage pipeline (SliceMin reachability analysis + debloating, DescGen synthesis, DynVer dynamic task execution) that extracts and validates tool behavior directly from implementations. No equations, parameters, or results are shown to reduce to their own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and no fitted quantities are relabeled as predictions. The evaluation on 52 tools is presented as external validation rather than tautological. This is the common case of a self-contained empirical pipeline with no detectable circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on domain assumptions about the soundness of static analysis and dynamic verification for security properties; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)

domain assumption Reachability-aware static analysis combined with LLM-guided debloating can extract minimal tool-relevant code slices without losing critical behavior
Invoked in the SliceMin stage to produce faithful input for description generation.
domain assumption Dynamic execution of synthesized tasks can validate that generated descriptions accurately reflect tool behavior
Core premise of the DynVer stage for refining descriptions.

pith-pipeline@v0.9.0 · 5505 in / 1302 out tokens · 63853 ms · 2026-05-10T17:12:00.515602+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

79 extracted references · 18 canonical work pages · 5 internal anchors

[1]

https://learnprompting.o rg/docs/prompt_hacking/defensive_measure s/instruction, 2023

Instruction defense. https://learnprompting.o rg/docs/prompt_hacking/defensive_measure s/instruction, 2023

2023
[2]

https://learnprompting.o rg/docs/prompt_hacking/defensive_measure s/sandwich_defense, 2023

Sandwitch defense. https://learnprompting.o rg/docs/prompt_hacking/defensive_measure s/sandwich_defense, 2023

2023
[3]

Cline.https://github.com/cline/cline, 2024

2024
[4]

https: //github.com/langchain-ai/langchain, 2024

LangChain: the platform for reliable agents. https: //github.com/langchain-ai/langchain, 2024

2024
[5]

ht tps://menlovc.com/perspective/2025-the-s tate-of-generative-ai-in-the-enterprise/ , 2025

2025: The State of Generative AI in the Enterprise. ht tps://menlovc.com/perspective/2025-the-s tate-of-generative-ai-in-the-enterprise/ , 2025

2025
[6]

https://github.com/Tencent/A I-Infra-Guard, 2025

AI-Infra-Guard. https://github.com/Tencent/A I-Infra-Guard, 2025

2025
[7]

https://modelcontextprotocol.io/docs/dev elop/connect-remote-servers, 2025

Best Practices for Using Remote MCP Servers. https://modelcontextprotocol.io/docs/dev elop/connect-remote-servers, 2025

2025
[8]

https://github.com/invar iantlabs-ai/mcp-scan, 2025

Constrain, log and scan your MCP connections for se- curity vulnerabilities. https://github.com/invar iantlabs-ai/mcp-scan, 2025

2025
[9]

https://github.com/A syncFuncAI/deepwiki-open, 2025

DeepWiki: AI-Powered Wiki Generator for GitHub/Git- lab/Bitbucket Repositories. https://github.com/A syncFuncAI/deepwiki-open, 2025

2025
[10]

https://github.c om/github/github-mcp-server, 2025

GitHub’s official MCP Server. https://github.c om/github/github-mcp-server, 2025

2025
[11]

https://huggingface.co/h ub, 2025

Hugging Face Hub. https://huggingface.co/h ub, 2025

2025
[12]

https://openrouter.ai/rankin gs, 2025

LLM Rankings. https://openrouter.ai/rankin gs, 2025

2025
[13]

https://invariantlabs.ai/blog/mcp-secur ity-notification-tool-poisoning-attacks , 2025

MCP Security Notification: Tool Poisoning Attacks. https://invariantlabs.ai/blog/mcp-secur ity-notification-tool-poisoning-attacks , 2025

2025
[14]

https://blog.virtueai.com/2025/ 08/22/mcpguard-first-agent-based-mcp-sca nner-to-protect-ai-agents/, 2025

MCPGuard: First Agent-based MCP Scanner to Protect AI Agents. https://blog.virtueai.com/2025/ 08/22/mcpguard-first-agent-based-mcp-sca nner-to-protect-ai-agents/, 2025

2025
[15]

https://github.com/antgroup/MCPS can, 2025

MCPScan. https://github.com/antgroup/MCPS can, 2025

2025
[16]

https: //openrouter.ai/, 2025

OpenRouter: The Unified Interface For LLMs. https: //openrouter.ai/, 2025

2025
[17]

https://owasp.org/www-project-top -10-for-large-language-model-application s/, 2025

OWASP Top 10 for Large Language Model Appli- cations. https://owasp.org/www-project-top -10-for-large-language-model-application s/, 2025

2025
[18]

https://github.com/mic rosoft/playwright-mcp, 2025

Playwright MCP server. https://github.com/mic rosoft/playwright-mcp, 2025

2025
[19]

https://github.com/cisco-ai-defen se/mcp-scanner, 2025

Scan MCP servers for potential threats and security findings. https://github.com/cisco-ai-defen se/mcp-scanner, 2025

2025
[20]

https://tree-sitter.github.io/tr ee-sitter/, 2025

Tree-sitter. https://tree-sitter.github.io/tr ee-sitter/, 2025

2025
[21]

https://invariantlabs.ai/b log/whatsapp-mcp-exploited, 2025

WhatsApp MCP Exploited: Exfiltrating your message history via MCP. https://invariantlabs.ai/b log/whatsapp-mcp-exploited, 2025

2025
[22]

Get my drift? catching llm task drift with activation deltas

Sahar Abdelnabi, Aideen Fay, Giovanni Cherubin, Ahmed Salem, Mario Fritz, and Andrew Paverd. Get my drift? catching llm task drift with activation deltas. InSaTML, 2025

2025
[23]

A Survey of Malware Detection Using Deep Learning.Machine Learning with Applications, 16:100546, 2024

Ahmed Bensaoud, Jugal Kalita, and Mahmoud Ben- saoud. A Survey of Malware Detection Using Deep Learning.Machine Learning with Applications, 16:100546, 2024

2024
[24]

Automated Machine Learning for Deep Learn- ing Based Malware Detection.Computers & Security, 137:103582, 2024

Austin Brown, Maanak Gupta, and Mahmoud Abdel- salam. Automated Machine Learning for Deep Learn- ing Based Malware Detection.Computers & Security, 137:103582, 2024

2024
[25]

Yu, Qiang Yang, and Xing Xie

Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxi- ang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, and Xing Xie. A survey on evaluation of large language models.ACM transactions on intelligent systems and technology, 15(3):1–45, 2024

2024
[26]

StruQ: Defending against Prompt Injection with Structured Queries

Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. StruQ: Defending against Prompt Injection with Structured Queries. InProceedings of the 34th USENIX Security Symposium (USENIX Security 25), pages 2383–2400, 2025

2025
[27]

Secalign: Defending against prompt injection with preference optimization

Sizhe Chen, Arman Zharmagambetov, Saeed Mahlou- jifar, Kamalika Chaudhuri, David Wagner, and Chuan Guo. Secalign: Defending against prompt injection with preference optimization. InProceedings of the 2025 ACM SIGSAC Conference on Computer and Communi- cations Security, pages 2833–2847, 2025

2025
[28]

Meta secalign: A secure foundation llm against prompt injection attacks,

Sizhe Chen, Arman Zharmagambetov, David Wagner, and Chuan Guo. Meta secalign: A secure foundation llm against prompt injection attacks.arXiv preprint arXiv:2507.02735, 2025. 14

work page arXiv 2025
[29]

Securing AI agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025

Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Se- curing ai agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025

work page arXiv 2025
[30]

Mathsensei: a tool-augmented large language model for mathematical reasoning

Debrup Das, Debopriyo Banerjee, Somak Aditya, and Ashish Kulkarni. Mathsensei: a tool-augmented large language model for mathematical reasoning.arXiv preprint arXiv:2402.17231, 2024

work page arXiv 2024
[31]

Defeating Prompt Injections by Design

Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. Defeating prompt injections by design.arXiv preprint arXiv:2503.18813, 2025

work page internal anchor Pith review arXiv 2025
[32]

Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.NeurIPS, 2024

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.NeurIPS, 2024

2024
[33]

Awesome MCP Servers

Frank Fiegel. Awesome MCP Servers. https://gith ub.com/punkpeye/awesome-mcp-servers, 2025

2025
[34]

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion. InProceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023

2023
[35]

A survey on llm-as-a-judge

Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xue- hao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, et al. A survey on llm-as-a-judge. The Innovation, 2024

2024
[36]

A survey on hallucination in large language models: Prin- ciples, taxonomy, challenges, and open questions.ACM Transactions on Information Systems, 43(2):1–55, 2025

Lei Huang, Weijiang Yu, Weitao Ma, Zhangyin Zhong, Weihong Feng, Haotian Wang, Qianglong Chen, Wei- hua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. A survey on hallucination in large language models: Prin- ciples, taxonomy, challenges, and open questions.ACM Transactions on Information Systems, 43(2):1–55, 2025

2025
[37]

Attention tracker: Detecting prompt injection attacks in llms

Kuo-Han Hung, Ching-Yun Ko, Ambrish Rawat, I-Hsin Chung, Winston H Hsu, and Pin-Yu Chen. Attention tracker: Detecting prompt injection attacks in llms. In NAACL, 2025

2025
[38]

Llms can be easily confused by instructional distractions.arXiv preprint arXiv:2502.04362, 2025

Yerin Hwang, Yongil Kim, Jahyun Koo, Taegwan Kang, Hyunkyung Bae, and Kyomin Jung. Llms can be easily confused by instructional distractions.arXiv preprint arXiv:2502.04362, 2025

work page arXiv 2025
[39]

MCP Tool Poisoning Experiments

Invariant Labs. MCP Tool Poisoning Experiments. https://github.com/invariantlabs-ai/mcp-i njection-experiments/tree/main, 2025

2025
[40]

Promptshield: Deployable detection for prompt injection attacks

Dennis Jacob, Hend Alzahrani, Zhanhao Hu, Basel Alo- mair, and David Wagner. Promptshield: Deployable detection for prompt injection attacks. InProceedings of the Fifteenth ACM Conference on Data and Applica- tion Security and Privacy, pages 341–352, 2024

2024
[41]

Survey of hallucination in natural language generation.ACM computing surveys, 55(12):1– 38, 2023

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation.ACM computing surveys, 55(12):1– 38, 2023

2023
[42]

Automatic code documentation generation using gpt-3

Junaed Younus Khan and Gias Uddin. Automatic code documentation generation using gpt-3. InProceedings of the 37th IEEE/ACM International Conference on Au- tomated Software Engineering, pages 1–6, 2022

2022
[43]

Prompt flow integrity to prevent privilege escalation in LLM agents.arXiv preprint arXiv:2503.15547, 2025

Juhee Kim, Woohyuk Choi, and Byoungyoung Lee. Prompt flow integrity to prevent privilege escalation in llm agents.arXiv preprint arXiv:2503.15547, 2025

work page arXiv 2025
[44]

Effective and Efficient Malware Detec- tion at the End Host

Clemens Kolbitsch, Paolo Milani Comparetti, Christo- pher Kruegel, Engin Kirda, Xiao yong Zhou, and Xi- aoFeng Wang. Effective and Efficient Malware Detec- tion at the End Host. InProceedings of the USENIX Security Symposium (USENIX Security 2009), pages 351–366, 2009

2009
[45]

arXiv preprint arXiv:2410.22770 , year=

Hao Li and Xiaogeng Liu. Injecguard: Benchmark- ing and mitigating over-defense in prompt injection guardrail models.arXiv preprint arXiv:2410.22770, 2024

work page arXiv 2024
[46]

Piguard: Prompt injection guardrail via mitigating overdefense for free

Hao Li, Xiaogeng Liu, Ning Zhang, and Chaowei Xiao. Piguard: Prompt injection guardrail via mitigating overdefense for free. InACL, 2025

2025
[47]

Yi Liu, Zhihao Chen, Yanjun Zhang, Gelei Deng, Yuekang Li, Jianting Ning, Ying Zhang, and Leo Yu Zhang

Hao Li, Yankai Yang, G Edward Suh, Ning Zhang, and Chaowei Xiao. Reasalign: Reasoning enhanced safety alignment against prompt injection attack.arXiv preprint arXiv:2601.10173, 2026

work page arXiv 2026
[48]

Les Dissonances: Cross-Tool Harvesting and Polluting in Multi-Tool Empowered LLM Agents

Zichuan Li, Jian Cui, Xiaojing Liao, and Luyi Xing. Les Dissonances: Cross-Tool Harvesting and Polluting in Multi-Tool Empowered LLM Agents. InProceedings of the 33rd Annual Network and Distributed System Secu- rity Symposium (NDSS 2026), San Diego, CA, February 2026

2026
[49]

Automatic and universal prompt injection attacks against large language models.arXiv, 2024

Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, and Chaowei Xiao. Automatic and universal prompt injection attacks against large language models.arXiv, 2024. 15

2024
[50]

Prompt Injection attack against LLM-integrated Applications

Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, and Leo Yu Zhang. Prompt in- jection attack against llm-integrated applications.arXiv preprint arXiv:2306.05499, 2023

work page internal anchor Pith review arXiv 2023
[51]

Formalizing and benchmark- ing prompt injection attacks and defenses

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and benchmark- ing prompt injection attacks and defenses. InUSENIX Security, 2024

2024
[52]

Datasentinel: A game-theoretic detection of prompt injection attacks

Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, and Neil Zhenqiang Gong. Datasentinel: A game-theoretic detection of prompt injection attacks. In2025 IEEE Symposium on Security and Privacy (SP), pages 2190–
[53]

PromptGuard Prompt Injection Guardrail

Meta. PromptGuard Prompt Injection Guardrail. https://www.llama.com/docs/model-cards-a nd-prompt-formats/prompt-guard/, 2024

2024
[54]

Schulhoff, Jamie Hayes, Michael Ilie, Juliette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shumailov, Abhradeep Thakurta, Kai Yuanqing Xiao, Andreas Terzis, and Florian Tramèr

Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander V Schulhoff, Jamie Hayes, Michael Ilie, Juli- ette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shu- mailov, Abhradeep Thakurta, Kai Yuanqing Xiao, An- dreas Terzis, and Florian Tramer. The attacker moves second: Stronger adaptive attacks bypass defenses against llm jailbreaks and prompt injections.arXiv...

work page arXiv 2025
[55]

A comprehen- sive overview of large language models.ACM Transac- tions on Intelligent Systems and Technology, 16(5):1–72, 2025

Humza Naveed, Asad Ullah Khan, Shi Qiu, Muham- mad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, and Ajmal Mian. A comprehen- sive overview of large language models.ACM Transac- tions on Intelligent Systems and Technology, 16(5):1–72, 2025

2025
[56]

Neural exec: Learning (and learning from) exe- cution triggers for prompt injection attacks

Dario Pasquini, Martin Strohmeier, and Carmela Tron- coso. Neural exec: Learning (and learning from) exe- cution triggers for prompt injection attacks. InAISec, 2024

2024
[57]

Ignore previous prompt: Attack techniques for language models

Fábio Perez and Ian Ribeiro. Ignore previous prompt: Attack techniques for language models. InNeurIPS ML Safety Workshop, 2022

2022
[58]

Jatmo: Prompt injection defense by task-specific finetuning

Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, and David Wagner. Jatmo: Prompt injection defense by task-specific finetuning. InESORICS, 2024

2024
[59]

Fine-tuned deberta-v3-base for prompt injection detection, 2024

ProtectAI.com. Fine-tuned deberta-v3-base for prompt injection detection, 2024

2024
[60]

Tool learning with large language models: A survey

Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, and Ji-Rong Wen. Tool learning with large language models: A survey. Frontiers of Computer Science, 19(8):198343, 2025

2025
[61]

Large language models can be easily distracted by irrelevant context

Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H Chi, Nathanael Schärli, and Denny Zhou. Large language models can be easily distracted by irrelevant context. InInternational Confer- ence on Machine Learning, pages 31210–31227. PMLR, 2023

2023
[62]

Prompt Injec- tion Attack to Tool Selection in LLM Agents

Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, and Lichao Sun. Prompt Injec- tion Attack to Tool Selection in LLM Agents. InPro- ceedings of the 33rd Annual Network and Distributed System Security Symposium (NDSS 2026), San Diego, CA, February 2026

2026
[63]

Progent: Securing AI Agents with Privilege Control

Tianneng Shi, Jingxuan He, Zhun Wang, Linyu Wu, Hongwei Li, Wenbo Guo, and Dawn Song. Progent: Programmable privilege control for llm agents.arXiv preprint arXiv:2504.11703, 2025

work page internal anchor Pith review arXiv 2025
[64]

Source code summarization in the era of large language models

Weisong Sun, Yun Miao, Yuekang Li, Hongyu Zhang, Chunrong Fang, Yi Liu, Gelei Deng, Yang Liu, and Zhenyu Chen. Source code summarization in the era of large language models. In2025 IEEE/ACM 47th Inter- national Conference on Software Engineering (ICSE), pages 1882–1894. IEEE, 2025

2025
[65]

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean- Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalk- wyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[66]

The instruction hier- archy: Training llms to prioritize privileged instructions

Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Jo- hannes Heidecke, and Alex Beutel. The instruction hier- archy: Training llms to prioritize privileged instructions. arXiv, 2024

2024
[67]

Agentarmor: Enforcing program analysis on agent runtime trace to defend against prompt injection.arXiv preprint arXiv:2508.01249, 2025

Peiran Wang, Yang Liu, Yunfei Lu, Yifeng Cai, Hongbo Chen, Qingyou Yang, Jie Zhang, Jue Hong, and Ye Wu. Agentarmor: Enforcing program analysis on agent run- time trace to defend against prompt injection.arXiv preprint arXiv:2508.01249, 2025

work page arXiv 2025
[68]

Mcp-bench: Benchmarking tool-using llm agents with complex real-world tasks via mcp servers.arXiv preprint arXiv:2508.20453,

Zhenting Wang, Qi Chang, Hemani Patel, Shashank Biju, Cheng-En Wu, Quan Liu, Aolin Ding, Alireza Reza- zadeh, Ankit Shah, Yujia Bao, et al. Mcp-bench: Bench- marking tool-using llm agents with complex real-world tasks via mcp servers.arXiv preprint arXiv:2508.20453, 2025

work page arXiv 2025
[69]

Delimiters won’t save you from prompt injection

Simon Willison. Delimiters won’t save you from prompt injection. https://simonwillison.net/2023/Ma y/11/delimiters-wont-save-you, 2023

2023
[70]

Gemini introduces Personal Intelli- gence

Josh Woodward. Gemini introduces Personal Intelli- gence. https://blog.google/innovation-and -ai/products/gemini-app/personal-intelli gence/, 2025. 16

2025
[71]

System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective

Fangzhou Wu, Ethan Cecchetti, and Chaowei Xiao. System-level defense against indirect prompt injection attacks: An information flow control perspective.arXiv preprint arXiv:2409.19091, 2024

work page arXiv 2024
[72]

How easily do irrelevant inputs skew the responses of large language models? In First Conference on Language Modeling

Siye Wu, Jian Xie, Jiangjie Chen, Tinghui Zhu, Kai Zhang, and Yanghua Xiao. How easily do irrelevant inputs skew the responses of large language models? In First Conference on Language Modeling
[73]

In- structional segment embedding: Improving llm safety with instruction hierarchy

Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Reddy Indurthi, Chong Xiang, Prateek Mittal, and Wenxuan Zhou. In- structional segment embedding: Improving llm safety with instruction hierarchy. InThe Thirteenth Interna- tional Conference on Learning Representations, 2025

2025
[74]

Isolategpt: An execution iso- lation architecture for llm-based agentic systems

Yuhao Wu, Franziska Roesner, Tadayoshi Kohno, Ning Zhang, and Umar Iqbal. Isolategpt: An execution iso- lation architecture for llm-based agentic systems. In NDSS, 2025

2025
[75]

Craft: Customizing llms by creating and retrieving from specialized toolsets.arXiv preprint arXiv:2309.17428, 2023

Lifan Yuan, Yangyi Chen, Xingyao Wang, Yi R Fung, Hao Peng, and Heng Ji. Craft: Customizing llms by creating and retrieving from specialized toolsets.arXiv preprint arXiv:2309.17428, 2023

work page arXiv 2023
[76]

Browsesafe: Understanding and preventing prompt injection within AI browser agents.arXiv preprint arXiv:2511.20597, 2025

Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley, Jerry Ma, Denis Yarats, and Ninghui Li. Browsesafe: Un- derstanding and preventing prompt injection within ai browser agents.arXiv preprint arXiv:2511.20597, 2025

work page arXiv 2025
[77]

A Survey of Large Language Models

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. A survey of large language mod- els.arXiv preprint arXiv:2303.18223, 1(2), 2023

work page internal anchor Pith review arXiv 2023
[78]

Judging llm-as- a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595– 46623, 2023

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuo- han Li, Dacheng Li, Eric Xing, et al. Judging llm-as- a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595– 46623, 2023

2023
[79]

Attention is all you need to defend against indirect prompt injection attacks in llms

Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, and Wenyuan Xu. Attention is all you need to defend against indirect prompt injection attacks in llms. InNDSS, 2026. 17

2026