Recognition: unknown
TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation
Pith reviewed 2026-05-10 17:12 UTC · model grok-4.3
The pith
TRUSTDESC generates accurate tool descriptions from code implementations to block implicit tool poisoning attacks in LLM applications.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TRUSTDESC is a three-stage framework that produces implementation-faithful tool descriptions by first using reachability-aware static analysis and LLM-guided debloating in SliceMin to isolate minimal relevant code, then synthesizing descriptions in DescGen that mitigate adversarial artifacts, and finally refining them in DynVer through dynamic task execution and behavioral validation, thereby preventing implicit tool poisoning attacks at their source.
What carries the argument
The three-stage pipeline of SliceMin for reachability-aware static analysis and LLM-guided code debloating to extract minimal tool slices, DescGen for synthesizing descriptions from those slices, and DynVer for dynamic verification via task execution.
If this is right
- LLMs achieve higher task completion rates when using tools described by TRUSTDESC.
- Implicit tool poisoning attacks are mitigated directly at the description source rather than through later detection.
- The framework applies across 52 real-world tools from multiple ecosystems with minimal added time and monetary cost.
- Descriptions become more trustworthy because they derive from actual code execution paths instead of user-provided text.
- Existing detection-based defenses can be supplemented or replaced by this generation method.
Where Pith is reading between the lines
- Tool ecosystems could adopt automatic description generation as a standard upload requirement to reduce reliance on manual or untrusted metadata.
- The same slicing and verification steps might extend to other LLM agent components like memory stores or API wrappers.
- If scaled, this shifts security focus from prompt-level filtering to code-level faithfulness in AI tool use.
- Developers could integrate similar pipelines into IDEs to produce safe descriptions before tools are shared.
Load-bearing premise
That reachability-aware static analysis combined with debloating and dynamic task runs will always capture every relevant behavior and hidden artifact in the tool code without missing details that affect description accuracy.
What would settle it
A real-world tool where the TRUSTDESC-generated description omits or misstates a behavior, causing an LLM to select or misuse the tool in a way that matches an implicit poisoning attack.
Figures
read the original abstract
Large language models (LLMs) increasingly rely on external tools to perform time-sensitive tasks and real-world actions. While tool integration expands LLM capabilities, it also introduces a new prompt-injection attack surface: tool poisoning attacks (TPAs). Attackers manipulate tool descriptions by embedding malicious instructions (explicit TPAs) or misleading claims (implicit TPAs) to influence model behavior and tool selection. Existing defenses mainly detect anomalous instructions and remain ineffective against implicit TPAs. In this paper, we present TRUSTDESC, the first framework for preventing tool poisoning by automatically generating trusted tool descriptions from implementations. TRUSTDESC derives implementation-faithful descriptions through a three-stage pipeline. SliceMin performs reachability-aware static analysis and LLM-guided debloating to extract minimal tool-relevant code slices. DescGen synthesizes descriptions from these slices while mitigating misleading or adversarial code artifacts. DynVer refines descriptions through dynamic verification by executing synthesized tasks and validating behavioral claims. We evaluate TRUSTDESC on 52 real-world tools across multiple tool ecosystems. Results show that TRUSTDESC produces accurate tool descriptions that improve task completion rates while mitigating implicit TPAs at their root, with minimal time and monetary overhead.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents TRUSTDESC, the first framework to prevent tool poisoning attacks (TPAs) in LLM tool-using applications by automatically generating trusted, implementation-faithful tool descriptions. The approach uses a three-stage pipeline: SliceMin applies reachability-aware static analysis and LLM-guided debloating to extract minimal relevant code; DescGen synthesizes descriptions while mitigating adversarial artifacts; and DynVer performs dynamic verification by synthesizing and executing tasks to validate behavioral claims. Evaluation on 52 real-world tools across ecosystems claims that the resulting descriptions improve task completion rates, mitigate implicit TPAs at their root, and incur minimal time/monetary overhead.
Significance. If the pipeline reliably produces accurate descriptions, the work would be significant for LLM security: it shifts defense from reactive detection of poisoned descriptions to proactive generation of trusted ones grounded in code, addressing a gap in handling implicit TPAs that current methods miss. Strengths include the end-to-end pipeline design and evaluation across multiple tool ecosystems; reproducible artifacts or machine-checked elements are not mentioned.
major comments (2)
- [Evaluation] Evaluation section: the abstract and results claim positive outcomes (improved task completion, TPA mitigation) on 52 tools, yet no quantitative metrics, baselines, error bars, statistical tests, or exclusion criteria are reported. This prevents verification that the data supports the central claims about accuracy and overhead.
- [§3] §3 (SliceMin): reachability-aware static analysis on the call graph combined with LLM-guided debloating can under-approximate behaviors involving reflection, dynamic dispatch, callbacks, or environment-dependent branches. Any such omitted behaviors are never checked by DynVer (which only validates synthesized tasks), risking incomplete descriptions that fail to fully block implicit TPAs.
minor comments (2)
- [Abstract] Abstract: the claim of 'minimal time and monetary overhead' is stated without concrete measured values, comparison to baselines, or breakdown by stage.
- [Introduction] Notation: the distinction between explicit and implicit TPAs is introduced but not formalized with precise definitions or examples tied to the pipeline stages.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback on our manuscript. We have carefully reviewed the major comments and provide point-by-point responses below, indicating where revisions will be made to address the concerns.
read point-by-point responses
-
Referee: [Evaluation] Evaluation section: the abstract and results claim positive outcomes (improved task completion, TPA mitigation) on 52 tools, yet no quantitative metrics, baselines, error bars, statistical tests, or exclusion criteria are reported. This prevents verification that the data supports the central claims about accuracy and overhead.
Authors: We agree that the evaluation section would benefit from more explicit quantitative details to support verification of our claims. In the revised manuscript, we will expand Section 5 to report concrete metrics including task completion rates (with and without TRUSTDESC), direct comparisons to baseline description-generation approaches, error bars from repeated experimental runs, statistical significance tests, and clear exclusion criteria for the 52 tools. We will also provide precise numerical values for time and monetary overhead. These additions will be reflected in the abstract and results discussion as appropriate. revision: yes
-
Referee: [§3] §3 (SliceMin): reachability-aware static analysis on the call graph combined with LLM-guided debloating can under-approximate behaviors involving reflection, dynamic dispatch, callbacks, or environment-dependent branches. Any such omitted behaviors are never checked by DynVer (which only validates synthesized tasks), risking incomplete descriptions that fail to fully block implicit TPAs.
Authors: This is a valid point regarding inherent limitations of static analysis in the presence of dynamic language features. Our reachability analysis and debloating are designed to capture core tool behaviors for the evaluated real-world tools, and DynVer's task synthesis plus manual checks confirmed description accuracy in practice. However, we acknowledge that complete coverage of reflection, callbacks, and environment-dependent paths is undecidable in general. In the revision, we will add an explicit discussion of these limitations in §3 and §6, clarifying how DynVer mitigates risks for typical usage patterns while noting that the approach prioritizes practical TPA prevention over theoretical completeness. revision: partial
Circularity Check
No circularity: pipeline derives descriptions from independent static/dynamic analysis
full rationale
The paper's core claim rests on a three-stage pipeline (SliceMin reachability analysis + debloating, DescGen synthesis, DynVer dynamic task execution) that extracts and validates tool behavior directly from implementations. No equations, parameters, or results are shown to reduce to their own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, and no fitted quantities are relabeled as predictions. The evaluation on 52 tools is presented as external validation rather than tautological. This is the common case of a self-contained empirical pipeline with no detectable circular reduction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Reachability-aware static analysis combined with LLM-guided debloating can extract minimal tool-relevant code slices without losing critical behavior
- domain assumption Dynamic execution of synthesized tasks can validate that generated descriptions accurately reflect tool behavior
Reference graph
Works this paper leans on
-
[1]
https://learnprompting.o rg/docs/prompt_hacking/defensive_measure s/instruction, 2023
Instruction defense. https://learnprompting.o rg/docs/prompt_hacking/defensive_measure s/instruction, 2023
2023
-
[2]
https://learnprompting.o rg/docs/prompt_hacking/defensive_measure s/sandwich_defense, 2023
Sandwitch defense. https://learnprompting.o rg/docs/prompt_hacking/defensive_measure s/sandwich_defense, 2023
2023
-
[3]
Cline.https://github.com/cline/cline, 2024
2024
-
[4]
https: //github.com/langchain-ai/langchain, 2024
LangChain: the platform for reliable agents. https: //github.com/langchain-ai/langchain, 2024
2024
-
[5]
ht tps://menlovc.com/perspective/2025-the-s tate-of-generative-ai-in-the-enterprise/ , 2025
2025: The State of Generative AI in the Enterprise. ht tps://menlovc.com/perspective/2025-the-s tate-of-generative-ai-in-the-enterprise/ , 2025
2025
-
[6]
https://github.com/Tencent/A I-Infra-Guard, 2025
AI-Infra-Guard. https://github.com/Tencent/A I-Infra-Guard, 2025
2025
-
[7]
https://modelcontextprotocol.io/docs/dev elop/connect-remote-servers, 2025
Best Practices for Using Remote MCP Servers. https://modelcontextprotocol.io/docs/dev elop/connect-remote-servers, 2025
2025
-
[8]
https://github.com/invar iantlabs-ai/mcp-scan, 2025
Constrain, log and scan your MCP connections for se- curity vulnerabilities. https://github.com/invar iantlabs-ai/mcp-scan, 2025
2025
-
[9]
https://github.com/A syncFuncAI/deepwiki-open, 2025
DeepWiki: AI-Powered Wiki Generator for GitHub/Git- lab/Bitbucket Repositories. https://github.com/A syncFuncAI/deepwiki-open, 2025
2025
-
[10]
https://github.c om/github/github-mcp-server, 2025
GitHub’s official MCP Server. https://github.c om/github/github-mcp-server, 2025
2025
-
[11]
https://huggingface.co/h ub, 2025
Hugging Face Hub. https://huggingface.co/h ub, 2025
2025
-
[12]
https://openrouter.ai/rankin gs, 2025
LLM Rankings. https://openrouter.ai/rankin gs, 2025
2025
-
[13]
https://invariantlabs.ai/blog/mcp-secur ity-notification-tool-poisoning-attacks , 2025
MCP Security Notification: Tool Poisoning Attacks. https://invariantlabs.ai/blog/mcp-secur ity-notification-tool-poisoning-attacks , 2025
2025
-
[14]
https://blog.virtueai.com/2025/ 08/22/mcpguard-first-agent-based-mcp-sca nner-to-protect-ai-agents/, 2025
MCPGuard: First Agent-based MCP Scanner to Protect AI Agents. https://blog.virtueai.com/2025/ 08/22/mcpguard-first-agent-based-mcp-sca nner-to-protect-ai-agents/, 2025
2025
-
[15]
https://github.com/antgroup/MCPS can, 2025
MCPScan. https://github.com/antgroup/MCPS can, 2025
2025
-
[16]
https: //openrouter.ai/, 2025
OpenRouter: The Unified Interface For LLMs. https: //openrouter.ai/, 2025
2025
-
[17]
https://owasp.org/www-project-top -10-for-large-language-model-application s/, 2025
OWASP Top 10 for Large Language Model Appli- cations. https://owasp.org/www-project-top -10-for-large-language-model-application s/, 2025
2025
-
[18]
https://github.com/mic rosoft/playwright-mcp, 2025
Playwright MCP server. https://github.com/mic rosoft/playwright-mcp, 2025
2025
-
[19]
https://github.com/cisco-ai-defen se/mcp-scanner, 2025
Scan MCP servers for potential threats and security findings. https://github.com/cisco-ai-defen se/mcp-scanner, 2025
2025
-
[20]
https://tree-sitter.github.io/tr ee-sitter/, 2025
Tree-sitter. https://tree-sitter.github.io/tr ee-sitter/, 2025
2025
-
[21]
https://invariantlabs.ai/b log/whatsapp-mcp-exploited, 2025
WhatsApp MCP Exploited: Exfiltrating your message history via MCP. https://invariantlabs.ai/b log/whatsapp-mcp-exploited, 2025
2025
-
[22]
Get my drift? catching llm task drift with activation deltas
Sahar Abdelnabi, Aideen Fay, Giovanni Cherubin, Ahmed Salem, Mario Fritz, and Andrew Paverd. Get my drift? catching llm task drift with activation deltas. InSaTML, 2025
2025
-
[23]
A Survey of Malware Detection Using Deep Learning.Machine Learning with Applications, 16:100546, 2024
Ahmed Bensaoud, Jugal Kalita, and Mahmoud Ben- saoud. A Survey of Malware Detection Using Deep Learning.Machine Learning with Applications, 16:100546, 2024
2024
-
[24]
Automated Machine Learning for Deep Learn- ing Based Malware Detection.Computers & Security, 137:103582, 2024
Austin Brown, Maanak Gupta, and Mahmoud Abdel- salam. Automated Machine Learning for Deep Learn- ing Based Malware Detection.Computers & Security, 137:103582, 2024
2024
-
[25]
Yu, Qiang Yang, and Xing Xie
Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxi- ang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, and Xing Xie. A survey on evaluation of large language models.ACM transactions on intelligent systems and technology, 15(3):1–45, 2024
2024
-
[26]
StruQ: Defending against Prompt Injection with Structured Queries
Sizhe Chen, Julien Piet, Chawin Sitawarin, and David Wagner. StruQ: Defending against Prompt Injection with Structured Queries. InProceedings of the 34th USENIX Security Symposium (USENIX Security 25), pages 2383–2400, 2025
2025
-
[27]
Secalign: Defending against prompt injection with preference optimization
Sizhe Chen, Arman Zharmagambetov, Saeed Mahlou- jifar, Kamalika Chaudhuri, David Wagner, and Chuan Guo. Secalign: Defending against prompt injection with preference optimization. InProceedings of the 2025 ACM SIGSAC Conference on Computer and Communi- cations Security, pages 2833–2847, 2025
2025
-
[28]
Meta secalign: A secure foundation llm against prompt injection attacks,
Sizhe Chen, Arman Zharmagambetov, David Wagner, and Chuan Guo. Meta secalign: A secure foundation llm against prompt injection attacks.arXiv preprint arXiv:2507.02735, 2025. 14
-
[29]
Securing AI agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025
Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. Se- curing ai agents with information-flow control.arXiv preprint arXiv:2505.23643, 2025
-
[30]
Mathsensei: a tool-augmented large language model for mathematical reasoning
Debrup Das, Debopriyo Banerjee, Somak Aditya, and Ashish Kulkarni. Mathsensei: a tool-augmented large language model for mathematical reasoning.arXiv preprint arXiv:2402.17231, 2024
-
[31]
Defeating Prompt Injections by Design
Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. Defeating prompt injections by design.arXiv preprint arXiv:2503.18813, 2025
work page internal anchor Pith review arXiv 2025
-
[32]
Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.NeurIPS, 2024
Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. Agentdojo: A dynamic environment to evaluate prompt injection attacks and defenses for llm agents.NeurIPS, 2024
2024
-
[33]
Awesome MCP Servers
Frank Fiegel. Awesome MCP Servers. https://gith ub.com/punkpeye/awesome-mcp-servers, 2025
2025
-
[34]
Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injec- tion. InProceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023
2023
-
[35]
A survey on llm-as-a-judge
Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xue- hao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, et al. A survey on llm-as-a-judge. The Innovation, 2024
2024
-
[36]
A survey on hallucination in large language models: Prin- ciples, taxonomy, challenges, and open questions.ACM Transactions on Information Systems, 43(2):1–55, 2025
Lei Huang, Weijiang Yu, Weitao Ma, Zhangyin Zhong, Weihong Feng, Haotian Wang, Qianglong Chen, Wei- hua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. A survey on hallucination in large language models: Prin- ciples, taxonomy, challenges, and open questions.ACM Transactions on Information Systems, 43(2):1–55, 2025
2025
-
[37]
Attention tracker: Detecting prompt injection attacks in llms
Kuo-Han Hung, Ching-Yun Ko, Ambrish Rawat, I-Hsin Chung, Winston H Hsu, and Pin-Yu Chen. Attention tracker: Detecting prompt injection attacks in llms. In NAACL, 2025
2025
-
[38]
Llms can be easily confused by instructional distractions.arXiv preprint arXiv:2502.04362, 2025
Yerin Hwang, Yongil Kim, Jahyun Koo, Taegwan Kang, Hyunkyung Bae, and Kyomin Jung. Llms can be easily confused by instructional distractions.arXiv preprint arXiv:2502.04362, 2025
-
[39]
MCP Tool Poisoning Experiments
Invariant Labs. MCP Tool Poisoning Experiments. https://github.com/invariantlabs-ai/mcp-i njection-experiments/tree/main, 2025
2025
-
[40]
Promptshield: Deployable detection for prompt injection attacks
Dennis Jacob, Hend Alzahrani, Zhanhao Hu, Basel Alo- mair, and David Wagner. Promptshield: Deployable detection for prompt injection attacks. InProceedings of the Fifteenth ACM Conference on Data and Applica- tion Security and Privacy, pages 341–352, 2024
2024
-
[41]
Survey of hallucination in natural language generation.ACM computing surveys, 55(12):1– 38, 2023
Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. Survey of hallucination in natural language generation.ACM computing surveys, 55(12):1– 38, 2023
2023
-
[42]
Automatic code documentation generation using gpt-3
Junaed Younus Khan and Gias Uddin. Automatic code documentation generation using gpt-3. InProceedings of the 37th IEEE/ACM International Conference on Au- tomated Software Engineering, pages 1–6, 2022
2022
-
[43]
Juhee Kim, Woohyuk Choi, and Byoungyoung Lee. Prompt flow integrity to prevent privilege escalation in llm agents.arXiv preprint arXiv:2503.15547, 2025
-
[44]
Effective and Efficient Malware Detec- tion at the End Host
Clemens Kolbitsch, Paolo Milani Comparetti, Christo- pher Kruegel, Engin Kirda, Xiao yong Zhou, and Xi- aoFeng Wang. Effective and Efficient Malware Detec- tion at the End Host. InProceedings of the USENIX Security Symposium (USENIX Security 2009), pages 351–366, 2009
2009
-
[45]
arXiv preprint arXiv:2410.22770 , year=
Hao Li and Xiaogeng Liu. Injecguard: Benchmark- ing and mitigating over-defense in prompt injection guardrail models.arXiv preprint arXiv:2410.22770, 2024
-
[46]
Piguard: Prompt injection guardrail via mitigating overdefense for free
Hao Li, Xiaogeng Liu, Ning Zhang, and Chaowei Xiao. Piguard: Prompt injection guardrail via mitigating overdefense for free. InACL, 2025
2025
-
[47]
Hao Li, Yankai Yang, G Edward Suh, Ning Zhang, and Chaowei Xiao. Reasalign: Reasoning enhanced safety alignment against prompt injection attack.arXiv preprint arXiv:2601.10173, 2026
-
[48]
Les Dissonances: Cross-Tool Harvesting and Polluting in Multi-Tool Empowered LLM Agents
Zichuan Li, Jian Cui, Xiaojing Liao, and Luyi Xing. Les Dissonances: Cross-Tool Harvesting and Polluting in Multi-Tool Empowered LLM Agents. InProceedings of the 33rd Annual Network and Distributed System Secu- rity Symposium (NDSS 2026), San Diego, CA, February 2026
2026
-
[49]
Automatic and universal prompt injection attacks against large language models.arXiv, 2024
Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, and Chaowei Xiao. Automatic and universal prompt injection attacks against large language models.arXiv, 2024. 15
2024
-
[50]
Prompt Injection attack against LLM-integrated Applications
Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Zihao Wang, Xiaofeng Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, and Leo Yu Zhang. Prompt in- jection attack against llm-integrated applications.arXiv preprint arXiv:2306.05499, 2023
work page internal anchor Pith review arXiv 2023
-
[51]
Formalizing and benchmark- ing prompt injection attacks and defenses
Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. Formalizing and benchmark- ing prompt injection attacks and defenses. InUSENIX Security, 2024
2024
-
[52]
Datasentinel: A game-theoretic detection of prompt injection attacks
Yupei Liu, Yuqi Jia, Jinyuan Jia, Dawn Song, and Neil Zhenqiang Gong. Datasentinel: A game-theoretic detection of prompt injection attacks. In2025 IEEE Symposium on Security and Privacy (SP), pages 2190–
-
[53]
PromptGuard Prompt Injection Guardrail
Meta. PromptGuard Prompt Injection Guardrail. https://www.llama.com/docs/model-cards-a nd-prompt-formats/prompt-guard/, 2024
2024
-
[54]
Milad Nasr, Nicholas Carlini, Chawin Sitawarin, Sander V Schulhoff, Jamie Hayes, Michael Ilie, Juli- ette Pluto, Shuang Song, Harsh Chaudhari, Ilia Shu- mailov, Abhradeep Thakurta, Kai Yuanqing Xiao, An- dreas Terzis, and Florian Tramer. The attacker moves second: Stronger adaptive attacks bypass defenses against llm jailbreaks and prompt injections.arXiv...
-
[55]
A comprehen- sive overview of large language models.ACM Transac- tions on Intelligent Systems and Technology, 16(5):1–72, 2025
Humza Naveed, Asad Ullah Khan, Shi Qiu, Muham- mad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, and Ajmal Mian. A comprehen- sive overview of large language models.ACM Transac- tions on Intelligent Systems and Technology, 16(5):1–72, 2025
2025
-
[56]
Neural exec: Learning (and learning from) exe- cution triggers for prompt injection attacks
Dario Pasquini, Martin Strohmeier, and Carmela Tron- coso. Neural exec: Learning (and learning from) exe- cution triggers for prompt injection attacks. InAISec, 2024
2024
-
[57]
Ignore previous prompt: Attack techniques for language models
Fábio Perez and Ian Ribeiro. Ignore previous prompt: Attack techniques for language models. InNeurIPS ML Safety Workshop, 2022
2022
-
[58]
Jatmo: Prompt injection defense by task-specific finetuning
Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, and David Wagner. Jatmo: Prompt injection defense by task-specific finetuning. InESORICS, 2024
2024
-
[59]
Fine-tuned deberta-v3-base for prompt injection detection, 2024
ProtectAI.com. Fine-tuned deberta-v3-base for prompt injection detection, 2024
2024
-
[60]
Tool learning with large language models: A survey
Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, and Ji-Rong Wen. Tool learning with large language models: A survey. Frontiers of Computer Science, 19(8):198343, 2025
2025
-
[61]
Large language models can be easily distracted by irrelevant context
Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H Chi, Nathanael Schärli, and Denny Zhou. Large language models can be easily distracted by irrelevant context. InInternational Confer- ence on Machine Learning, pages 31210–31227. PMLR, 2023
2023
-
[62]
Prompt Injec- tion Attack to Tool Selection in LLM Agents
Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, and Lichao Sun. Prompt Injec- tion Attack to Tool Selection in LLM Agents. InPro- ceedings of the 33rd Annual Network and Distributed System Security Symposium (NDSS 2026), San Diego, CA, February 2026
2026
-
[63]
Progent: Securing AI Agents with Privilege Control
Tianneng Shi, Jingxuan He, Zhun Wang, Linyu Wu, Hongwei Li, Wenbo Guo, and Dawn Song. Progent: Programmable privilege control for llm agents.arXiv preprint arXiv:2504.11703, 2025
work page internal anchor Pith review arXiv 2025
-
[64]
Source code summarization in the era of large language models
Weisong Sun, Yun Miao, Yuekang Li, Hongyu Zhang, Chunrong Fang, Yi Liu, Gelei Deng, Yang Liu, and Zhenyu Chen. Source code summarization in the era of large language models. In2025 IEEE/ACM 47th Inter- national Conference on Software Engineering (ICSE), pages 1882–1894. IEEE, 2025
2025
-
[65]
Gemini: A Family of Highly Capable Multimodal Models
Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean- Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalk- wyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[66]
The instruction hier- archy: Training llms to prioritize privileged instructions
Eric Wallace, Kai Xiao, Reimar Leike, Lilian Weng, Jo- hannes Heidecke, and Alex Beutel. The instruction hier- archy: Training llms to prioritize privileged instructions. arXiv, 2024
2024
-
[67]
Peiran Wang, Yang Liu, Yunfei Lu, Yifeng Cai, Hongbo Chen, Qingyou Yang, Jie Zhang, Jue Hong, and Ye Wu. Agentarmor: Enforcing program analysis on agent run- time trace to defend against prompt injection.arXiv preprint arXiv:2508.01249, 2025
-
[68]
Zhenting Wang, Qi Chang, Hemani Patel, Shashank Biju, Cheng-En Wu, Quan Liu, Aolin Ding, Alireza Reza- zadeh, Ankit Shah, Yujia Bao, et al. Mcp-bench: Bench- marking tool-using llm agents with complex real-world tasks via mcp servers.arXiv preprint arXiv:2508.20453, 2025
-
[69]
Delimiters won’t save you from prompt injection
Simon Willison. Delimiters won’t save you from prompt injection. https://simonwillison.net/2023/Ma y/11/delimiters-wont-save-you, 2023
2023
-
[70]
Gemini introduces Personal Intelli- gence
Josh Woodward. Gemini introduces Personal Intelli- gence. https://blog.google/innovation-and -ai/products/gemini-app/personal-intelli gence/, 2025. 16
2025
-
[71]
Fangzhou Wu, Ethan Cecchetti, and Chaowei Xiao. System-level defense against indirect prompt injection attacks: An information flow control perspective.arXiv preprint arXiv:2409.19091, 2024
-
[72]
How easily do irrelevant inputs skew the responses of large language models? In First Conference on Language Modeling
Siye Wu, Jian Xie, Jiangjie Chen, Tinghui Zhu, Kai Zhang, and Yanghua Xiao. How easily do irrelevant inputs skew the responses of large language models? In First Conference on Language Modeling
-
[73]
In- structional segment embedding: Improving llm safety with instruction hierarchy
Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Reddy Indurthi, Chong Xiang, Prateek Mittal, and Wenxuan Zhou. In- structional segment embedding: Improving llm safety with instruction hierarchy. InThe Thirteenth Interna- tional Conference on Learning Representations, 2025
2025
-
[74]
Isolategpt: An execution iso- lation architecture for llm-based agentic systems
Yuhao Wu, Franziska Roesner, Tadayoshi Kohno, Ning Zhang, and Umar Iqbal. Isolategpt: An execution iso- lation architecture for llm-based agentic systems. In NDSS, 2025
2025
-
[75]
Lifan Yuan, Yangyi Chen, Xingyao Wang, Yi R Fung, Hao Peng, and Heng Ji. Craft: Customizing llms by creating and retrieving from specialized toolsets.arXiv preprint arXiv:2309.17428, 2023
-
[76]
Kaiyuan Zhang, Mark Tenenholtz, Kyle Polley, Jerry Ma, Denis Yarats, and Ninghui Li. Browsesafe: Un- derstanding and preventing prompt injection within ai browser agents.arXiv preprint arXiv:2511.20597, 2025
-
[77]
A Survey of Large Language Models
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. A survey of large language mod- els.arXiv preprint arXiv:2303.18223, 1(2), 2023
work page internal anchor Pith review arXiv 2023
-
[78]
Judging llm-as- a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595– 46623, 2023
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuo- han Li, Dacheng Li, Eric Xing, et al. Judging llm-as- a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595– 46623, 2023
2023
-
[79]
Attention is all you need to defend against indirect prompt injection attacks in llms
Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, and Wenyuan Xu. Attention is all you need to defend against indirect prompt injection attacks in llms. InNDSS, 2026. 17
2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.