ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents
Pith reviewed 2026-05-19 23:49 UTC · model grok-4.3
pith:DLHZDBPJ Add to your LaTeX paper
What is a Pith Number?\usepackage{pith}
\pithnumber{DLHZDBPJ}
Prints a linked pith:DLHZDBPJ badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more
The pith
Seeking clarification on ambiguous tasks makes LLM agents far more vulnerable to prompt injection attacks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that the transition to a clarification-seeking state in LLM agents substantially increases susceptibility to prompt injection attacks. In the ASPI benchmark, agents in the clarification setting must request and incorporate user input before acting, while in the execution setting they receive fully specified instructions and encounter adversarial content only via tool returns. Evaluations show consistent amplification of attack success, such as from 1.8% to 34.0% for o3 and 2.2% to 35.7% for Gemini-3-Flash. The increase stems from both a shift in how models process content in this state and effects from the clarification interface itself.
What carries the argument
The ASPI benchmark, which isolates the clarification-seeking state by using matched pairs of execution and clarification conditions for each of the 728 task-attack scenarios.
If this is right
- Standard security evaluations on fully specified tasks will underestimate the attack surface of interactive agents.
- Robustness under clear instructions does not translate to robustness when agents must request and use clarifying input.
- The vulnerability gap arises from both state-dependent changes in how models process incoming content and channel-specific effects from the clarification interface.
- Real-world agent deployments that rely on clarification may expose users to higher prompt injection risks than current testing suggests.
Where Pith is reading between the lines
- Developers might add input validation specific to the clarification channel to offset the increased exposure.
- Similar amplification could occur in other interactive patterns such as confirming tool outputs or handling follow-up questions.
- Security benchmarks for agents should include ambiguity-resolution states by default rather than testing only direct execution.
- Fine-tuning on clarification exchanges could be tested as a way to reduce the observed state-dependent vulnerability.
Load-bearing premise
The benchmark successfully isolates the clarification-seeking state transition as the sole variable without differences in prompt formatting, tool-return handling, or user-input channel independently affecting attack success.
What would settle it
Re-running the benchmark on the same models but finding no significant difference in attack success rates between the execution and clarification settings would indicate that the state transition does not amplify vulnerability.
Figures
read the original abstract
Clarification-seeking behavior is widely regarded as a desirable property of LLM agents, enabling them to resolve ambiguity before acting on underspecified tasks. However, the security implications of this interaction pattern remain unexplored. We investigate whether the transition from standard execution to a clarification-seeking state increases an agent's susceptibility to prompt injection attacks. We introduce ASPI (Ambiguous-State Prompt Injection), a benchmark of 728 task-attack scenarios that isolates clarification as a distinct agent state and measures how this state transition affects vulnerability under controlled conditions. Each benchmark instance is evaluated under matched execution and clarification settings: in the execution setting, the agent acts on a fully specified instruction and encounters adversarial content only through tool-returned data; in the clarification setting, the agent must first request and incorporate additional user input before acting. We evaluate ten frontier LLMs and find that clarification-seeking consistently and substantially amplifies vulnerability. For instance, attack success rises from 1.8% to 34.0% for o3 and from 2.2% to 35.7% for Gemini-3-Flash. A decomposition analysis reveals that this gap reflects both a state-dependent shift in how models process incoming content and a channel-specific effect arising from the agent-solicited clarification interface. These findings demonstrate that standard execution-time security evaluation systematically underestimates the attack surface of interactive agents, and that robustness under fully specified tasks does not translate to robustness under ambiguity. For reproducibility, our data and source code are available at https://github.com/scaleapi/aspi.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the ASPI benchmark comprising 728 task-attack scenarios to test whether clarification-seeking behavior in LLM agents increases susceptibility to prompt injection. Each scenario is evaluated in matched execution (fully specified instruction, adversarial content via tool returns) and clarification (agent requests and incorporates user input) settings across ten frontier models. Results show large increases in attack success under clarification (e.g., 1.8% to 34.0% for o3; 2.2% to 35.7% for Gemini-3-Flash). A decomposition analysis attributes the gap to both a state-dependent processing shift and a channel-specific effect from the agent-solicited clarification interface. The authors conclude that execution-time security evaluations underestimate the attack surface of interactive agents and release data and code for reproducibility.
Significance. If the central empirical finding is robust, the work identifies a practically relevant security risk tied to a desirable agent capability (clarification-seeking). The explicit release of the benchmark, data, and source code at the provided GitHub repository is a clear strength that supports verification and extension. The results challenge the transferability of robustness from fully specified tasks to ambiguous, interactive settings and therefore bear on the design of secure LLM agent systems.
major comments (2)
- [§3] §3 (Benchmark Construction): The matched execution and clarification conditions are presented as isolating the clarification state, yet the manuscript does not detail how prompt formatting, tokenization, or presentation order of user-solicited input is aligned with tool-returned content. Because the decomposition analysis already acknowledges a channel-specific effect, the absence of explicit controls or ablations that hold formatting and channel constant while varying only the internal state leaves the attribution of the observed jumps (e.g., ~2% to ~35%) insecure.
- [Decomposition Analysis] Decomposition Analysis (results section): The paper reports that the gap reflects both state-dependent and channel-specific contributions but provides no quantitative breakdown or controlled ablation that measures the marginal effect of each factor separately. Without such measurements, it remains unclear whether the state transition itself is the dominant driver or whether the channel change accounts for most of the amplification.
minor comments (2)
- [Abstract] Abstract: The sentence describing the decomposition analysis could be tightened to state explicitly that both effects are present rather than implying the gap is fully explained by the state transition.
- [Results] Table or figure captions (results): Ensure that all reported attack-success percentages are accompanied by the exact number of trials or scenarios per cell so readers can assess statistical reliability of the reported deltas.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments on the ASPI benchmark and decomposition analysis. The feedback highlights opportunities to strengthen the description of our controls and to make the quantitative attribution more explicit. We address each major comment below and will incorporate clarifications and additional details in the revised manuscript.
read point-by-point responses
-
Referee: [§3] §3 (Benchmark Construction): The matched execution and clarification conditions are presented as isolating the clarification state, yet the manuscript does not detail how prompt formatting, tokenization, or presentation order of user-solicited input is aligned with tool-returned content. Because the decomposition analysis already acknowledges a channel-specific effect, the absence of explicit controls or ablations that hold formatting and channel constant while varying only the internal state leaves the attribution of the observed jumps (e.g., ~2% to ~35%) insecure.
Authors: We agree that greater transparency on formatting and presentation details would improve the manuscript. The benchmark was constructed so that the adversarial payload is identical in text and length and always appears as the final message before the agent's turn; the only intentional difference is the message role label (user-solicited clarification versus tool return). In the revised version we will add the exact prompt templates, token-count statistics, and ordering description to §3. The decomposition analysis already includes auxiliary runs that hold the channel fixed while varying state (and vice versa); we will surface these comparisons more clearly rather than treating them as supplementary. revision: yes
-
Referee: [Decomposition Analysis] Decomposition Analysis (results section): The paper reports that the gap reflects both state-dependent and channel-specific contributions but provides no quantitative breakdown or controlled ablation that measures the marginal effect of each factor separately. Without such measurements, it remains unclear whether the state transition itself is the dominant driver or whether the channel change accounts for most of the amplification.
Authors: We accept that the current presentation of the decomposition is insufficiently quantitative. In the revised manuscript we will add a table and accompanying text that reports attack success rates for the four crossed conditions (execution/tool, clarification/user, execution/user-simulated, clarification/tool-simulated). This will permit direct calculation of the marginal state-dependent and channel-specific contributions. We will also state the relative sizes of each component based on those measurements. revision: yes
Circularity Check
No circularity: purely empirical benchmark with measured attack rates
full rationale
The paper presents an empirical benchmark study (ASPI) that directly measures attack success rates across matched execution and clarification settings on 728 scenarios for ten LLMs. The central findings (e.g., attack success rising from 1.8% to 34.0% for o3) are reported as observed outcomes under controlled conditions rather than derived from equations, fitted parameters, or first-principles predictions. No self-definitional steps, uniqueness theorems, or ansatzes appear; the decomposition analysis is likewise an empirical breakdown of measured gaps. The work is self-contained against external benchmarks and does not reduce its claims to prior self-citations or inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The clarification-seeking behavior can be isolated as a distinct agent state without altering other aspects of the prompt or tool interactions.
Reference graph
Works this paper leans on
-
[1]
ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection , author=. 2026 , eprint=
work page 2026
-
[2]
AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification , author=. 2026 , eprint=
work page 2026
-
[3]
AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations , author=. 2026 , eprint=
work page 2026
-
[4]
How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition , author=. 2026 , eprint=
work page 2026
-
[5]
Philippe Laban and Hiroaki Hayashi and Yingbo Zhou and Jennifer Neville , booktitle=. 2026 , url=
work page 2026
-
[6]
Sanidhya Vijayvargiya and Xuhui Zhou and Akhila Yerukola and Maarten Sap and Graham Neubig , booktitle=. Ambig-. 2026 , url=
work page 2026
-
[7]
Value of Information: A Framework for Human-Agent Communication , author=. 2026 , eprint=
work page 2026
-
[8]
Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents , author=. 2026 , eprint=
work page 2026
- [9]
-
[10]
The Dual LLM Pattern for Building AI Assistants that Can Resist Prompt Injection , author =. 2023 , howpublished =
work page 2023
-
[11]
The Twelfth International Conference on Learning Representations , year=
WebArena: A Realistic Web Environment for Building Autonomous Agents , author=. The Twelfth International Conference on Learning Representations , year=
-
[12]
Clarify When Necessary: Resolving Ambiguity Through Interaction with LM s
Zhang, Michael JQ and Choi, Eunsol. Clarify When Necessary: Resolving Ambiguity Through Interaction with LM s. Findings of the Association for Computational Linguistics: NAACL 2025. 2025. doi:10.18653/v1/2025.findings-naacl.306
-
[13]
Zhan, Qiusi and Liang, Zhixiang and Ying, Zifan and Kang, Daniel. I njec A gent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents. Findings of the Association for Computational Linguistics: ACL 2024. 2024. doi:10.18653/v1/2024.findings-acl.624
-
[14]
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 , pages =
Yi, Jingwei and Xie, Yueqi and Zhu, Bin and Kiciman, Emre and Sun, Guangzhong and Xie, Xing and Wu, Fangzhao , title =. Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 , pages =. 2025 , isbn =. doi:10.1145/3690624.3709179 , abstract =
-
[15]
Shunyu Yao and Noah Shinn and Pedram Razavi and Karthik R Narasimhan , booktitle=. \ \. 2025 , url=
work page 2025
-
[16]
Frank F. Xu and Yufan Song and Boxuan Li and Yuxuan Tang and Kritanjali Jain and Mengxue Bao and Zora Zhiruo Wang and Xuhui Zhou and Zhitong Guo and Murong Cao and Mingyang Yang and Hao Yang Lu and Amaad Martin and Zhe Su and Leander Melroy Maben and Raj Mehta and Wayne Chi and Lawrence Keunho Jang and Yiqing Xie and Shuyan Zhou and Graham Neubig , bookti...
work page 2025
-
[17]
ASTRA-bench: Evaluating Tool-Use Agent Reasoning and Action Planning with Personal User Context , author=. 2026 , eprint=
work page 2026
-
[18]
IsolateGPT: An Execution Isolation Architecture for LLM-Based Agentic Systems , author=. 2025 , eprint=
work page 2025
-
[19]
OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation , author=. 2024 , eprint=
work page 2024
-
[20]
OdysseyBench: Evaluating LLM Agents on Long-Horizon Complex Office Application Workflows , author=. 2025 , eprint=
work page 2025
-
[21]
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions , author=. 2024 , eprint=
work page 2024
-
[22]
Structured Uncertainty guided Clarification for LLM Agents , author=. 2026 , eprint=
work page 2026
-
[23]
Maddison and Tatsunori Hashimoto , booktitle=
Yangjun Ruan and Honghua Dong and Andrew Wang and Silviu Pitis and Yongchao Zhou and Jimmy Ba and Yann Dubois and Chris J. Maddison and Tatsunori Hashimoto , booktitle=. Identifying the Risks of. 2024 , url=
work page 2024
-
[24]
UserBench: An Interactive Gym Environment for User-Centric Agents , author=. 2025 , eprint=
work page 2025
-
[25]
LHAW: Controllable Underspecification for Long-Horizon Tasks , author=. 2026 , eprint=
work page 2026
-
[26]
Ignore Previous Prompt: Attack Techniques For Language Models , author=. 2022 , eprint=
work page 2022
-
[27]
A mbig QA : Answering Ambiguous Open-domain Questions
Min, Sewon and Michael, Julian and Hajishirzi, Hannaneh and Zettlemoyer, Luke. A mbig QA : Answering Ambiguous Open-domain Questions. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020. doi:10.18653/v1/2020.emnlp-main.466
-
[28]
33rd USENIX Security Symposium (USENIX Security 24) , year =
Yupei Liu and Yuqi Jia and Runpeng Geng and Jinyuan Jia and Neil Zhenqiang Gong , title =. 33rd USENIX Security Symposium (USENIX Security 24) , year =
-
[29]
Xiao Liu and Hao Yu and Hanchen Zhang and Yifan Xu and Xuanyu Lei and Hanyu Lai and Yu Gu and Hangliang Ding and Kaiwen Men and Kejuan Yang and Shudan Zhang and Xiang Deng and Aohan Zeng and Zhengxiao Du and Chenhui Zhang and Sheng Shen and Tianjun Zhang and Yu Su and Huan Sun and Minlie Huang and Yuxiao Dong and Jie Tang , booktitle=. AgentBench: Evaluat...
work page 2024
-
[30]
Li and Been Kim and Zi Wang , booktitle=
Belinda Z. Li and Been Kim and Zi Wang , booktitle=. QuestBench: Can. 2025 , url=
work page 2025
-
[31]
Defending Against Indirect Prompt Injection Attacks With Spotlighting , author=. 2024 , eprint=
work page 2024
-
[32]
Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security , pages =
Greshake, Kai and Abdelnabi, Sahar and Mishra, Shailesh and Endres, Christoph and Holz, Thorsten and Fritz, Mario , title =. Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security , pages =. 2023 , isbn =. doi:10.1145/3605764.3623985 , abstract =
-
[33]
Proceedings of the 18th ACM Workshop on Artificial Intelligence and Security , pages =
Chen, Sizhe and Wang, Yizhu and Carlini, Nicholas and Sitawarin, Chawin and Wagner, David , title =. Proceedings of the 18th ACM Workshop on Artificial Intelligence and Security , pages =. 2026 , isbn =. doi:10.1145/3733799.3762982 , abstract =
-
[34]
MCP-Atlas: A Large-Scale Benchmark for Tool-Use Competency with Real MCP Servers , author=. 2026 , eprint=
work page 2026
-
[35]
ConvAI3: Generating Clarifying Questions for Open-Domain Dialogue Systems (ClariQ) , author=. 2020 , eprint=
work page 2020
-
[36]
Debenedetti, Edoardo and Zhang, Jie and Balunovic, Mislav and Beurer-Kellner, Luca and Fischer, Marc and Tram\`. AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents , url =. Advances in Neural Information Processing Systems , doi =
-
[37]
Wei He and Yueqing Sun and Hongyan Hao and Xueyuan Hao and Zhikang Xia and Qi GU and Hui Su and Xunliang Cai , booktitle=. VitaBench: Benchmarking. 2026 , url=
work page 2026
-
[38]
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , author=. 2019 , eprint=
work page 2019
-
[39]
SuperGlue: Learning Feature Matching with Graph Neural Networks , author=. 2020 , eprint=
work page 2020
-
[40]
Measuring Massive Multitask Language Understanding , author=. 2021 , eprint=
work page 2021
-
[41]
BIGbench: A Unified Benchmark for Evaluating Multi-dimensional Social Biases in Text-to-Image Models , author=. 2025 , eprint=
work page 2025
-
[42]
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them , author=. 2022 , eprint=
work page 2022
-
[43]
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? , author=. 2024 , eprint=
work page 2024
-
[44]
Measuring Mathematical Problem Solving With the MATH Dataset , author=. 2021 , eprint=
work page 2021
-
[45]
Training Verifiers to Solve Math Word Problems , author=. 2021 , eprint=
work page 2021
-
[46]
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems , author=. 2024 , eprint=
work page 2024
-
[47]
Winograd Schema - Knowledge Extraction Using Narrative Chains , author=. 2018 , eprint=
work page 2018
-
[48]
WinoGrande: An Adversarial Winograd Schema Challenge at Scale , author=. 2019 , eprint=
work page 2019
- [49]
-
[50]
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies , author=. 2021 , eprint=
work page 2021
-
[51]
Evaluating Large Language Models Trained on Code , author=. 2021 , eprint=
work page 2021
-
[52]
Program Synthesis with Large Language Models , author=. 2021 , eprint=
work page 2021
-
[53]
Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks , author=. 2023 , eprint=
work page 2023
-
[54]
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning , author=. 2021 , eprint=
work page 2021
-
[55]
Marc-Alexandre C\^ot\'e and \'Akos K\'ad\'ar and Xingdi Yuan and Ben Kybartas and Tavian Barnes and Emery Fine and James Moore and Ruo Yu Tao and Matthew Hausknecht and Layla El Asri and Mahmoud Adada and Wendy Tay and Adam Trischler , title =. CoRR , volume =
-
[56]
ScienceWorld: Is your Agent Smarter than a 5th Grader? , author=. 2022 , eprint=
work page 2022
-
[57]
Medical Exam Question Answering with Large-scale Reading Comprehension , author=. 2018 , eprint=
work page 2018
-
[58]
MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering , author=. 2022 , eprint=
work page 2022
-
[59]
PubMedQA: A Dataset for Biomedical Research Question Answering , author=. 2019 , eprint=
work page 2019
-
[60]
Proceedings of AAAI Information Retrieval and Knowledge Discovery in Biomedical Text , interhash =
Tsatsaronis, George and Schroeder, Michael and Paliouras, Georgios and Almirantis, Yannis and Androutsopoulos, Ion and Gaussier, Eric and Gallinari, Patrick and Artieres, Thierry and Alvers,. Proceedings of AAAI Information Retrieval and Knowledge Discovery in Biomedical Text , interhash =
-
[61]
FinQA: A Dataset of Numerical Reasoning over Financial Data , author=. 2022 , eprint=
work page 2022
-
[62]
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models , author=. 2023 , eprint=
work page 2023
-
[63]
MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research , author=. 2025 , eprint=
work page 2025
-
[64]
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? , author=. 2025 , eprint=
work page 2025
-
[65]
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation , author=. 2024 , eprint=
work page 2024
-
[66]
AIDE: AI-Driven Exploration in the Space of Code , author=. 2025 , eprint=
work page 2025
-
[67]
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering , author=. 2025 , eprint=
work page 2025
-
[68]
CFDLLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics
CFD-LLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics , author=. arXiv preprint arXiv:2509.20374 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[69]
HealthBench: Evaluating Large Language Models Towards Improved Human Health
Healthbench: Evaluating large language models towards improved human health , author=. arXiv preprint arXiv:2505.08775 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[70]
arXiv preprint arXiv:2505.06108 , year=
LLMs Outperform Experts on Challenging Biology Benchmarks , author=. arXiv preprint arXiv:2505.06108 , year=
-
[71]
PaperBench: Evaluating AI’s Ability to Replicate AI Research , author=
-
[72]
AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage
Autoreproduce: Automatic ai experiment reproduction with paper lineage , author=. arXiv preprint arXiv:2505.20662 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[73]
arXiv preprint arXiv:2506.02314 , year=
ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code , author=. arXiv preprint arXiv:2506.02314 , year=
-
[74]
arXiv preprint arXiv:2505.24785 , year=
EXP-Bench: Can AI Conduct AI Research Experiments? , author=. arXiv preprint arXiv:2505.24785 , year=
-
[75]
arXiv preprint arXiv:2506.17335 , year=
LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research , author=. arXiv preprint arXiv:2506.17335 , year=
-
[76]
arXiv preprint arXiv:2504.20115 , year=
AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers , author=. arXiv preprint arXiv:2504.20115 , year=
-
[77]
Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025) , pages=
FrontierScience Bench: Evaluating AI Research Capabilities in LLMs , author=. Proceedings of the 1st Workshop for Research on Agent Language Models (REALM 2025) , pages=
work page 2025
-
[78]
Replicating a high-impact scientific publication using systems of large language models , author=. bioRxiv , pages=. 2024 , publisher=
work page 2024
-
[79]
LAB-Bench: Measuring Capabilities of Language Models for Biology Research , author=. 2024 , eprint=
work page 2024
-
[80]
2nd AI4Research Workshop: Towards a Knowledge-grounded Scientific Research Lifecycle , year=
Large language models for rediscovering unseen chemistry scientific hypotheses , author=. 2nd AI4Research Workshop: Towards a Knowledge-grounded Scientific Research Lifecycle , year=
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.