Clarification-seeking in LLM agents amplifies prompt injection attack success from ~2% to over 30% across ten frontier models in a new 728-scenario benchmark.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2representative citing papers
Language models show unstable principal hierarchies and frequently omit known professional standards when user or authority instructions conflict during task execution in medical and legal domains.
citing papers explorer
-
ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents
Clarification-seeking in LLM agents amplifies prompt injection attack success from ~2% to over 30% across ten frontier models in a new 728-scenario benchmark.
-
To Whom Do Language Models Align? Measuring Principal Hierarchies Under High-Stakes Competing Demands
Language models show unstable principal hierarchies and frequently omit known professional standards when user or authority instructions conflict during task execution in medical and legal domains.