Reframing LLM Agent Security as an Agent-Human Interaction Problem
3 Pith papers cite this work. Polarity classification is still indexing.
abstract
We argue that LLM agent security is fundamentally an agent-human interaction (AHI) problem, not a purely algorithmic one. To substantiate this position, we conduct a systematic analysis of 59 academic papers, 21 production agent systems, and 26 security plugins as of April 2026. Our analysis reveals a striking pattern: the three widely deployed human-centric security mechanisms (policy specification, runtime approval, and scope configuration) dominate industry practice, each adopted by at least 14 of 21 systems (14, 15, and 16, respectively), while the categories most heavily studied in academia (intent anchoring and trust labeling) see zero production deployment. Yet current human participation mechanisms are far from satisfactory: they suffer from a fundamental trade-off between cognitive burden and security guarantees, leaving users caught between approval fatigue and uncontrolled agent autonomy. We make three contributions. First, through a systematic comparison of LLM-based and human-based intent alignment, we argue that human participation in agent security decisions is indispensable given current capabilities. Second, we quantify a pronounced industry-academia mismatch: the security mechanisms that practitioners actually deploy receive scant research attention, while the approaches that researchers favor remain undeployed. Third, we propose a three-direction research agenda and call for AHI security to be recognized as a first-class research citizen, one that demands its own design principles, evaluation methods, and theoretical foundations.
citing papers explorer
- One Goal, Many Commands: Characterizing Denylist Fragility in AI Agents
  ShellSieve, an LLM-driven pipeline, detects command denylist fragility in terminal AI agents and finds 69.0-98.6% of 1,709 GitHub-collected denylists to be bypassable.
- Janus: a Playground for User-Involved Agentic Permission Management
  Janus is a publicly available playground system and evaluation harness for testing user-involved permission management designs in AI agents, demonstrating benefits of user input and the need for context-sensitive approaches.
- Oversight Has a Capacity: Calibrating Agent Guards to a Subjective, Fatiguing Human
  Human oversight for LLM agent actions is capacity-limited by subjective disagreement (kappa 0.52) and fatigue, producing an inverted-U safety curve and vulnerability to flooding attacks in a modeling study.
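
To make the denylist fragility mentioned in the first citing paper more concrete, the following is a minimal sketch of how a command denylist might be bypassed. It is not ShellSieve's method and does not reproduce any collected denylist; the `DENYLIST` contents, the `is_blocked` helper, and the example commands are all hypothetical, and the first-token matching rule is an assumed simplification of how such a check could work.

```python
import shlex

# Hypothetical denylist of command names an agent may not run (assumption).
DENYLIST = {"rm", "curl"}


def is_blocked(command: str) -> bool:
    """Naive check: block only when the first token is a denylisted name."""
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in DENYLIST


# Four commands that pursue the same goal (deleting the "build" directory).
candidates = [
    "rm -rf build",          # direct use of the denylisted name
    "/bin/rm -rf build",     # same binary via an absolute path
    "find build -delete",    # different tool, same effect
    "sh -c 'rm -rf build'",  # denylisted command wrapped in a shell
]

# Map each command to whether the naive check would block it.
results = {cmd: is_blocked(cmd) for cmd in candidates}

for cmd, blocked in results.items():
    print(f"{'BLOCKED' if blocked else 'allowed'}  {cmd}")
```

Under these assumptions only the first command is blocked. The other three reach the same outcome through a different spelling, tool, or wrapper, which is the sense in which one goal maps to many commands.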