SecGPT: An Execution Isolation Architecture for LLM-Based Systems
20 Pith papers cite this work.
representative citing papers
A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.
A semi-structured thematic synthesis identifies core challenges in FM selection, alignment, prompting, orchestration, testing, deployment, and cross-cutting concerns like observability for production-ready FMware.
citing papers explorer
- AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
  AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.
- One Goal, Many Commands: Characterizing Denylist Fragility in AI Agents
  ShellSieve, an LLM-driven pipeline, detects command denylist fragility in terminal AI agents and finds 69.0-98.6% of 1,709 GitHub-collected denylists to be bypassable.
- AutoDojo: Adaptive Black-Box Attacks Reveal the Limits of IPI Defenses and Task-Specification Effects in LLM Agents
  AutoDojo adaptively optimizes IPI attacks to bypass defenses, recovering substantial ASR on action-open tasks where static attacks fail.
- What You Approve Is What Executes: Consent Integrity for Black-Box LLM Agents
  The paper introduces Consent Integrity as the property that actions shown for approval must be rendered by a trusted mediator from the real boundary action over an unspoofable path and bound to execution, with uninspectable actions surfaced rather than silently approved.
- AgenTEE: Confidential LLM Agent Execution on Edge Devices
  AgenTEE isolates LLM agent runtime, inference, and apps in independently attested cVMs on Arm-based edge devices, achieving under 5.15% overhead versus commodity OS deployments.
- PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections
  PI-Hunter automates red-teaming of LLM agents by generating and iteratively evolving source-aware test cases to induce retrieval of embedded malicious instructions from external environments.
- Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents
  AuthGraph aligns an execution provenance graph with a clean authorization graph to detect parameter-source deviations from user intent, reducing attack success rates to 1-2% on AgentDojo and AgentDyn while retaining most task utility.
- PrivScope: Task-scoped Disclosure Control for Hybrid Agentic Systems
  PrivScope enforces task-scoped disclosure at the local-cloud boundary in hybrid agents, eliminating profile leakage and halving re-identification risk on medical workflows while preserving task success.
- Behavioral Integrity Verification for AI Agent Skills
  BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.
- ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection
  ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.
- Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis
  Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real skills with 97.7% recall on expert-labeled samples.
- Parallax: Why AI Agents That Think Must Never Act
  Parallax enforces structural separation between AI thinking and acting via independent multi-tier validation, information flow control, and state rollback, blocking 98.9% of 280 adversarial attacks with zero false positives even when the reasoning system is fully compromised.
- CAEC: Confidential, Attestable, and Efficient Inter-CVM Communication with Arm CCA
  CAEC adds confidential shared memory to Arm CCA, cutting inter-CVM communication cost by up to 209x versus encryption through hypervisor-visible memory while preserving isolation and adding attestable sharing.
- Whispers in the Machine: Confidentiality in Agentic Systems
  Systematic testing of ten LLM agents across 20 tool scenarios and 14 attacks finds universal vulnerability to prompt injection enabling data exfiltration, with tooling amplifying leakage.
- Reframing LLM Agent Security as an Agent-Human Interaction Problem
  LLM agent security is reframed as an agent-human interaction issue, supported by a survey showing industry preference for human-centric mechanisms over academic favorites and proposing a new research agenda.
- Options, Not Clicks: Lattice Refinement for Consent-Driven MCP Authorization
  Conleash uses a risk lattice, policy engine, and refinement loop to deliver scoped, consent-driven authorization for MCP tool calls, reaching 98.2% accuracy and 99.4% escalation catch rate on 984 traces with 8.2 ms overhead and higher user preference in a 16-person study.
- ClawLess: A Security Model of AI Agents
  ClawLess introduces a formal fine-grained security model for AI agents with runtime-adaptive policies enforced via user-space kernel and BPF syscall interception.
- AgentGuard: An Attribute-Based Access Control Framework for Tool-Use LLM-Based Agent
  AgentGuard is an ABAC framework for tool-use LLM agents with lightweight client integration and three server-side inspection mechanisms for single-tool and cross-tool risks.