SecGPT: An Execution Isolation Architecture for LLM-Based Systems

2025 · arXiv 2403.04960

20 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · dataset: 1 · method: 1

citation-polarity summary

background: 2 · use dataset: 1 · use method: 1

representative citing papers

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

cs.CR · 2024-06-19 · unverdicted · novelty 8.0

AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.

One Goal, Many Commands: Characterizing Denylist Fragility in AI Agents

cs.CR · 2026-06-14 · unverdicted · novelty 7.0

ShellSieve, an LLM-driven pipeline, detects command denylist fragility in terminal AI agents and finds 69.0-98.6% of 1,709 GitHub-collected denylists to be bypassable.

AutoDojo: Adaptive Black-Box Attacks Reveal the Limits of IPI Defenses and Task-Specification Effects in LLM Agents

cs.CR · 2026-06-13 · unverdicted · novelty 7.0

AutoDojo adaptively optimizes indirect prompt injection (IPI) attacks to bypass defenses, recovering a substantial attack success rate (ASR) on action-open tasks where static attacks fail.

What You Approve Is What Executes: Consent Integrity for Black-Box LLM Agents

cs.CR · 2026-06-01 · unverdicted · novelty 7.0

The paper introduces Consent Integrity as the property that actions shown for approval must be rendered by a trusted mediator from the real boundary action over an unspoofable path and bound to execution, with uninspectable actions surfaced rather than silently approved.

AgenTEE: Confidential LLM Agent Execution on Edge Devices

cs.CR · 2026-04-20 · unverdicted · novelty 7.0

AgenTEE isolates LLM agent runtime, inference, and apps in independently attested cVMs on Arm-based edge devices, achieving under 5.15% overhead versus commodity OS deployments.

From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems

cs.MA · 2025-06-05 · accept · novelty 7.0

A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.

PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections

cs.CR · 2026-06-10 · unverdicted · novelty 6.0

PI-Hunter automates red-teaming of LLM agents by generating and iteratively evolving source-aware test cases to induce retrieval of embedded malicious instructions from external environments.

Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents

cs.CR · 2026-05-26 · unverdicted · novelty 6.0

AuthGraph aligns an execution provenance graph with a clean authorization graph to detect parameter-source deviations from user intent, reducing attack success rates to 1-2% on AgentDojo and AgentDyn while retaining most task utility.

PrivScope: Task-scoped Disclosure Control for Hybrid Agentic Systems

cs.CR · 2026-05-15 · unverdicted · novelty 6.0

PrivScope enforces task-scoped disclosure at the local-cloud boundary in hybrid agents, eliminating profile leakage and halving re-identification risk on medical workflows while preserving task success.

Behavioral Integrity Verification for AI Agent Skills

cs.CR · 2026-05-12 · unverdicted · novelty 6.0

BIV audits AI agent skills at scale, finding that 80% of 49,943 skills deviate from their declared behavior, and achieves 0.946 F1 for malicious skill detection.

ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection

cs.CR · 2026-05-05 · unverdicted · novelty 6.0

ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.

Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

cs.CR · 2026-05-01 · unverdicted · novelty 6.0

Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real skills with 97.7% recall on expert-labeled samples.

Parallax: Why AI Agents That Think Must Never Act

cs.CR · 2026-04-14 · unverdicted · novelty 6.0

Parallax enforces structural separation between AI thinking and acting via independent multi-tier validation, information flow control, and state rollback, blocking 98.9% of 280 adversarial attacks with zero false positives even when the reasoning system is fully compromised.

CAEC: Confidential, Attestable, and Efficient Inter-CVM Communication with Arm CCA

cs.CR · 2025-12-01 · unverdicted · novelty 6.0

CAEC adds confidential shared memory to Arm CCA, cutting inter-CVM communication cost by up to 209x versus encryption through hypervisor-visible memory while preserving isolation and adding attestable sharing.

Whispers in the Machine: Confidentiality in Agentic Systems

cs.CR · 2024-02-10 · unverdicted · novelty 6.0

Systematic testing of ten LLM agents across 20 tool scenarios and 14 attacks finds every agent vulnerable to prompt injection that enables data exfiltration, with tool access amplifying leakage.

Reframing LLM Agent Security as an Agent-Human Interaction Problem

cs.CR · 2026-05-23 · unverdicted · novelty 5.0

LLM agent security is reframed as an agent-human interaction issue, supported by a survey showing industry preference for human-centric mechanisms over academic favorites and proposing a new research agenda.

Options, Not Clicks: Lattice Refinement for Consent-Driven MCP Authorization

cs.CR · 2026-05-12 · unverdicted · novelty 5.0

Conleash uses a risk lattice, policy engine, and refinement loop to deliver scoped, consent-driven authorization for MCP tool calls, reaching 98.2% accuracy and 99.4% escalation catch rate on 984 traces with 8.2 ms overhead and higher user preference in a 16-person study.

ClawLess: A Security Model of AI Agents

cs.CR · 2026-04-07 · unverdicted · novelty 5.0

ClawLess introduces a formal, fine-grained security model for AI agents with runtime-adaptive policies enforced via a user-space kernel and BPF syscall interception.

AgentGuard: An Attribute-Based Access Control Framework for Tool-Use LLM-Based Agent

cs.CR · 2026-05-27 · unverdicted · novelty 4.0

AgentGuard is an ABAC framework for tool-use LLM agents with lightweight client integration and three server-side inspection mechanisms for single-tool and cross-tool risks.

From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap

cs.SE · 2024-10-28 · unverdicted · novelty 4.0

A semi-structured thematic synthesis identifies core challenges in FM selection, alignment, prompting, orchestration, testing, deployment, and cross-cutting concerns like observability for production-ready FMware.

citing papers explorer

Showing 18 of 18 citing papers after filters.

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents cs.CR · 2024-06-19 · unverdicted · none · ref 66
One Goal, Many Commands: Characterizing Denylist Fragility in AI Agents cs.CR · 2026-06-14 · unverdicted · none · ref 43
AutoDojo: Adaptive Black-Box Attacks Reveal the Limits of IPI Defenses and Task-Specification Effects in LLM Agents cs.CR · 2026-06-13 · unverdicted · none · ref 34
What You Approve Is What Executes: Consent Integrity for Black-Box LLM Agents cs.CR · 2026-06-01 · unverdicted · none · ref 17
AgenTEE: Confidential LLM Agent Execution on Edge Devices cs.CR · 2026-04-20 · unverdicted · none · ref 58
PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections cs.CR · 2026-06-10 · unverdicted · none · ref 12
Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents cs.CR · 2026-05-26 · unverdicted · none · ref 23
PrivScope: Task-scoped Disclosure Control for Hybrid Agentic Systems cs.CR · 2026-05-15 · unverdicted · none · ref 27
Behavioral Integrity Verification for AI Agent Skills cs.CR · 2026-05-12 · unverdicted · none · ref 37
ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection cs.CR · 2026-05-05 · unverdicted · none · ref 149
Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis cs.CR · 2026-05-01 · unverdicted · none · ref 44
Parallax: Why AI Agents That Think Must Never Act cs.CR · 2026-04-14 · unverdicted · none · ref 49
CAEC: Confidential, Attestable, and Efficient Inter-CVM Communication with Arm CCA cs.CR · 2025-12-01 · unverdicted · none · ref 23
Whispers in the Machine: Confidentiality in Agentic Systems cs.CR · 2024-02-10 · unverdicted · none · ref 41
Reframing LLM Agent Security as an Agent-Human Interaction Problem cs.CR · 2026-05-23 · unverdicted · none · ref 58
Options, Not Clicks: Lattice Refinement for Consent-Driven MCP Authorization cs.CR · 2026-05-12 · unverdicted · none · ref 54
ClawLess: A Security Model of AI Agents cs.CR · 2026-04-07 · unverdicted · none · ref 19
AgentGuard: An Attribute-Based Access Control Framework for Tool-Use LLM-Based Agent cs.CR · 2026-05-27 · unverdicted · none · ref 22