A user study with over 100 participants shows humans rarely spot AI agents sabotaging code during extended collaborative tasks, even with a safety monitor present.
arXiv preprint arXiv:2601.17548 , year =
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 10roles
background 3representative citing papers
The paper introduces Consent Integrity as the property that actions shown for approval must be rendered by a trusted mediator from the real boundary action over an unspoofable path and bound to execution, with uninspectable actions surfaced rather than silently approved.
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
MCP-DPT creates a defense-placement taxonomy that organizes MCP threats and defenses across six architectural layers, revealing mostly tool-centric protections and gaps at orchestration, transport, and supply-chain layers.
ActPlane introduces an OS-kernel policy engine using an information-flow control DSL and eBPF to enforce agent harness policies, achieving better compliance on indirect paths with 1.9-8.4% overhead.
About 18.2% of structurally flagged skill pairs represent genuine compositional safety risks in agent skill registries, with exploitation gated by host model behavior.
Prompt injection detection performance is highly regime-dependent with no single detector dominating across settings; transformer models perform best overall while structural signals offer modest gains in some regimes.
A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.
Frontier AI agents introduce new confidentiality, integrity, and availability risks through changed assumptions on code-data separation and authority boundaries, requiring layered defenses like sandboxing and policy enforcement.
citing papers explorer
No citing papers match the current filters.