arXiv preprint arXiv:2601.17548

Narek Maloyan, Dmitry Namiot · 2026 · arXiv 2601.17548

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 2 · support 1

representative citing papers

Coding with "Enemy": Can Human Developers Detect AI Agent Sabotage?

cs.AI · 2026-06-04 · unverdicted · novelty 8.0

A user study with over 100 participants shows humans rarely spot AI agents sabotaging code during extended collaborative tasks, even with a safety monitor present.

What You Approve Is What Executes: Consent Integrity for Black-Box LLM Agents

cs.CR · 2026-06-01 · unverdicted · novelty 7.0

The paper introduces Consent Integrity as the property that actions shown for approval must be rendered by a trusted mediator from the real boundary action over an unspoofable path and bound to execution, with uninspectable actions surfaced rather than silently approved.

A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

cs.CR · 2026-04-25 · unverdicted · novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security

cs.CR · 2026-04-08 · conditional · novelty 7.0

MCP-DPT creates a defense-placement taxonomy that organizes MCP threats and defenses across six architectural layers, revealing mostly tool-centric protections and gaps at orchestration, transport, and supply-chain layers.

ActPlane: Programmable OS-Level Policy Enforcement for Agent Harnesses

cs.OS · 2026-06-23 · unverdicted · novelty 6.0

ActPlane introduces an OS-kernel policy engine using an information-flow control DSL and eBPF to enforce agent harness policies, achieving better compliance on indirect paths with 1.9-8.4% overhead.

When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems

cs.SE · 2026-05-30 · unverdicted · novelty 6.0

About 18.2% of structurally flagged skill pairs represent genuine compositional safety risks in agent skill registries, with exploitation gated by host model behavior.

Prompt Injection Detection is Regime-Dependent: A Deployment-Aware Evaluation with Interpretable Structural Signals

cs.CL · 2026-05-26 · unverdicted · novelty 5.0

Prompt injection detection performance is highly regime-dependent with no single detector dominating across settings; transformer models perform best overall while structural signals offer modest gains in some regimes.

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

cs.CR · 2026-05-07 · unverdicted · novelty 5.0

A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.

Security Considerations for Artificial Intelligence Agents

cs.LG · 2026-03-12 · unverdicted · novelty 3.0

Frontier AI agents introduce new confidentiality, integrity, and availability risks through changed assumptions on code-data separation and authority boundaries, requiring layered defenses like sandboxing and policy enforcement.

How Your Credentials Are Leaked by LLM Agent Skills: An Empirical Study

cs.CR · 2026-04-03
