{"total":21,"items":[{"citing_arxiv_id":"2606.00925","ref_index":51,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Benchmarking Security Risk Detection and Verification in Open Agentic Skill Ecosystems","primary_cat":"cs.CR","submitted_at":"2026-05-30T23:19:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SkillVetBench is a two-stage benchmark combining natural-language semantic vetting and instrumented sandbox execution to detect and provide runtime evidence for malicious skills in open agent platforms, with experiments showing static methods miss up to 89% of threats.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00566","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Same Payload, Different Channel: Measuring Trust Asymmetry in Tool-Using Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-30T06:38:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Agent-native LLMs are substantially more vulnerable to adversarial instructions arriving in tool descriptions than user messages (with the pattern reversing for general-purpose models and inverting again for tool outputs), as quantified by the new Safety Asymmetry Score across six models and three a","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21392","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"VIPER-MCP: Detecting and Exploiting Taint-Style Vulnerabilities in Model Context Protocol Servers","primary_cat":"cs.CR","submitted_at":"2026-05-20T16:46:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"VIPER-MCP detects and exploits 
taint-style vulnerabilities in Model Context Protocol servers via anchor-query static analysis and feedback-driven prompt evolution, uncovering 106 zero-day vulnerabilities across 39,884 repositories with 67 CVEs assigned.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14038","ref_index":27,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use","primary_cat":"cs.AI","submitted_at":"2026-05-13T18:59:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Model-adaptive tool necessity shows 26-54% mismatch with actual tool calls across LLMs, driven by nearly orthogonal hidden-state signals for cognition versus action.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13213","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-13T09:06:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13044","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills","primary_cat":"cs.CR","submitted_at":"2026-05-13T05:57:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Sefz discovers 
specification violations in 29.9% of 402 real-world agent skills by translating guardrails into reachability goals and guiding LLM mutations with a multi-armed bandit.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11770","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Behavioral Integrity Verification for AI Agent Skills","primary_cat":"cs.CR","submitted_at":"2026-05-12T08:41:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Abdelnabi, and M. Andriushchenko. Skill-Inject: Measuring agent vulnerability to skill file attacks. arXiv preprint arXiv:2602.20156, 2026. [28] Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases. In Advances in Neural Information Processing Systems (NeurIPS), 2024. [29] Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, and Lichao Sun. Prompt injection attack to tool selection in LLM agents. In Network and Distributed System Security Symposium (NDSS), 2026. arXiv preprint arXiv:2504.19793. [30] Juhee Kim, Woohyuk Choi, and Byoungyoung Lee. 
Prompt flow integrity to prevent privilege escalation in LLM agents."},{"citing_arxiv_id":"2605.11514","ref_index":49,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"FlowSteer: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems","primary_cat":"cs.CR","submitted_at":"2026-05-12T04:35:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"FlowSteer is a prompt-only attack that biases multi-agent LLM workflow planning to propagate malicious signals, raising success rates by up to 55%, with FlowGuard as an input-side defense reducing it by up to 34%.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Division-of-thoughts: Harnessing hybrid language model synergy for efficient on-device agents. In Proceedings of the ACM on Web Conference 2025, pages 1822-1833, 2025. [48] Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, and Lichao Sun. Prompt injection attack to tool selection in llm agents. arXiv preprint arXiv:2504.19793, 2025. [49] Maojia Song, Tej Deep Pala, Ruiwen Zhou, Weisheng Jin, Amir Zadeh, Chuan Li, Dorien Herremans, and Soujanya Poria. Llms can't handle peer pressure: Crumbling under multi-agent social interactions. arXiv preprint arXiv:2508.18321, 2025. [50] Harold Triedman, Rishi Dev Jha, and Vitaly Shmatikov. 
Multi-agent systems execute arbitrary malicious code."},{"citing_arxiv_id":"2605.11039","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck","primary_cat":"cs.CR","submitted_at":"2026-05-11T04:09:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in AgentDojo evaluations.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"interface: semantic arguments of arbitrary tools rather than browser operations alone. Gradual contracts as utility recovery. PACT's contract hierarchy is inspired by gradual typing and higher-order contracts, where specifications can be refined without requiring every interface to be fully precise from the outset. This is useful for agent security [20] because many tools admit safe conservative policies before their complete argument semantics are known. Coarser contracts may over-block, but should remain safe; finer contracts can recover benign behavior by exposing more structure. 
In PACT, contract precision moves enforcement from opaque tool-level blocking toward role-aware argument checks, recovering utility without relaxing the prohibition on untrusted data"},{"citing_arxiv_id":"2605.07135","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Demystifying and Detecting Agentic Workflow Injection Vulnerabilities in GitHub Actions","primary_cat":"cs.CR","submitted_at":"2026-05-08T02:13:04+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Agentic Workflow Injection is a new injection vulnerability class in LLM-augmented GitHub Actions, with two patterns (P2A and P2S) detected via the TaintAWI tool yielding 496 confirmed exploitable instances across 13,392 workflows.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Agent (P2A) AWI, untrusted content reaches an agent prompt boundary and may steer the agent's privileged capabilities. In Prompt-to-Script (P2S) AWI, attacker influence first passes through a model- or agent-derived output, which is later consumed by downstream workflow logics. Intuitively, AWI can be viewed as the intersection of prompt injection [10], [11], [12], [13], [14] and script injection [15], [16], [17], [18] in agentic workflows. However, AWI is not fully captured by either class alone. Unlike conventional prompt injection, AWI targets a CI/CD workflow agent rather than LLM chatbots [14] or traditional LLM-integrated applications [12], [10], [13]. 
Unlike traditional script injection, AWI does not require syntactic control over an interpreter."},{"citing_arxiv_id":"2605.03378","ref_index":124,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"ARGUS: Defending LLM Agents Against Context-Aware Prompt Injection","primary_cat":"cs.CR","submitted_at":"2026-05-05T05:37:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ARGUS defends LLM agents from context-aware prompt injections by tracking information provenance and verifying decisions against trustworthy evidence, reducing attack success to 3.8% while retaining 87.5% task utility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00460","ref_index":86,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"CleanBase: Detecting Malicious Documents in RAG Knowledge Databases","primary_cat":"cs.CR","submitted_at":"2026-05-01T06:51:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CleanBase identifies malicious documents in RAG databases by detecting cliques in a semantic similarity graph constructed using embedding models and a statistical threshold.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00314","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis","primary_cat":"cs.CR","submitted_at":"2026-05-01T00:48:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real 
skills with 97.7% recall on expert-labeled samples.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"LLM surface facts while outsourcing precise reasoning to a formal engine. PropertyGPT [24] generates formal verification properties from smart contracts for bounded model checking. On the dynamic side, TAI3 [9] stress-tests intent interpretation via input mutation, AgentFuzz [21] applies fuzzing to detect source-to-sink vulnerabilities (34 zero-days), and AgentSpec [39] provides customizable runtime enforcement. Unlike IRIS and LLMDFA, which target conventional code, Semia targets hybrid documents whose security-relevant content is English prose, and produces pre-deployment findings without invoking the LLM at verdict time. Information Flow and Access Control. As agents transition to state-modifying entities, enforcing access control becomes critical."},{"citing_arxiv_id":"2604.27464","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study","primary_cat":"cs.CR","submitted_at":"2026-04-30T06:04:34+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The survey organizes security threats and defenses in autonomous LLM agents into four layers and identifies that risks can propagate across layers from inputs to ecosystem impacts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16762","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"CapSeal: Capability-Sealed Secret Mediation for Secure Agent 
Execution","primary_cat":"cs.CR","submitted_at":"2026-04-18T00:23:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CapSeal introduces a capability-sealed broker architecture that lets AI agents perform constrained secret-using actions without ever receiving the secrets themselves.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10286","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems","primary_cat":"cs.AI","submitted_at":"2026-04-11T17:06:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"STARS fuses static priors and contextual risk scoring for agent skill invocations, achieving modest AUPRC gains on prompt injection attacks in a new SIA-Bench but concluding it supplements rather than replaces static auditing.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09378","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning","primary_cat":"cs.CR","submitted_at":"2026-04-10T14:48:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BadSkill poisons embedded models in agent skills to achieve up to 99.5% attack success rate on triggered tasks with only 3% poison rate while preserving normal behavior on non-trigger inputs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04426","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"ShieldNet: Network-Level Guardrails against Emerging 
Supply-Chain Injections in Agentic Systems","primary_cat":"cs.AI","submitted_at":"2026-04-06T05:15:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ShieldNet detects supply-chain poisoned tools in LLM agents by monitoring network interactions with a MITM proxy and lightweight classifier, reaching 0.995 F1 and 0.8% false positives on a new benchmark of 25+ attack types.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03070","ref_index":52,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"How Your Credentials Are Leaked by LLM Agent Skills: An Empirical Study","primary_cat":"cs.CR","submitted_at":"2026-04-03T14:50:16+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"icantly extending Large Language Model (LLM) agent functionality. Skills are external, file-based modules that allow LLM agents to seamlessly invoke external tools and services (e.g., databases, cloud platforms) through third-party APIs [3, 40]. To date, the number of skills released per day has increased from hundreds to tens of thousands since 2026 [52], and major platforms such as Claude [4] and ChatGPT [41] have increasingly integrated skill support. Figure 1 illustrates a representative case from our dataset. A skill file pairs a natural-language description with executable source code; here, the developer hardcodes a Base64-encoded client secret directly in the skill's source code. 
Because skills are publicly dis-"},{"citing_arxiv_id":"2603.09002","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Security Considerations for Multi-agent Systems","primary_cat":"cs.CR","submitted_at":"2026-03-09T22:46:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"agent architectures amplify risk because context propagates across agent boundaries; Tool Selection Agent C may invoke high-risk tools based on context established by User Interaction Agent A and Data Retrieval Agent B. Users beginning with low-risk queries may inadvertently enable high-risk invocations through contextual drift without UI indication. [18], [36]-[39]. RATC 1 6 - Trace Visualization Abstraction Concealing Tool Chain Complexity. Trace visualization creates hierarchical drill-down to reveal nested operations, but multi-agent systems exploit this abstraction as a visibility gap to hide execution complexity. 
Tool invocations appear as simplified operations in summary views (\"Analyze data\"), with de-"},{"citing_arxiv_id":"2510.23853","ref_index":35,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Your LLM Agents are Temporally Blind: The Misalignment Between Tool Use Decisions and Human Time Perception","primary_cat":"cs.CL","submitted_at":"2025-10-27T20:51:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM agents exhibit temporal blindness, achieving no better than 65% normalized alignment with human preferences on tool-use decisions across time-sensitive scenarios in the new TicToc dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}