{"total":35,"items":[{"citing_arxiv_id":"2605.18133","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"An Empirical Study of Privacy Leakage Chains via Prompt Injection in Black-Box Chatbot Environments","primary_cat":"cs.CR","submitted_at":"2026-05-18T09:38:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Empirical demonstration that prompt injection combined with web-tool use creates a feasible privacy-leakage chain in deployed black-box chatbot agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15377","ref_index":63,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Ensemble Monitoring for AI Control: Diverse Signals Outweigh More Compute","primary_cat":"cs.AI","submitted_at":"2026-05-14T20:06:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Diverse ensembles of prompted and fine-tuned GPT-4.1-Mini monitors achieve 2.4x better detection of flawed code solutions than homogeneous ensembles on adversarial inputs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14290","ref_index":35,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Web Agents Should Adopt the Plan-Then-Execute Paradigm","primary_cat":"cs.CR","submitted_at":"2026-05-14T02:48:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Web agents should default to planning a complete task program before observing live web content to reduce prompt injection exposure, since WebArena tasks are compatible and 80% need no runtime LLM 
calls.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22842","ref_index":35,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI Systems","primary_cat":"cs.CR","submitted_at":"2026-05-12T20:21:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Memory poisoning via lost-provenance documents in agent memory stores creates agent misconduct that safety systems misattribute to model failure; the paper defines Semantic Norm Drift, releases a benchmark, and proposes a new testing method plus a defense.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11868","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection","primary_cat":"cs.CR","submitted_at":"2026-05-12T09:48:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"IPI-proxy is a toolkit using an intercepting proxy to inject indirect prompt injection attacks into live web pages for testing AI browsing agents against hidden instructions.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"36 commercial LLM-integrated applications, providing the first concrete evidence that injection in production systems is high-yield rather than incidental. Liu et al. 
[8] subsequently unified attacks and defenses in a single formal framework and conducted the first systematic head-to-head evaluation of five attack families against ten defenses across ten LLMs, establishing the de facto evaluation baseline that BIPIA [9], InjecAgent [10], and AgentDojo [11] all extend (Section 2.2). Most recently, EchoLeak [2] contributed a chained-bypass methodology demonstrating that even production-grade injection classifiers can be defeated end-to-end when the payload is smuggled through downstream rendering and link-resolution paths, showing that single-layer defenses are insufficient against well-staged in-the-wild attacks."},{"citing_arxiv_id":"2605.10481","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Safe Multi-Agent Behavior Must Be Maintained, Not Merely Asserted: Constraint Drift in LLM-Based Multi-Agent Systems","primary_cat":"cs.MA","submitted_at":"2026-05-11T12:43:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Safety constraints in LLM-based multi-agent systems commonly weaken during execution through memory, communication, and tool use, requiring them to be maintained as explicit state rather than asserted once.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11039","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck","primary_cat":"cs.CR","submitted_at":"2026-05-11T04:09:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, 
outperforming invocation-level monitors in AgentDojo evaluations.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"AgentDojo, and attributes the remaining deployment gap to provenance and contract fidelity. 2 Related Work Tool-level defenses and the granularity assumption. Indirect prompt injection attacks tool-using agents by placing adversarial instructions in webpages, emails, retrieved files, or API responses that later influence replanning [6, 15, 27]. Benchmarks such as AgentDojo [3] and InjecAgent [31] make this threat concrete by measuring both benign task completion and adversarial robustness. The closest defenses to PACT make untrusted influence visible to the runtime. FIDES [2] extends information-flow control to LLM agents through dynamic labels and integrity policies; CaMeL [4] separates privileged planning from quarantined processing of untrusted tool outputs."},{"citing_arxiv_id":"2605.08876","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"OTora: A Unified Red Teaming Framework for Reasoning-Level Denial-of-Service in LLM Agents","primary_cat":"cs.LG","submitted_at":"2026-05-09T10:55:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"OTora provides the first unified framework for reasoning-level denial-of-service attacks on LLM agents, achieving up to 10x more reasoning tokens and order-of-magnitude latency increases while preserving task accuracy across multiple agent types and models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06393","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed 
Isolation","primary_cat":"cs.CR","submitted_at":"2026-05-07T15:08:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"TABLE III COMPARISON WITH REPRESENTATIVE EXISTING WORK. Work category Representative works Targets SHCUA host-level abuse Operation-level risk modeling Trusted classification / decision path Remote terminal verification SHCUA/OpenClaw security analysis [12], [13], [16], [25] Yes Partial No No Computer-use agent attack benchmarks [14], [19], [20], [22], [24] Mostly yes Partial No No Input/model-side defenses [27]-[30] Partial No No No Policy/runtime enforcement for agents [31]-[33], [35]-[39] Partial Partial Usually no No Sandboxing and constrained execution [11], [41], [42] Partial No / partial No No General system-level protection [43]-[46] No No Partial No This work-Yes Yes Yes Yes if the REE-side runtime is compromised."},{"citing_arxiv_id":"2605.00314","ref_index":48,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis","primary_cat":"cs.CR","submitted_at":"2026-05-01T00:48:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Semia synthesizes Datalog representations of agent skills via constraint-guided loops to enable reachability queries for semantic risks, finding critical issues in over half of 13,728 real skills with 97.7% recall on expert-labeled samples.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Recent work frames agent security as an information-flow 
problem: CaMeL [6] uses a custom interpreter to track data provenance and enforce that untrusted inputs cannot influence security-sensitive calls, solving 67% of AgentDojo tasks with provable guarantees. Fides [5] implements dynamic taint-tracking with integrity and confidentiality labels, and RTBAS [48] applies dependency screening against prompt injection and privacy leakage. These systems enforce policies at runtime; Semia complements them with a pre-deployment auditing layer that flags problematic flows before the agent is ever installed. Architectural approaches advocate execution isolation: SEAgent [16] proposes ABAC to monitor agent-tool interactions"},{"citing_arxiv_id":"2604.24700","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Green Shielding: A User-Centric Approach Towards Trustworthy AI","primary_cat":"cs.CL","submitted_at":"2026-04-27T17:04:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Green Shielding introduces CUE criteria and the HCM-Dx benchmark to demonstrate that routine prompt variations systematically alter LLM diagnostic behavior along clinically relevant dimensions, producing Pareto-like tradeoffs in plausibility versus coverage.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Bernhard A Moser, Alina Oprea, Battista Biggio, Marcello Pelillo, and Fabio Roli. Wild patterns reloaded: A survey of machine learning security against training data poisoning. ACM Computing Surveys, 55(13s):1-39, 2023. [32] Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents. arXiv preprint arXiv:2403.02691, 2024. [33] Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Ping Yi, Zhuosheng Zhang, and Gongshen Liu. 
Trojanrag: Retrieval-augmented generation can be backdoor driver in large language models. arXiv preprint arXiv:2405.13401, 2024. [34] Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. PoisonedRAG: Knowledge corruption attacks to Retrieval-Augmented generation of large language models."},{"citing_arxiv_id":"2604.23338","ref_index":93,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework","primary_cat":"cs.CR","submitted_at":"2026-04-25T14:57:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"uniformity of their root cause; the section is organized to make this point. A. Indirect Prompt Injection Greshake et al. [18], [90] and Liu et al. [91], [92] introduced and systematized the concept of indirect prompt injection: embedding attacker instructions in external content that the agent retrieves during task execution. The threat was subsequently formalized and benchmarked in InjecAgent [93] and Agent Security Bench [94], which demonstrated attack success rates exceeding 60% across leading models in realistic multi-tool scenarios. WIPI [95] extends this threat to web-browsing agents, showing that adversarial instructions embedded in web page content can hijack browsing tasks with high reliability. 
Web agent benchmarks such as WebArena [96] and WebShop [97]"},{"citing_arxiv_id":"2604.20704","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Auto-ART: Structured Literature Synthesis and Automated Adversarial Robustness Testing","primary_cat":"cs.CR","submitted_at":"2026-04-22T15:46:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Auto-ART delivers the first structured synthesis of adversarial robustness consensus plus an executable multi-norm testing framework that flags gradient masking in 92% of cases on RobustBench and reveals a 23.5 pp robustness gap.","context_count":1,"top_context_role":"background","top_context_polarity":"support","context_text":"(1,054 test cases, 17 user tools), showing ReAct-prompted GPT-4 is vulnerable 24% of the time. The OWASP Agentic Applications Top 10 [36] standardises agent-specific threat categories (ASI01: Goal Hijack through ASI10), and MITRE ATLAS v5.3.0 [37] now includes case studies for MCP server compromises and malicious agent deployment. Adversarial resilience and regulatory analysis. Dai et al. [38] formalised continual adaptive robustness (CAR), arguing that defences should adapt to new attack types rather than only resist known ones, connecting multi-attack robustness to the open-world setting. 
Panfili et al. [39] provided the first systematic analysis connecting EU AI Act Article 15 to ML robustness terminology, identifying legal challenges in the"},{"citing_arxiv_id":"2604.18874","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"How Adversarial Environments Mislead Agentic AI?","primary_cat":"cs.AI","submitted_at":"2026-04-20T21:53:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Adversarial compromise of tool outputs misleads agentic AI via breadth and depth attacks, revealing that epistemic and navigational robustness are distinct and often trade off against each other.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18500","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance","primary_cat":"cs.MA","submitted_at":"2026-04-20T16:52:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"QRAFTI is a multi-agent framework using tool-calling and reflection-based planning to emulate quant research tasks like factor replication and signal testing on financial data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17562","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SafeAgent: A Runtime Protection Architecture for Agentic Systems","primary_cat":"cs.AI","submitted_at":"2026-04-19T18:02:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SafeAgent is a stateful runtime protection system that improves LLM agent robustness to prompt injections over baselines while preserving task 
performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15415","ref_index":77,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?","primary_cat":"cs.CR","submitted_at":"2026-04-16T17:31:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Harmful skills in open agent ecosystems raise average harm scores from 0.27 to 0.76 across six LLMs by lowering refusal rates when tasks are presented via pre-installed skills.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Research on LLM-based agent security has largely studied attacks where the agent or its user is the victim. Liu et al. [36] evaluate the robustness of LLMs against prompt injection attacks, showing that instruction-following capability and vulnerability to injected instructions are positively correlated. Subsequent benchmarks such as InjecAgent [77], AgentDojo [14], and ASB [78] quantify this and related threats across tool-integrated agents. WIPI [70], EIA [37], and Imprompter [21] extend the attack surface to web agents and adversarial tool misuse. 
A parallel line of work studies supply-chain backdoors that poison the agent itself through fine-tuning [74, 69] or through retrieval-layer poisoning of memory and knowledge bases [11]."},{"citing_arxiv_id":"2604.12259","ref_index":84,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"A Periodic Space of Distributed Computing: Vision & Framework","primary_cat":"cs.DC","submitted_at":"2026-04-14T04:23:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A periodic framework is proposed to characterize, compare, and predict behaviors across distributed computing solutions by mapping system properties in a structured space inspired by the chemical periodic table.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10134","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification","primary_cat":"cs.CR","submitted_at":"2026-04-11T09:59:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PlanGuard cuts indirect prompt injection attack success rate to 0% on the InjecAgent benchmark by verifying agent actions against a user-instruction-only plan while keeping false positives at 1.49%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16282","ref_index":64,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents","primary_cat":"cs.CY","submitted_at":"2026-04-11T04:25:19+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"This paper delivers the first systematic taxonomy and 
cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that benchmark choice systematically alters reported safety.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.05289","ref_index":64,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"FLARE: Agentic Coverage-Guided Fuzzing for LLM-Based Multi-Agent Systems","primary_cat":"cs.SE","submitted_at":"2026-04-07T00:47:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"FLARE extracts specifications from multi-agent LLM code and applies coverage-guided fuzzing to achieve 96.9% inter-agent and 91.1% intra-agent coverage while uncovering 56 new failures across 16 applications.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"6 Related Work 6.1 Multi-Agent LLM Systems Recent advances in LLM have drawn significant attention to MAS in AI [5, 15, 17, 51, 61]. This interest stems from shifting benchmark tasks and a strong demand for autonomous capabilities [25, 29]. By coordinating specialized agents, MAS can perform stepwise reasoning and tackle complex, multi-stage problems [64]. Existing MAS have been explored across domains such as drug discovery [7, 8], financial trading [26, 66], software engineering [23, 28], and society simulation [1, 39]. 
However, their high degrees of freedom, complexity, and stochastic outputs make them prone to defects in coordination, task execution, inter-agent interaction, and transparency, sometimes with severe conse-"},{"citing_arxiv_id":"2604.04759","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw","primary_cat":"cs.CR","submitted_at":"2026-04-06T15:27:05+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Poisoning any single CIK dimension of an AI agent raises average attack success rate from 24.6% to 64-74% across models, and tested defenses leave substantial residual risk.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04035","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents","primary_cat":"cs.CR","submitted_at":"2026-04-05T09:28:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"The paper defines causality laundering as an attack leaking information from denial outcomes in LLM tool calls and proposes the Agentic Reference Monitor to block it using denial-aware provenance graphs.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"aware model over the flat baseline: denied-action provenance, transitive taint propagation, and field-level provenance. This is sufficient to validate the paper's central claim that these structural features matter, but it does not establish false-positive rates, coverage across diverse agent workflows, or behavior under long provenance histories. 
A stronger evaluation should include benchmark suites such as AgentDojo [7] and InjectAgent [30], longer-running trajectories, and experiments with frontier tool-calling agents in the loop. Those studies are important future work, but they would extend rather than replace the present controlled comparison. 9.5 Multi-Agent Extension The current paper studies a single-agent setting. Extending the provenance graph to multi-agent workflows is plausible but non-trivial because delegation, concurrency, and composition"},{"citing_arxiv_id":"2604.03121","ref_index":81,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"An Independent Safety Evaluation of Kimi K2.5","primary_cat":"cs.CR","submitted_at":"2026-04-03T15:45:35+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Kimi K2.5 matches closed models on dual-use tasks but refuses fewer CBRNE requests and shows some sabotage and self-replication tendencies.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"URL https://aisel.aisnet.org/treos_icis2025/104. [79] Ryan Greenblatt, Buck Shlegeris, Mrinank Sachan, and Fabien Roger. Alignment faking in large language models. arXiv preprint arXiv:2412.14093, 2024. [80] Alexander Meinke, Mislav Balesni, Rusheb Shah, and Fabien Roger. Frontier models are capable of in-context scheming. arXiv preprint arXiv:2412.04984, 2024. [81] Joe Benton, Jérémy Scheurer, Ryan Greenblatt, Cem Anil, and Ethan Perez. Sabotage evaluations for frontier models. arXiv preprint arXiv:2410.21514, 2024. [82] Tejal Patwardhan, Rachel Dias, Elizabeth Proehl, Grace Kim, Michele Wang, Olivia Watkins, Simón Posada Fishman, Marwan Aljubeh, Phoebe Thacker, Laurance Fauconnet, Natalie S. 
Kim, Patrick Chao, Samuel Miserendino, Gildas Chabot, David Li,"},{"citing_arxiv_id":"2604.03070","ref_index":63,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study","primary_cat":"cs.CR","submitted_at":"2026-04-03T14:50:16+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Analysis of 17k LLM agent skills reveals 520 vulnerable ones with 1,708 leakage issues, primarily from debug output exposure, with a 10-pattern taxonomy and released dataset for future detection.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.28013","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers","primary_cat":"cs.CR","submitted_at":"2026-03-30T04:07:18+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Stage-level tracking of prompt injection reveals that write-node placement and model-specific behaviors determine attack outcomes more than initial exposure in LLM pipelines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.16708","ref_index":67,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Formal Policy Enforcement for Real-World Agentic Systems","primary_cat":"cs.CR","submitted_at":"2026-02-18T18:57:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"FORGE enforces security policies in agentic systems via Datalog over abstract predicates with an observability service and reference monitor that guarantees policy semantics when the environment contract 
holds.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.23883","ref_index":37,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges","primary_cat":"cs.AI","submitted_at":"2025-10-27T21:48:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"URL https://www.security.com/threat-intelligence/ai-agent-attacks. Accessed: 2025-08-16. [36] Hadi Askari, Anshuman Chhabra, Muhao Chen, and Prasant Mohapatra. Assessing llms for zero-shot abstractive summarization through the lens of relevance paraphrasing. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 2187-2201, 2025. [37] Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents. arXiv preprint arXiv:2403.02691, 2024. [38] Donghyun Lee and Mo Tiwari. Prompt infection: Llm-to-llm prompt injection within multi-agent systems. arXiv preprint arXiv:2410.07283, 2024. 
[39] Shahriar Kabir Nahin, Hadi Askari, Muhao Chen, and Anshuman Chhabra."},{"citing_arxiv_id":"2504.20472","ref_index":46,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction","primary_cat":"cs.CR","submitted_at":"2025-04-29T07:13:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The method prompts LLMs to output both answers and references to the executed instructions, then filters out any answers not linked to the original input instructions, reducing attack success rates to zero in tested scenarios while preserving utility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.19793","ref_index":63,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Prompt Injection Attack to Tool Selection in LLM Agents","primary_cat":"cs.CR","submitted_at":"2025-04-28T13:36:43+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ToolHijacker optimizes malicious tool documents via a two-phase strategy to hijack LLM agents' tool selection in no-box settings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2503.21460","ref_index":205,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Large Language Model Agent: A Survey on Methodology, Applications and Challenges","primary_cat":"cs.CL","submitted_at":"2025-03-27T12:50:17+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and 
challenges.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"User Input Falsifying. Modifying the user input is the most straightforward and widely used data-centric attacks. These injections [176] can lead to uncontrolled and dangerous outputs. Though it is simple, it always achieves the highest Attack Success Rate (ASR) [176], [203]. Li et al. [204] propose malicious prefix prompts, such as \"ignore the document\". InjectAgent [205] and Agentdojo [203] are two prompt injection benchmarks, which test the single and multi-turn attacks in LLM agents. As the widespread effect of injections on user inputs increases, various defense models have been designed. Mantis [206] defenses through hacking back to attackers' own systems. [207] offers a defense module called the Input Firewall, which extracts key points from"},{"citing_arxiv_id":"2502.01241","ref_index":42,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Peering Behind the Shield: Guardrail Identification in Large Language Models","primary_cat":"cs.CR","submitted_at":"2025-02-03T11:02:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AP-Test identifies deployed guardrails in LLMs via adversarial prompt testing and a match score metric, reporting perfect accuracy on four open-source guardrails.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2410.09024","ref_index":33,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents","primary_cat":"cs.LG","submitted_at":"2024-10-11T17:39:22+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AgentHarm benchmark shows leading LLMs comply with malicious agent requests and simple jailbreaks enable coherent 
harmful multi-step execution while retaining capabilities.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2406.13352","ref_index":71,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents","primary_cat":"cs.CR","submitted_at":"2024-06-19T08:55:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"return malicious data. Even when restricted to benign settings, our tasks are at least challenging as existing function-calling benchmarks, see Figure 2.2 Prior benchmarks for prompt injections focus on simple scenarios without tool-calling, such as document QA [70], prompt stealing [12, 57], or simpler goal/rule hijacking [39, 52]. The recent InjecAgent benchmark [71] is close in spirit to AgentDojo, but focuses on simulated single-turn scenarios, where an LLM is directly fed a single (adversarial) piece of data as a tool output (without evaluating the model's planning). 
In contrast, AgentDojo's design aims to emulate a realistic agent execution, where the agent has to decide which tool(s) to call and must solve"},{"citing_arxiv_id":"2404.08144","ref_index":25,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"LLM Agents can Autonomously Exploit One-day Vulnerabilities","primary_cat":"cs.CR","submitted_at":"2024-04-11T22:07:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GPT-4 LLM agents autonomously exploit 87% of tested one-day vulnerabilities when given CVE descriptions, far outperforming other models and tools.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}