{"total":13,"items":[{"citing_arxiv_id":"2606.10749","ref_index":84,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation","primary_cat":"cs.CR","submitted_at":"2026-06-09T12:01:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"an agentic loop in which the large language model participates in goal interpretation, planning, tool selection, action execution, state update, progress monitoring, or coordination with other agents [76, 143, 156]. This framing allows web agents, coding agents, memory-augmented assistants, embodied systems, and multi-agent workflows to be analyzed within a shared security vocabulary [84, 130, 237]. It also clarifies why LLM agent security cannot be reduced to generic LLM security. In text-only settings, unsafe behavior typically produces problematic textual content [113, 177]. In agentic settings, the same underlying vulnerability may instead redirect a workflow, invoke privileged tools, leak sensitive data, poison persistent or temporary memory, or propagate malicious"},{"citing_arxiv_id":"2606.00497","ref_index":137,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"\"I Strongly Suspect This Website Is a Scam\": Benchmarking PII Leakage and Detection without Defense in Autonomous Web Agents","primary_cat":"cs.CR","submitted_at":"2026-05-30T03:00:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"New benchmark Scammer4U finds 54-93% critical PII leakage from frontier web agents on scam sites versus 0% on benign twins, plus a 30-point gap between verbalized suspicion and actual submission.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29082","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane","primary_cat":"cs.AI","submitted_at":"2026-05-27T20:37:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The Redpanda Agentic Data Plane uses out-of-band metadata channels to enforce data scoping, action constraints, and tamper-proof auditing on autonomous AI agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28999","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening","primary_cat":"cs.CR","submitted_at":"2026-05-27T18:56:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Roughly 1% of real resumes contain hidden prompt injections against LLM screeners, prevalence has risen over 1-2 years, and over 90% avoid explicit instructions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26542","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ChainCaps: Composition-Safe Tool-Using Agents via Monotonic Capability Attenuation","primary_cat":"cs.CR","submitted_at":"2026-05-26T04:44:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ChainCaps prevents permission laundering in tool-using agents by enforcing monotonic capability attenuation through budget intersection, reducing attack success from 25-68% to 0-4.8% on 82 tasks while maintaining 96-100% benign performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26497","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Aligning Provenance with Authorization: A Dual-Graph Defense for LLM Agents","primary_cat":"cs.CR","submitted_at":"2026-05-26T03:20:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AuthGraph aligns an execution provenance graph with a clean authorization graph to detect parameter-source deviations from user intent, reducing attack success rates to 1-2% on AgentDojo and AgentDyn while retaining most task utility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11770","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Behavioral Integrity Verification for AI Agent Skills","primary_cat":"cs.CR","submitted_at":"2026-05-12T08:41:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"agents via poisoning memory or knowledge bases. InAdvances in Neural Information Processing Systems (NeurIPS), 2024. [29] Jiawen Shi, Zenghui Yuan, Guiyao Tie, Pan Zhou, Neil Zhenqiang Gong, and Lichao Sun. Prompt injection attack to tool selection in LLM agents. InNetwork and Distributed System Security Symposium (NDSS), 2026. arXiv preprint arXiv:2504.19793. [30] Juhee Kim, Woohyuk Choi, and Byoungyoung Lee. Prompt flow integrity to prevent privilege escalation in LLM agents.arXiv preprint arXiv:2503.15547, 2025. 12 [31] Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt"},{"citing_arxiv_id":"2605.05868","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills","primary_cat":"cs.CR","submitted_at":"2026-05-07T08:34:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Liu, and Philip Torr. 2026. Skillject: Automating stealthy skill-based prompt injection for coding agents with trace-driven closed-loop refinement.arXiv preprint arXiv:2602.14211(2026). [23] Juhee Kim, Woohyuk Choi, and Byoungyoung Lee. 2025. Prompt flow integrity to prevent privilege escalation in llm agents.arXiv preprint arXiv:2503.15547 (2025). [24] Hao Li, Xiaogeng Liu, Hung-Chun Chiu, Dianqi Li, Ning Zhang, and Chaowei Xiao. 2025. Drift: Dynamic rule-based defense with injection isolation for securing llm agents.arXiv preprint arXiv:2506.12104(2025). [25] Yixi Lin, Jiangrong Wu, Yuhong Nan, Xueqiang Wang, Xinyuan Zhang, and Zibin Zheng. 2026. AgentRaft: Automated Detection of Data Over-Exposure in LLM"},{"citing_arxiv_id":"2604.24026","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills","primary_cat":"cs.CL","submitted_at":"2026-04-27T04:25:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SSL representation disentangles skill scheduling, structure, and logic using an LLM normalizer, improving skill discovery MRR@50 from 0.649 to 0.729 and risk assessment macro F1 from 0.409 to 0.509 over text baselines.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769-6781, Online, 2020. Association for Computational Linguistics. doi: 10. 18653/v1/2020.emnlp-main.550. URLhttps://aclanthology.org/2020.emnlp-main.550/. 10 Juhee Kim, Woohyuk Choi, and Byoungyoung Lee. Prompt flow integrity to prevent privilege escalation in LLM agents.arXiv preprint arXiv:2503.15547, 2025. doi: 10.48550/arXiv.2503. 15547. URLhttps://arxiv.org/abs/2503.15547. Koren Lazar, Matan Vetzler, Kiran Kate, Jason Tsay, David Boaz, Himanshu Gupta, Avraham Shinnar,"},{"citing_arxiv_id":"2604.15579","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility","primary_cat":"cs.SE","submitted_at":"2026-04-16T23:18:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Symbolic guardrails enforce 74% of specified safety policies in agent benchmarks and boost safety without hurting utility.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"systematic literature review, we analyze a large corpus of AI agent benchmarks that evaluate safety or security, and we extract and classify the safety or security policies they define. 3.1.1 Identifying Benchmark Papers.To capture a comprehensive set of benchmarks on AI agent safety or security, we perform a systematic literature review following established guidelines [34]. Search Criteria.We aim to identify papers that (1) propose one or more benchmarks, (2) evaluate tool-use LLM-based agents, and (3) incorporate safety or security considerations into the evaluation. We use the arXiv API as our search interface because most recent papers relevant to AI agents are available on arXiv, often well before formal publication."},{"citing_arxiv_id":"2604.08499","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PIArena: A Platform for Prompt Injection Evaluation","primary_cat":"cs.CR","submitted_at":"2026-04-09T17:42:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PIArena provides a unified evaluation platform for prompt injection attacks and defenses, featuring a new adaptive attack that reveals major weaknesses in existing protections.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.07536","ref_index":43,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TRUSTDESC: Preventing Tool Poisoning in LLM Applications via Trusted Description Generation","primary_cat":"cs.CR","submitted_at":"2026-04-08T19:18:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"TRUSTDESC prevents tool poisoning in LLM applications by automatically generating accurate tool descriptions from code via a three-stage pipeline of reachability analysis, description synthesis, and dynamic verification.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.23978","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM Agents Are the Antidote to Walled Gardens","primary_cat":"cs.LG","submitted_at":"2025-06-30T15:45:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LLM agents enable universal interoperability by serving as automatic translators and adapters between proprietary digital services.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}