Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

Chunrong Fang; Shengcheng Yu; Yuchen Ling; Zhenyu Chen

arxiv: 2606.10749 · v1 · pith:PC7WX4DDnew · submitted 2026-06-09 · 💻 cs.CR · cs.AI

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

Yuchen Ling , Shengcheng Yu , Zhenyu Chen , Chunrong Fang This is my paper

Pith reviewed 2026-06-27 12:53 UTC · model grok-4.3

classification 💻 cs.CR cs.AI

keywords LLM agentssecurity threatsprompt injectiontool misusestate managementtrust boundariesevaluation benchmarksmulti-agent systems

0 comments

The pith

Secure LLM agents require explicit trust boundaries, principled privilege control, provenance-aware state management, and evaluation practices aligned with realistic operational settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reviews 247 studies on LLM agent security, where agents plan actions, call tools, keep memory, and interact with outside systems. It frames the problem through a systems view of how information flows, authority gets delegated, and state persists across steps. The review shows prompt injection and tool-based control hijacking as the leading threats, while state corruption and issues spreading across multiple agents are rising. Defenses exist as separate pieces but do not combine reliably, and current tests overlook extended, stateful, or real-deployment scenarios. The authors conclude that agents will stay insecure without deliberate design choices around trust limits and state tracking.

Core claim

By synthesizing 247 papers through a lifecycle-based, systems-oriented framework, the authors determine that prompt injection and tool-mediated control-flow hijacking dominate LLM agent threats, with persistent state corruption and multi-agent propagation emerging as key concerns. Defenses offer useful components but lack strong composition, and evaluations often miss long-horizon and deployment realities. This leads to the claim that secure LLM agents depend on explicit trust boundaries, principled privilege control, provenance-aware state management, and aligned evaluation practices.

What carries the argument

The lifecycle-based, systems-oriented framework modeling agent security around the interaction of information flow, delegated authority, and persistent state.

If this is right

Prompt injection and tool-mediated control-flow hijacking dominate the field of LLM agent attacks.
Persistent state corruption and multi-agent propagation are becoming central emerging concerns.
Current defenses provide useful building blocks but remain weakly compositional.
Existing benchmarks still underrepresent long-horizon, stateful, and deployment-sensitive risks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Adopting provenance tracking in agent memory could stop corrupted data from affecting later decisions in long-running tasks.
The framework suggests developers should treat tool calls with the same care as external API access in conventional software.
Evaluation methods might improve by testing agents across multiple sessions that carry state forward.
These requirements echo least-privilege ideas but must handle the open-ended planning that LLMs perform.

Load-bearing premise

The 247 selected papers sufficiently represent the broader literature on LLM agent security to identify the main threats, defenses, and gaps.

What would settle it

A new survey covering a substantially different collection of papers that finds other attack types dominant or shows that existing defenses compose effectively in practice would undermine the synthesis.

Figures

Figures reproduced from arXiv: 2606.10749 by Chunrong Fang, Shengcheng Yu, Yuchen Ling, Zhenyu Chen.

**Figure 2.** Figure 2: Corpus Distributions by Year, Primary Paper Type, and System Setting [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

**Figure 3.** Figure 3: Lifecycle-by-Surface Matrix of Major Threat Surfaces and Focal Lifecycle Stages [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: Representative Propagation Patterns of LLM Agent Attacks [PITH_FULL_IMAGE:figures/full_fig_p017_4.png] view at source ↗

**Figure 5.** Figure 5: Layered Defense Stack and Coverage Gaps for LLM Agents [PITH_FULL_IMAGE:figures/full_fig_p020_5.png] view at source ↗

**Figure 6.** Figure 6: Benchmark Co-occurrence Counts Across Threat Surfaces, Lifecycle Stages, Scenarios, and Reporting [PITH_FULL_IMAGE:figures/full_fig_p024_6.png] view at source ↗

read the original abstract

Large language model (LLM) agents are rapidly moving from conversational interfaces to software components that plan, invoke tools, maintain memory, and act on external environments. This transition changes the nature of security risk. In agentic settings, failures are no longer limited to unsafe text generation. Untrusted content may redirect control flow, misuse tool privileges, corrupt persistent state, leak sensitive information, or trigger harmful external actions. At the same time, research on LLM agent security is expanding quickly but remains fragmented across attack families, defense layers, application domains, and evaluation settings. This paper synthesizes 247 papers through a lifecycle-based, systems-oriented framework that models agent security around the interaction of information flow, delegated authority, and persistent state. We organize the literature around four questions: how LLM agent security should be modeled, which threat surfaces and attack families dominate, what defenses have been proposed and with what tradeoffs, and how security claims are evaluated. We find that prompt injection and tool-mediated control-flow hijacking still dominate the field, while persistent state corruption and multi-agent propagation are becoming central emerging concerns. We further find that current defenses provide useful building blocks but remain weakly compositional, and that existing benchmarks still underrepresent long-horizon, stateful, and deployment-sensitive risks. We argue that secure LLM agents require explicit trust boundaries, principled privilege control, provenance-aware state management, and evaluation practices aligned with realistic operational settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A straightforward survey that maps 247 papers onto a lifecycle framework for LLM agent security but introduces no new technical results or verified claims.

read the letter

This paper pulls together existing work on LLM agent security into one place using a systems view centered on information flow, delegated authority, and persistent state. It flags prompt injection and tool hijacking as the current main threats, notes state corruption and multi-agent spread as growing issues, and points out that defenses do not compose well and that benchmarks miss long-running, stateful cases. The call for explicit trust boundaries, privilege controls, provenance tracking, and realistic evaluation follows directly from that mapping.

The main contribution is the organization itself. Grouping the literature around those four questions gives readers a usable structure for thinking about where the field stands. That is useful for anyone trying to get oriented quickly.

The soft spot is the lack of visible detail on how the 247 papers were selected, searched, or categorized. Without that, the statements about what dominates or what is emerging rest on an uncheckable corpus. Surveys often have this issue, but it makes the dominance claims harder to rely on. No new data, experiments, or formal arguments appear.

This is for people already working in AI security who need a reference map rather than a novel technique. It is worth sending to peer review because a cleaned-up version could serve as a standard starting point for the subfield, even if the conclusions stay at the level of synthesis.

Referee Report

2 major / 0 minor

Summary. The manuscript is a survey synthesizing 247 papers on LLM agent security. It introduces a lifecycle-based, systems-oriented framework organized around information flow, delegated authority, and persistent state. The work addresses four questions: modeling of agent security, dominant threat surfaces and attack families, proposed defenses and tradeoffs, and evaluation practices. Findings include continued dominance of prompt injection and tool-mediated control-flow hijacking, emergence of persistent state corruption and multi-agent propagation risks, weak compositionality of defenses, and underrepresentation of long-horizon/stateful risks in benchmarks. The authors conclude that secure LLM agents require explicit trust boundaries, principled privilege control, provenance-aware state management, and evaluation aligned with realistic operational settings.

Significance. If the corpus selection is systematic and representative, the survey offers a useful organizing taxonomy for a rapidly expanding but fragmented area. It could help focus research on compositional defenses and more realistic benchmarks, particularly by distinguishing text-generation risks from agentic control and state issues. The framework itself provides a concrete lens for future work on trust boundaries and provenance.

major comments (2)

[Abstract and Introduction] Abstract and Introduction: The synthesis of 247 papers is presented as the basis for identifying dominant threats (prompt injection, tool hijacking) and evaluation gaps, yet no search methodology, inclusion/exclusion criteria, database sources, or categorization protocol is described. This is load-bearing because the claims about what 'dominates' and what is 'emerging' cannot be verified for coverage bias across domains or multi-agent settings without these details.
[Findings sections on defenses and evaluation] Findings on defenses and evaluation (the four-question organization): The assertion that 'current defenses provide useful building blocks but remain weakly compositional' and that 'existing benchmarks still underrepresent long-horizon, stateful risks' rests on the reviewed corpus; without an explicit mapping or protocol showing how papers were classified into these categories, the trade-off and gap conclusions cannot be assessed for completeness.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback emphasizing the need for methodological transparency. We address each major comment below.

read point-by-point responses

Referee: [Abstract and Introduction] Abstract and Introduction: The synthesis of 247 papers is presented as the basis for identifying dominant threats (prompt injection, tool hijacking) and evaluation gaps, yet no search methodology, inclusion/exclusion criteria, database sources, or categorization protocol is described. This is load-bearing because the claims about what 'dominates' and what is 'emerging' cannot be verified for coverage bias across domains or multi-agent settings without these details.

Authors: We agree that the search methodology, inclusion/exclusion criteria, database sources, and categorization protocol must be described to support the claims about dominant and emerging threats. The provided manuscript text does not include these details in the abstract or introduction. In the revised version we will insert a dedicated 'Survey Methodology' subsection that specifies the databases (arXiv, Google Scholar, major security and AI conferences), search strings, date range, inclusion criteria (papers on LLM agents involving planning, tools, memory, or external actions with security implications), exclusion criteria (non-agent LLM safety papers), and the multi-author categorization protocol used to assign works to the four organizing questions. This addition will allow readers to evaluate potential coverage bias. revision: yes
Referee: [Findings sections on defenses and evaluation] Findings on defenses and evaluation (the four-question organization): The assertion that 'current defenses provide useful building blocks but remain weakly compositional' and that 'existing benchmarks still underrepresent long-horizon, stateful risks' rests on the reviewed corpus; without an explicit mapping or protocol showing how papers were classified into these categories, the trade-off and gap conclusions cannot be assessed for completeness.

Authors: We concur that the conclusions on weak compositionality of defenses and underrepresentation of long-horizon risks require an explicit classification protocol and mapping. The current manuscript states the findings but does not provide a detailed mapping or protocol description. We will revise by adding an appendix that summarizes the distribution of the 247 papers across categories (with representative examples) and a description of the classification process (independent review of abstracts and key sections by multiple authors, with consensus resolution). This will make the evidentiary basis for the trade-off and gap claims verifiable. revision: yes

Circularity Check

0 steps flagged

No circularity: survey synthesis rests on external literature

full rationale

This is a survey paper that organizes and synthesizes 247 external papers under a lifecycle framework. No mathematical derivations, fitted parameters, equations, or ansatzes appear. Central claims about required security properties (trust boundaries, privilege control, provenance-aware state) are presented as conclusions drawn from the reviewed literature rather than self-defined or forced by internal construction. Self-citations, if present, are not load-bearing for the synthesis itself. The enumerated circularity patterns (self-definitional, fitted-input prediction, uniqueness imported from authors, etc.) do not apply. Representativeness of the corpus is a methodological validity question outside the circularity criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper; no free parameters, mathematical axioms, or invented entities are introduced.

pith-pipeline@v0.9.1-grok · 5791 in / 1032 out tokens · 16927 ms · 2026-06-27T12:53:30.520648+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Agents That Know Too Much: A Data-Centric Survey of Privacy in LLM Agents
cs.CR 2026-06 unverdicted novelty 5.0

A data-centric survey finds that only information-flow control covers compositional and cross-session leakage in LLM agents and that no single benchmark tests an agent across all its data surfaces under one policy.

Reference graph

Works this paper leans on

260 extracted references · 208 canonical work pages · cited by 1 Pith paper · 49 internal anchors

[1]

Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, João Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi, Byeonghyeon Kim, Hyeonjin Lee, Benjamin Pannell, Conor McCauley, Mark Russinovich, And...

work page doi:10.48550/arxiv.2506.09956 2025
[2]

Sahar Abdelnabi, Amr Gomaa, Eugene Bagdasarian, Per Ola Kristensson, and Reza Shokri. 2025. Firewalls to Secure Dynamic LLM Agentic Networks. arXiv:2502.01822 [cs.CR] doi:10.48550/arXiv.2502.01822

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.01822 2025
[3]

Alsharif Abuadbba, Nazatul Sultan, Surya Nepal, and Sanjay Jha. 2026. Human Society-Inspired Approaches to Agentic AI Security: The 4C Framework. arXiv:2602.01942 [cs.CR] doi:10.48550/arXiv.2602.01942

work page doi:10.48550/arxiv.2602.01942 2026
[4]

Tanzim Ahad, Ismail Hossain, Md Jahangir Alam, Sai Puppala, Yoonpyo Lee, Syed Bahauddin Alam, and Sajedul Talukder. 2026. Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines. arXiv:2604.08608 [cs.CR] doi:10.48550/arXiv.2604.08608

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.08608 2026
[5]

Hengyu An, Minxi Li, Jinghuai Zhang, Naen Xu, Chunyi Zhou, Changjiang Li, Xiaogang Xu, Tianyu Du, and Shouling Ji. 2026. ACIArena: Toward Unified Evaluation for Agent Cascading Injection. arXiv:2604.07775 [cs.AI] doi:10.48550/arXiv.2604.07775

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.07775 2026
[6]

Hengyu An, Jinghuai Zhang, Tianyu Du, Chunyi Zhou, Qingming Li, Tao Lin, and Shouling Ji. 2025. IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1023–1039. doi:10.18653/V...

work page doi:10.18653/v1/2025.emnlp-main.53 2025
[7]

Zico Kolter, Matt Fredrikson, Yarin Gal, and Xander Davies

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, J. Zico Kolter, Matt Fredrikson, Yarin Gal, and Xander Davies. 2025. AgentHarm: A Bench- mark for Measuring Harmfulness of LLM Agents. InInternational Conference on Learning Representations (ICLR). OpenReview.net

2025
[8]

Anthropic. 2024. Developing a computer use model. Retrieved May 20, 2026 from https://www.anthropic.com/news/ developing-computer-use Research report

2024
[9]

Anthropic. 2025. Claude Sonnet 4 and Opus 4 System Card. Retrieved May 20, 2026 from https://www-cdn.anthropic. com/07b2a3f9902ee19fe39a36ca638e5ae987bc64dd.pdf System card

2025
[10]

Nirmit Arora, Sathvik Joel, Ishan Kavathekar, Palak, Rohan Gandhi, Yash Pandya, Tanuja Ganu, Aditya Kanade, and Akshay Nambi. 2025. Exposing Weak Links in Multi-Agent Systems under Adversarial Prompting. arXiv:2511.10949 [cs.MA] doi:10.48550/arXiv.2511.10949

work page doi:10.48550/arxiv.2511.10949 2025
[11]

Sadia Asif and Mohammad Mohammadi Amiri. 2026. Information-Theoretic Privacy Control for Sequential Multi- Agent LLM Systems. arXiv:2603.05520 [cs.MA] doi:10.48550/arXiv.2603.05520 ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: January 2026. 1:30 Yuchen Ling, Shengcheng Yu, Zhenyu Chen, and Chunrong Fang

work page doi:10.48550/arxiv.2603.05520 2026
[12]

Eugene Bagdasarian, Ren Yi, Sahra Ghalebikesabi, Peter Kairouz, Marco Gruteser, Sewoong Oh, Borja Balle, and Daniel Ramage. 2024. AirGapAgent: Protecting Privacy-Conscious Conversational Agents. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security. ACM, 3868–3882. doi:10.1145/3658644.3690350

work page doi:10.1145/3658644.3690350 2024
[13]

Sunil Kumar Jang Bahadur and Gopala Dhar. 2025. Securing Generative AI Agentic Workflows: Risks, Mitigation, and a Proposed Firewall Architecture. arXiv:2506.17266 [cs.CR] doi:10.48550/arXiv.2506.17266

work page doi:10.48550/arxiv.2506.17266 2025
[14]

Julia Bazinska, Max Mathys, Francesco Casucci, Mateo Rojas-Carulla, Xander Davies, Alexandra Souly, and Niklas Pfis- ter. 2025. Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents. arXiv:2510.22620 [cs.CR] doi:10.48550/arXiv.2510.22620

work page doi:10.48550/arxiv.2510.22620 2025
[15]

Roy Betser, Shamik Bose, Amit Giloni, Chiara Picardi, Sindhu Padakandla, and Roman Vainshtein. 2026. AgenTRIM: Tool Risk Mitigation for Agentic AI. arXiv:2601.12449 [cs.CR] doi:10.48550/arXiv.2601.12449

work page doi:10.48550/arxiv.2601.12449 2026
[16]

Luca Beurer-Kellner, Beat Buesser, Ana-Maria Cretu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, and Václav Volhejn. 2025. Design Patterns for Securing LLM Agents against Prompt Injections. arXiv:2506.08837 [cs.LG] doi:10.48550/arXiv.2506.08837

work page doi:10.48550/arxiv.2506.08837 2025
[17]

Rohini Bhosale, Pankaj Chandre, Sushma Mehetre, Swati Powar, Shubhra Mathur, and Arun Ghandat. 2026. The Dark Side of Autonomous Intelligence: a survey on data leakage and privacy failures in agentic AI.Frontiers in Computer Science8 (2026), 1802727. doi:10.3389/fcomp.2026.1802727

work page doi:10.3389/fcomp.2026.1802727 2026
[18]

Léo Boisvert, Mihir Bansal, Chandra Kiran Reddy Evuru, Gabriel Huang, Abhay Puri, Avinandan Bose, Maryam Fazel, Quentin Cappart, Jason Stanley, Alexandre Lacoste, Alexandre Drouin, and Krishnamurthy Dvijotham. 2025. DoomArena: A framework for Testing AI Agents Against Evolving Security Threats. arXiv:2504.14064 [cs.CR] doi:10.48550/arXiv.2504.14064

work page doi:10.48550/arxiv.2504.14064 2025
[19]

Christoph Bühler, Matteo Biagiola, Luca Di Grazia, and Guido Salvaneschi. 2025. AgentBound: Securing Execution Boundaries of AI Agents. arXiv:2510.21236 [cs.CR] doi:10.48550/arXiv.2510.21236

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2510.21236 2025
[20]

Tri Cao, Bennett Lim, Yue Liu, Yuan Sui, Yuexin Li, Shumin Deng, Lin Lu, Nay Oo, Shuicheng Yan, and Bryan Hooi. 2025. VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents. arXiv:2506.02456 [cs.AI] doi:10.48550/arXiv.2506.02456

work page doi:10.48550/arxiv.2506.02456 2025
[21]

Aceves-Fernández, Luis R

Victor Castro-Maldonado, Marco A. Aceves-Fernández, Luis R. García-Noguez, and Jesús C. Pedraza-Ortega. 2026. Semantic Firewalls with Online Ensemble Learning for Secure Agentic RAG Systems in Financial Chatbots.AI7, 3 (2026), 80. doi:10.3390/ai7030080

work page doi:10.3390/ai7030080 2026
[22]

Hwan Chang, Yonghyun Jun, and Hwanhee Lee. 2025. ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents. arXiv:2509.22830 [cs.CL] doi:10.48550/arXiv.2509.22830

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2509.22830 2025
[23]

Fengchao Chen, Tingmin Wu, Van Nguyen, and Carsten Rudolph. 2026. Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents. arXiv:2601.10758 [cs.CR] doi:10.48550/arXiv.2601.10758

work page doi:10.48550/arxiv.2601.10758 2026
[24]

Jizhou Chen and Samuel Lee Cong. 2025. AgentGuard: Repurposing Agentic Orchestrator for Safety Evaluation of Tool Orchestration. arXiv:2502.09809 [cs.CR] doi:10.48550/arXiv.2502.09809

work page doi:10.48550/arxiv.2502.09809 2025
[25]

Ruolin Chen, Yinqian Sun, Jihang Wang, Mingyang Lv, Qian Zhang, and Yi Zeng. 2025. SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents. arXiv:2509.25885 [cs.AI] doi:10.48550/arXiv.2509.25885

work page doi:10.48550/arxiv.2509.25885 2025
[26]

Yurun Chen, Xueyu Hu, Keting Yin, Juncheng Li, and Shengyu Zhang. 2025. Evaluating the Robustness of Multimodal Agents Against Active Environmental Injection Attacks. arXiv:2502.13053 [cs.CL] doi:10.48550/arXiv.2502.13053

work page doi:10.48550/arxiv.2502.13053 2025
[27]

Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. 2024. AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases. InAdvances in Neural Information Processing Systems. doi:10.52202/079017- 4136

work page doi:10.52202/079017- 2024
[28]

Pengzhou Cheng, Zheng Wu, Zongru Wu, Tianjie Ju, Aston Zhang, Zhuosheng Zhang, and Gongshen Liu. 2025. OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents. InFindings of the Association for Computational Linguistics: ACL 2025. Association for Computational Linguistics, 6701–6725. doi:10.18653/v1/2025.findings-acl.348

work page doi:10.18653/v1/2025.findings-acl.348 2025
[29]

Sahana Chennabasappa, Cyrus Nikolaidis, Daniel Song, David Molnar, Stephanie Ding, Shengye Wan, Spencer Whitman, Lauren Deason, Nicholas Doucette, Abraham Montilla, Alekhya Gampa, Beto de Paola, Dominik Gabi, James Crnkovich, Jean-Christophe Testud, Kat He, Rashnil Chaturvedi, Wu Zhou, and Joshua Saxe. 2025. LlamaFirewall: An open source guardrail system ...

work page doi:10.48550/arxiv.2505.03574 2025
[30]

Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. 2025. Securing AI Agents with Information-Flow Control. arXiv:2505.23643 [cs.CR] doi:10.48550/arXiv.2505.23643

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2505.23643 2025
[31]

Pedro M. P. Curvo. 2025. The Traitors: Deception and Trust in Multi-Agent Language Model Simulations. arXiv:2505.12923 [cs.AI] doi:10.48550/arXiv.2505.12923

work page doi:10.48550/arxiv.2505.12923 2025
[32]

Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. 2025. Defeating Prompt Injections by Design. arXiv:2503.18813 [cs.CR] doi:10.48550/arXiv.2503.18813 ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: January 2026. Tow...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2503.18813 2025
[33]

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents. InAdvances in Neural Information Processing Systems. doi:10.52202/079017-2636

work page doi:10.52202/079017-2636 2024
[34]

Xinhao Deng, Yixiang Zhang, Jiaqing Wu, Jiaqi Bai, Sibo Yi, Zhuoheng Zou, Yue Xiao, Rennai Qiu, Jianan Ma, Jialuo Chen, Xiaohu Du, Xiaofang Yang, Shiwen Cui, Changhua Meng, Weiqiang Wang, Jiaxing Song, Ke Xu, and Qi Li. 2026. Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats. arXiv:2603.11619 [cs.CR] doi:10.48550/arXiv.2603.11619

work page doi:10.48550/arxiv.2603.11619 2026
[35]

Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang. 2025. AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways.Comput. Surveys57, 7 (2025), 1–36. doi:10.1145/3716628

work page doi:10.1145/3716628 2025
[36]

Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. 2025. Memory Injection Attacks on LLM Agents via Query-Only Interaction. InAdvances in Neural Information Processing Systems

2025
[37]

Zenghao Duan, Yuxin Tian, Zhiyi Yin, Liang Pang, Jingcheng Deng, Zihao Wei, Shicheng Xu, Yuyao Ge, and Xueqi Cheng. 2026. SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement. arXiv:2604.04989 [cs.CR] doi:10.48550/arXiv.2604.04989

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.04989 2026
[38]

Mateusz Dziemian, Maxwell Lin, Xiaohan Fu, Micha Nowak, Nick Winter, Eliot Krzysztof Jones, Andy Zou, Lama Ahmad, Kamalika Chaudhuri, Sahana Chennabasappa, Xander Davies, Lauren Deason, Benjamin L. Edelman, Tanner Emek, Ivan Evtimov, Jim Gust, Maia Hamin, Kat He, Klaudia Krawiecka, Riccardo Patana, Neil Perry, Troy Peterson, Xiangyu Qi, Javier Rando, Zifa...

work page doi:10.48550/arxiv.2603.15714 2026
[40]

Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang. 2024. LLM Agents can Autonomously Exploit One-day Vulnerabilities. arXiv:2404.08144 [cs.CR] doi:10.48550/arXiv.2404.08144

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2404.08144 2024
[41]

Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, and Daniel Kang. 2024. LLM Agents can Autonomously Hack Websites. arXiv:2402.06664 [cs.CR] doi:10.48550/arXiv.2402.06664

work page doi:10.48550/arxiv.2402.06664 2024
[42]

Yunhao Feng, Yifan Ding, Yingshui Tan, Boren Zheng, Yanming Guo, Xiaolong Li, Kun Zhai, Yishan Li, and Wenke Huang. 2026. SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems. arXiv:2604.06811 [cs.CR] doi:10.48550/ arXiv.2604.06811

Pith/arXiv arXiv 2026
[43]

Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, Abderrahmane Lakas, and Merouane Debbah. 2026. From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows.ICT Express12, 2 (2026), 353–383. doi:10.1016/J.ICTE.2025.12.001

work page doi:10.1016/j.icte.2025.12.001 2026
[44]

Hanna Foerster, Robert Mullins, Tom Blanchard, Nicolas Papernot, Kristina Nikolic, Florian Tramèr, Ilia Shumailov, Cheng Zhang, and Yiren Zhao. 2026. CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents. arXiv:2601.09923 [cs.AI] doi:10.48550/arXiv.2601.09923

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2601.09923 2026
[45]

Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes

Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K. Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes
[46]

Rethinking VLMs and LLMs for Image Classification.arXiv e-prints, art

Imprompter: Tricking LLM Agents into Improper Tool Use. arXiv:2410.14923 [cs.CR] doi:10.48550/arXiv.2410. 14923

work page doi:10.48550/arxiv.2410
[47]

Yuchuan Fu, Xiaohan Yuan, and Dongxia Wang. 2025. RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments. arXiv:2506.15253 [cs.CR] doi:10.48550/arXiv.2506.15253

work page doi:10.48550/arxiv.2506.15253 2025
[48]

Yuyou Gan, Yong Yang, Zhe Ma, Ping He, Rui Zeng, Yiming Wang, Qingming Li, Chunyi Zhou, Songze Li, Ting Wang, Yunjun Gao, Yingcai Wu, and Shouling Ji. 2024. Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents. arXiv:2411.09523 [cs.AI] doi:10.48550/arXiv.2411.09523

work page doi:10.48550/arxiv.2411.09523 2024
[49]

Yiang Gao and Shanshan Wu. 2025. A Four-Layer Security Governance Framework for LLM-Based AI Agents.Journal of Artificial Intelligence Practice8, 4 (2025), 49–55. doi:10.23977/jaip.2025.080406

work page doi:10.23977/jaip.2025.080406 2025
[50]

Tarek Gasmi, Ramzi Guesmi, Jihene Bennaceur, and Ines Belhadj. 2026. Bridging AI and software security: A comparative vulnerability assessment of LLM agent deployment paradigms.Information Sciences740 (2026), 123231. doi:10.1016/J.INS.2026.123231

work page doi:10.1016/j.ins.2026.123231 2026
[51]

Eric Wong

Tongcheng Geng, Yubin Qu, and W. Eric Wong. 2026. A white-box prompt injection attack on embodied AI agents driven by large language models.Journal of Systems and Software235 (2026), 112782. doi:10.1016/J.JSS.2026.112782

work page doi:10.1016/j.jss.2026.112782 2026
[52]

GitHub. 2026. openclaw Arbitrary Malicious Code Execution Vulnerability. Retrieved May 20, 2026 from https: //github.com/openclaw/openclaw/security/advisories/GHSA-m3mh-3mpg-37hw Security advisory

2026
[53]

Guangyu Gong and Zizhuang Deng. 2026. PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification. arXiv:2604.10134 [cs.CR] doi:10.48550/arXiv.2604.10134 ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: January 2026. 1:32 Yuchen Ling, Shengcheng Yu, Zhenyu Chen, and Chunrong Fang

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.10134 2026
[54]

Google DeepMind. 2025. Gemini 2.5 Computer Use. Retrieved May 20, 2026 from https://storage.googleapis.com/ deepmind-media/Model-Cards/Gemini-2-5-Computer-Use-Model-Card.pdf Model card

2025
[55]

Google DeepMind. 2026. Gemini Robotics-ER 1.6. Retrieved May 20, 2026 from https://deepmind.google/models/ model-cards/gemini-robotics-er-1-6/ Model card

2026
[56]

Feng He, Tianqing Zhu, Dayong Ye, Bo Liu, Wanlei Zhou, and Philip S. Yu. 2026. The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies.Comput. Surveys58, 6 (2026), 1–36. doi:10.1145/3773080

work page doi:10.1145/3773080 2026
[57]

Ping He, Changjiang Li, Binbin Zhao, Tianyu Du, and Shouling Ji. 2025. Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools. arXiv:2509.21011 [cs.CR] doi:10.48550/arXiv.2509.21011

work page doi:10.48550/arxiv.2509.21011 2025
[58]

Pengfei He, Yuping Lin, Shen Dong, Han Xu, Yue Xing, and Hui Liu. 2025. Red-Teaming LLM Multi-Agent Systems via Communication Attacks. InFindings of the Association for Computational Linguistics: ACL 2025. Association for Computational Linguistics, 6726–6747. doi:10.18653/v1/2025.findings-acl.349

work page doi:10.18653/v1/2025.findings-acl.349 2025
[59]

Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, and Hao Chen. 2025. Security of AI Agents. In2025 IEEE/ACM International Workshop on Responsible AI Engineering (RAIE). IEEE, 45–52. doi:10.1109/RAIE66699.2025.00013

work page doi:10.1109/raie66699.2025.00013 2025
[60]

Yu He, Haozhe Zhu, Yiming Li, Shuo Shao, Hongwei Yao, Zhihao Liu, and Zhan Qin. 2026. AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations. arXiv:2603.10749 [cs.CR] doi:10.48550/arXiv.2603.10749

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2603.10749 2026
[61]

Manuel Herrador and Johann Rehberger. 2026. SpAIware: Uncovering a novel artificial intelligence attack vector through persistent memory in LLM applications and agents.Future Generation Computer Systems174 (2026), 107994. doi:10.1016/J.FUTURE.2025.107994

work page doi:10.1016/j.future.2025.107994 2026
[62]

Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, and Emre Kiciman. 2024. Defending Against Indirect Prompt Injection Attacks With Spotlighting. InProceedings of the Conference on Applied Machine Learning in Information Security. CEUR-WS.org, 48–62

2024
[63]

Omer Hofman, Jonathan Brokman, Oren Rachmil, Shamik Bose, Vikas Pahuja, Toshiya Shimizu, Trisha Starostina, Kelly Marchisio, Seraphina Goldfarb-Tarrant, and Roman Vainshtein. 2026. MAPS: A Multilingual Benchmark for Agent Performance and Security. InFindings of the Association for Computational Linguistics: EACL 2026. Association for Computational Linguis...

work page doi:10.18653/v1/2026.findings-eacl.42 2026
[64]

Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations

Shahin Honarvar, Amber Gorzynski, James Lee-Jones, Harry Coppock, Marek Rei, Joseph Ryan, and Alastair F. Don- aldson. 2026. Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations. arXiv:2602.05523 [cs.SE] doi:10.48550/arXiv.2602.05523

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.05523 2026
[65]

Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. 2025. Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions. arXiv:2503.23278 [cs.CR] doi:10.48550/arXiv.2503.23278

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2503.23278 2025
[66]

Yuepeng Hu, Yuqi Jia, Mengyuan Li, Dawn Song, and Neil Gong. 2026. MalTool: Malicious Tool Attacks on LLM Agents. arXiv:2602.12194 [cs.CR] doi:10.48550/arXiv.2602.12194

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.12194 2026
[67]

Charoes Huang, Xin Huang, and Amin Milani Fard. 2026. Are AI-assisted Development Tools Immune to Prompt Injection? arXiv:2603.21642 [cs.CR] doi:10.48550/arXiv.2603.21642

work page doi:10.48550/arxiv.2603.21642 2026
[68]

Kung-Hsiang Huang, Akshara Prabhakar, Onkar Thorat, Divyansh Agarwal, Prafulla Kumar Choubey, Yixin Mao, Silvio Savarese, Caiming Xiong, and Chien-Sheng Wu. 2026. CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions.Transactions on Machine Learning Research2026 (2026)

2026
[69]

Umar Iqbal, Tadayoshi Kohno, and Franziska Roesner. 2024. LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI’s ChatGPT Plugins.Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society7 (2024), 611–623. doi:10.1609/AIES.V7I1.31664

work page doi:10.1609/aies.v7i1.31664 2024
[70]

Jafar Isbarov and Murat Kantarcioglu. 2026. Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks. arXiv:2602.05066 [cs.CR] doi:10.48550/arXiv.2602.05066

work page doi:10.48550/arxiv.2602.05066 2026
[71]

Dennis Jacob, Emad Alghamdi, Zhanhao Hu, Basel Alomair, and David A. Wagner. 2025. Preventing Prompt Injection with Type-Directed Privilege Separation. arXiv:2509.25926 [cs.CR] doi:10.48550/arXiv.2509.25926

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2509.25926 2025
[72]

Jha, Harold Triedman, Justin Wagle, and Vitaly Shmatikov

Rishi D. Jha, Harold Triedman, Justin Wagle, and Vitaly Shmatikov. 2025. Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems. arXiv:2510.17276 [cs.LG] doi:10.48550/arXiv.2510.17276

work page doi:10.48550/arxiv.2510.17276 2025
[73]

Zimo Ji, Daoyuan Wu, Wenyuan Jiang, Pingchuan Ma, Zongjie Li, Yudong Gao, Shuai Wang, and Yingjiu Li
[74]

arXiv:2601.11893 [cs.CR] doi:10.48550/arXiv.2601.11893

Taming Various Privilege Escalation in LLM-Based Agent Systems: A Mandatory Access Control Frame- work. arXiv:2601.11893 [cs.CR] doi:10.48550/arXiv.2601.11893

work page doi:10.48550/arxiv.2601.11893
[75]

Feiran Jia, Tong Wu, Xin Qin, and Anna Cinzia Squicciarini. 2025. The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 29680–29697. doi:10.18653/v1/202...

work page doi:10.18653/v1/2025.acl-long.1435 2025
[76]

Xiaojun Jia, Jie Liao, Simeng Qin, Jindong Gu, Wenqi Ren, Xiaochun Cao, Yang Liu, and Philip Torr. 2026. SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents. arXiv:2602.14211 [cs.CR] doi:10.48550/ arXiv.2602.14211 ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: January 2026. Toward Secure L...

Pith/arXiv arXiv 2026
[77]

Tanqiu Jiang, Yuhui Wang, Jiacheng Liang, and Ting Wang. 2026. AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks. arXiv:2602.16901 [cs.AI] doi:10.48550/arXiv.2602.16901

work page doi:10.48550/arxiv.2602.16901 2026
[78]

Xiaochong Jiang, Shiqi Yang, Wenting Yang, Yichen Liu, and Cheng Ji. 2026. SOK: A Taxonomy of Attack Vectors and Defense Strategies for Agentic Supply Chain Runtime. arXiv:2602.19555 [cs.CR] doi:10.48550/arXiv.2602.19555

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.19555 2026
[79]

Yanna Jiang, Delong Li, Haiyu Deng, Baihe Ma, Xu Wang, Qin Wang, and Guangsheng Yu. 2026. SoK: Agentic Skills - Beyond Tool Use in LLM Agents. arXiv:2602.20867 [cs.CR] doi:10.48550/arXiv.2602.20867

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.20867 2026
[80]

Ruochen Jiao, Shaoyuan Xie, Justin Yue, Takami Sato, Lixu Wang, Yixuan Wang, Qi Alfred Chen, and Qi Zhu. 2025. Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-Based Decision-Making Systems. InInternational Conference on Learning Representations (ICLR). OpenReview.net

2025
[82]

Daniel Jones, Giorgio Severi, Martin Pouliot, Gary Lopez, Joris de Gruyter, Santiago Zanella-Béguelin, Justin Song, Blake Bullwinkel, Pamela Cortez, and Amanda J. Minnich. 2025. A Systematization of Security Vulnerabilities in Computer Use Agents. arXiv:2507.05445 [cs.CR] doi:10.48550/arXiv.2507.05445

work page doi:10.48550/arxiv.2507.05445 2025

Showing first 80 references.

[1] [1]

Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, João Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi, Byeonghyeon Kim, Hyeonjin Lee, Benjamin Pannell, Conor McCauley, Mark Russinovich, And...

work page doi:10.48550/arxiv.2506.09956 2025

[2] [2]

Sahar Abdelnabi, Amr Gomaa, Eugene Bagdasarian, Per Ola Kristensson, and Reza Shokri. 2025. Firewalls to Secure Dynamic LLM Agentic Networks. arXiv:2502.01822 [cs.CR] doi:10.48550/arXiv.2502.01822

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.01822 2025

[3] [3]

Alsharif Abuadbba, Nazatul Sultan, Surya Nepal, and Sanjay Jha. 2026. Human Society-Inspired Approaches to Agentic AI Security: The 4C Framework. arXiv:2602.01942 [cs.CR] doi:10.48550/arXiv.2602.01942

work page doi:10.48550/arxiv.2602.01942 2026

[4] [4]

Tanzim Ahad, Ismail Hossain, Md Jahangir Alam, Sai Puppala, Yoonpyo Lee, Syed Bahauddin Alam, and Sajedul Talukder. 2026. Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines. arXiv:2604.08608 [cs.CR] doi:10.48550/arXiv.2604.08608

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.08608 2026

[5] [5]

Hengyu An, Minxi Li, Jinghuai Zhang, Naen Xu, Chunyi Zhou, Changjiang Li, Xiaogang Xu, Tianyu Du, and Shouling Ji. 2026. ACIArena: Toward Unified Evaluation for Agent Cascading Injection. arXiv:2604.07775 [cs.AI] doi:10.48550/arXiv.2604.07775

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.07775 2026

[6] [6]

Hengyu An, Jinghuai Zhang, Tianyu Du, Chunyi Zhou, Qingming Li, Tao Lin, and Shouling Ji. 2025. IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1023–1039. doi:10.18653/V...

work page doi:10.18653/v1/2025.emnlp-main.53 2025

[7] [7]

Zico Kolter, Matt Fredrikson, Yarin Gal, and Xander Davies

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, J. Zico Kolter, Matt Fredrikson, Yarin Gal, and Xander Davies. 2025. AgentHarm: A Bench- mark for Measuring Harmfulness of LLM Agents. InInternational Conference on Learning Representations (ICLR). OpenReview.net

2025

[8] [8]

Anthropic. 2024. Developing a computer use model. Retrieved May 20, 2026 from https://www.anthropic.com/news/ developing-computer-use Research report

2024

[9] [9]

Anthropic. 2025. Claude Sonnet 4 and Opus 4 System Card. Retrieved May 20, 2026 from https://www-cdn.anthropic. com/07b2a3f9902ee19fe39a36ca638e5ae987bc64dd.pdf System card

2025

[10] [10]

Nirmit Arora, Sathvik Joel, Ishan Kavathekar, Palak, Rohan Gandhi, Yash Pandya, Tanuja Ganu, Aditya Kanade, and Akshay Nambi. 2025. Exposing Weak Links in Multi-Agent Systems under Adversarial Prompting. arXiv:2511.10949 [cs.MA] doi:10.48550/arXiv.2511.10949

work page doi:10.48550/arxiv.2511.10949 2025

[11] [11]

Sadia Asif and Mohammad Mohammadi Amiri. 2026. Information-Theoretic Privacy Control for Sequential Multi- Agent LLM Systems. arXiv:2603.05520 [cs.MA] doi:10.48550/arXiv.2603.05520 ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: January 2026. 1:30 Yuchen Ling, Shengcheng Yu, Zhenyu Chen, and Chunrong Fang

work page doi:10.48550/arxiv.2603.05520 2026

[12] [12]

Eugene Bagdasarian, Ren Yi, Sahra Ghalebikesabi, Peter Kairouz, Marco Gruteser, Sewoong Oh, Borja Balle, and Daniel Ramage. 2024. AirGapAgent: Protecting Privacy-Conscious Conversational Agents. InProceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security. ACM, 3868–3882. doi:10.1145/3658644.3690350

work page doi:10.1145/3658644.3690350 2024

[13] [13]

Sunil Kumar Jang Bahadur and Gopala Dhar. 2025. Securing Generative AI Agentic Workflows: Risks, Mitigation, and a Proposed Firewall Architecture. arXiv:2506.17266 [cs.CR] doi:10.48550/arXiv.2506.17266

work page doi:10.48550/arxiv.2506.17266 2025

[14] [14]

Julia Bazinska, Max Mathys, Francesco Casucci, Mateo Rojas-Carulla, Xander Davies, Alexandra Souly, and Niklas Pfis- ter. 2025. Breaking Agent Backbones: Evaluating the Security of Backbone LLMs in AI Agents. arXiv:2510.22620 [cs.CR] doi:10.48550/arXiv.2510.22620

work page doi:10.48550/arxiv.2510.22620 2025

[15] [15]

Roy Betser, Shamik Bose, Amit Giloni, Chiara Picardi, Sindhu Padakandla, and Roman Vainshtein. 2026. AgenTRIM: Tool Risk Mitigation for Agentic AI. arXiv:2601.12449 [cs.CR] doi:10.48550/arXiv.2601.12449

work page doi:10.48550/arxiv.2601.12449 2026

[16] [16]

Luca Beurer-Kellner, Beat Buesser, Ana-Maria Cretu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, and Václav Volhejn. 2025. Design Patterns for Securing LLM Agents against Prompt Injections. arXiv:2506.08837 [cs.LG] doi:10.48550/arXiv.2506.08837

work page doi:10.48550/arxiv.2506.08837 2025

[17] [17]

Rohini Bhosale, Pankaj Chandre, Sushma Mehetre, Swati Powar, Shubhra Mathur, and Arun Ghandat. 2026. The Dark Side of Autonomous Intelligence: a survey on data leakage and privacy failures in agentic AI.Frontiers in Computer Science8 (2026), 1802727. doi:10.3389/fcomp.2026.1802727

work page doi:10.3389/fcomp.2026.1802727 2026

[18] [18]

Léo Boisvert, Mihir Bansal, Chandra Kiran Reddy Evuru, Gabriel Huang, Abhay Puri, Avinandan Bose, Maryam Fazel, Quentin Cappart, Jason Stanley, Alexandre Lacoste, Alexandre Drouin, and Krishnamurthy Dvijotham. 2025. DoomArena: A framework for Testing AI Agents Against Evolving Security Threats. arXiv:2504.14064 [cs.CR] doi:10.48550/arXiv.2504.14064

work page doi:10.48550/arxiv.2504.14064 2025

[19] [19]

Christoph Bühler, Matteo Biagiola, Luca Di Grazia, and Guido Salvaneschi. 2025. AgentBound: Securing Execution Boundaries of AI Agents. arXiv:2510.21236 [cs.CR] doi:10.48550/arXiv.2510.21236

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2510.21236 2025

[20] [20]

Tri Cao, Bennett Lim, Yue Liu, Yuan Sui, Yuexin Li, Shumin Deng, Lin Lu, Nay Oo, Shuicheng Yan, and Bryan Hooi. 2025. VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents. arXiv:2506.02456 [cs.AI] doi:10.48550/arXiv.2506.02456

work page doi:10.48550/arxiv.2506.02456 2025

[21] [21]

Aceves-Fernández, Luis R

Victor Castro-Maldonado, Marco A. Aceves-Fernández, Luis R. García-Noguez, and Jesús C. Pedraza-Ortega. 2026. Semantic Firewalls with Online Ensemble Learning for Secure Agentic RAG Systems in Financial Chatbots.AI7, 3 (2026), 80. doi:10.3390/ai7030080

work page doi:10.3390/ai7030080 2026

[22] [22]

Hwan Chang, Yonghyun Jun, and Hwanhee Lee. 2025. ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents. arXiv:2509.22830 [cs.CL] doi:10.48550/arXiv.2509.22830

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2509.22830 2025

[23] [23]

Fengchao Chen, Tingmin Wu, Van Nguyen, and Carsten Rudolph. 2026. Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents. arXiv:2601.10758 [cs.CR] doi:10.48550/arXiv.2601.10758

work page doi:10.48550/arxiv.2601.10758 2026

[24] [24]

Jizhou Chen and Samuel Lee Cong. 2025. AgentGuard: Repurposing Agentic Orchestrator for Safety Evaluation of Tool Orchestration. arXiv:2502.09809 [cs.CR] doi:10.48550/arXiv.2502.09809

work page doi:10.48550/arxiv.2502.09809 2025

[25] [25]

Ruolin Chen, Yinqian Sun, Jihang Wang, Mingyang Lv, Qian Zhang, and Yi Zeng. 2025. SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents. arXiv:2509.25885 [cs.AI] doi:10.48550/arXiv.2509.25885

work page doi:10.48550/arxiv.2509.25885 2025

[26] [26]

Yurun Chen, Xueyu Hu, Keting Yin, Juncheng Li, and Shengyu Zhang. 2025. Evaluating the Robustness of Multimodal Agents Against Active Environmental Injection Attacks. arXiv:2502.13053 [cs.CL] doi:10.48550/arXiv.2502.13053

work page doi:10.48550/arxiv.2502.13053 2025

[27] [27]

Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. 2024. AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases. InAdvances in Neural Information Processing Systems. doi:10.52202/079017- 4136

work page doi:10.52202/079017- 2024

[28] [28]

Pengzhou Cheng, Zheng Wu, Zongru Wu, Tianjie Ju, Aston Zhang, Zhuosheng Zhang, and Gongshen Liu. 2025. OS-Kairos: Adaptive Interaction for MLLM-Powered GUI Agents. InFindings of the Association for Computational Linguistics: ACL 2025. Association for Computational Linguistics, 6701–6725. doi:10.18653/v1/2025.findings-acl.348

work page doi:10.18653/v1/2025.findings-acl.348 2025

[29] [29]

Sahana Chennabasappa, Cyrus Nikolaidis, Daniel Song, David Molnar, Stephanie Ding, Shengye Wan, Spencer Whitman, Lauren Deason, Nicholas Doucette, Abraham Montilla, Alekhya Gampa, Beto de Paola, Dominik Gabi, James Crnkovich, Jean-Christophe Testud, Kat He, Rashnil Chaturvedi, Wu Zhou, and Joshua Saxe. 2025. LlamaFirewall: An open source guardrail system ...

work page doi:10.48550/arxiv.2505.03574 2025

[30] [30]

Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, and Santiago Zanella-Béguelin. 2025. Securing AI Agents with Information-Flow Control. arXiv:2505.23643 [cs.CR] doi:10.48550/arXiv.2505.23643

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2505.23643 2025

[31] [31]

Pedro M. P. Curvo. 2025. The Traitors: Deception and Trust in Multi-Agent Language Model Simulations. arXiv:2505.12923 [cs.AI] doi:10.48550/arXiv.2505.12923

work page doi:10.48550/arxiv.2505.12923 2025

[32] [32]

Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. 2025. Defeating Prompt Injections by Design. arXiv:2503.18813 [cs.CR] doi:10.48550/arXiv.2503.18813 ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: January 2026. Tow...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2503.18813 2025

[33] [33]

Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents. InAdvances in Neural Information Processing Systems. doi:10.52202/079017-2636

work page doi:10.52202/079017-2636 2024

[34] [34]

Xinhao Deng, Yixiang Zhang, Jiaqing Wu, Jiaqi Bai, Sibo Yi, Zhuoheng Zou, Yue Xiao, Rennai Qiu, Jianan Ma, Jialuo Chen, Xiaohu Du, Xiaofang Yang, Shiwen Cui, Changhua Meng, Weiqiang Wang, Jiaxing Song, Ke Xu, and Qi Li. 2026. Taming OpenClaw: Security Analysis and Mitigation of Autonomous LLM Agent Threats. arXiv:2603.11619 [cs.CR] doi:10.48550/arXiv.2603.11619

work page doi:10.48550/arxiv.2603.11619 2026

[35] [35]

Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang. 2025. AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways.Comput. Surveys57, 7 (2025), 1–36. doi:10.1145/3716628

work page doi:10.1145/3716628 2025

[36] [36]

Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. 2025. Memory Injection Attacks on LLM Agents via Query-Only Interaction. InAdvances in Neural Information Processing Systems

2025

[37] [37]

Zenghao Duan, Yuxin Tian, Zhiyi Yin, Liang Pang, Jingcheng Deng, Zihao Wei, Shicheng Xu, Yuyao Ge, and Xueqi Cheng. 2026. SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement. arXiv:2604.04989 [cs.CR] doi:10.48550/arXiv.2604.04989

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.04989 2026

[38] [38]

Mateusz Dziemian, Maxwell Lin, Xiaohan Fu, Micha Nowak, Nick Winter, Eliot Krzysztof Jones, Andy Zou, Lama Ahmad, Kamalika Chaudhuri, Sahana Chennabasappa, Xander Davies, Lauren Deason, Benjamin L. Edelman, Tanner Emek, Ivan Evtimov, Jim Gust, Maia Hamin, Kat He, Klaudia Krawiecka, Riccardo Patana, Neil Perry, Troy Peterson, Xiangyu Qi, Javier Rando, Zifa...

work page doi:10.48550/arxiv.2603.15714 2026

[39] [40]

Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang. 2024. LLM Agents can Autonomously Exploit One-day Vulnerabilities. arXiv:2404.08144 [cs.CR] doi:10.48550/arXiv.2404.08144

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2404.08144 2024

[40] [41]

Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, and Daniel Kang. 2024. LLM Agents can Autonomously Hack Websites. arXiv:2402.06664 [cs.CR] doi:10.48550/arXiv.2402.06664

work page doi:10.48550/arxiv.2402.06664 2024

[41] [42]

Yunhao Feng, Yifan Ding, Yingshui Tan, Boren Zheng, Yanming Guo, Xiaolong Li, Kun Zhai, Yishan Li, and Wenke Huang. 2026. SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems. arXiv:2604.06811 [cs.CR] doi:10.48550/ arXiv.2604.06811

Pith/arXiv arXiv 2026

[42] [43]

Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, Abderrahmane Lakas, and Merouane Debbah. 2026. From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows.ICT Express12, 2 (2026), 353–383. doi:10.1016/J.ICTE.2025.12.001

work page doi:10.1016/j.icte.2025.12.001 2026

[43] [44]

Hanna Foerster, Robert Mullins, Tom Blanchard, Nicolas Papernot, Kristina Nikolic, Florian Tramèr, Ilia Shumailov, Cheng Zhang, and Yiren Zhao. 2026. CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents. arXiv:2601.09923 [cs.AI] doi:10.48550/arXiv.2601.09923

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2601.09923 2026

[44] [45]

Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes

Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K. Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes

[45] [46]

Rethinking VLMs and LLMs for Image Classification.arXiv e-prints, art

Imprompter: Tricking LLM Agents into Improper Tool Use. arXiv:2410.14923 [cs.CR] doi:10.48550/arXiv.2410. 14923

work page doi:10.48550/arxiv.2410

[46] [47]

Yuchuan Fu, Xiaohan Yuan, and Dongxia Wang. 2025. RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments. arXiv:2506.15253 [cs.CR] doi:10.48550/arXiv.2506.15253

work page doi:10.48550/arxiv.2506.15253 2025

[47] [48]

Yuyou Gan, Yong Yang, Zhe Ma, Ping He, Rui Zeng, Yiming Wang, Qingming Li, Chunyi Zhou, Songze Li, Ting Wang, Yunjun Gao, Yingcai Wu, and Shouling Ji. 2024. Navigating the Risks: A Survey of Security, Privacy, and Ethics Threats in LLM-Based Agents. arXiv:2411.09523 [cs.AI] doi:10.48550/arXiv.2411.09523

work page doi:10.48550/arxiv.2411.09523 2024

[48] [49]

Yiang Gao and Shanshan Wu. 2025. A Four-Layer Security Governance Framework for LLM-Based AI Agents.Journal of Artificial Intelligence Practice8, 4 (2025), 49–55. doi:10.23977/jaip.2025.080406

work page doi:10.23977/jaip.2025.080406 2025

[49] [50]

Tarek Gasmi, Ramzi Guesmi, Jihene Bennaceur, and Ines Belhadj. 2026. Bridging AI and software security: A comparative vulnerability assessment of LLM agent deployment paradigms.Information Sciences740 (2026), 123231. doi:10.1016/J.INS.2026.123231

work page doi:10.1016/j.ins.2026.123231 2026

[50] [51]

Eric Wong

Tongcheng Geng, Yubin Qu, and W. Eric Wong. 2026. A white-box prompt injection attack on embodied AI agents driven by large language models.Journal of Systems and Software235 (2026), 112782. doi:10.1016/J.JSS.2026.112782

work page doi:10.1016/j.jss.2026.112782 2026

[51] [52]

GitHub. 2026. openclaw Arbitrary Malicious Code Execution Vulnerability. Retrieved May 20, 2026 from https: //github.com/openclaw/openclaw/security/advisories/GHSA-m3mh-3mpg-37hw Security advisory

2026

[52] [53]

Guangyu Gong and Zizhuang Deng. 2026. PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification. arXiv:2604.10134 [cs.CR] doi:10.48550/arXiv.2604.10134 ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: January 2026. 1:32 Yuchen Ling, Shengcheng Yu, Zhenyu Chen, and Chunrong Fang

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.10134 2026

[53] [54]

Google DeepMind. 2025. Gemini 2.5 Computer Use. Retrieved May 20, 2026 from https://storage.googleapis.com/ deepmind-media/Model-Cards/Gemini-2-5-Computer-Use-Model-Card.pdf Model card

2025

[54] [55]

Google DeepMind. 2026. Gemini Robotics-ER 1.6. Retrieved May 20, 2026 from https://deepmind.google/models/ model-cards/gemini-robotics-er-1-6/ Model card

2026

[55] [56]

Feng He, Tianqing Zhu, Dayong Ye, Bo Liu, Wanlei Zhou, and Philip S. Yu. 2026. The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies.Comput. Surveys58, 6 (2026), 1–36. doi:10.1145/3773080

work page doi:10.1145/3773080 2026

[56] [57]

Ping He, Changjiang Li, Binbin Zhao, Tianyu Du, and Shouling Ji. 2025. Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools. arXiv:2509.21011 [cs.CR] doi:10.48550/arXiv.2509.21011

work page doi:10.48550/arxiv.2509.21011 2025

[57] [58]

Pengfei He, Yuping Lin, Shen Dong, Han Xu, Yue Xing, and Hui Liu. 2025. Red-Teaming LLM Multi-Agent Systems via Communication Attacks. InFindings of the Association for Computational Linguistics: ACL 2025. Association for Computational Linguistics, 6726–6747. doi:10.18653/v1/2025.findings-acl.349

work page doi:10.18653/v1/2025.findings-acl.349 2025

[58] [59]

Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, and Hao Chen. 2025. Security of AI Agents. In2025 IEEE/ACM International Workshop on Responsible AI Engineering (RAIE). IEEE, 45–52. doi:10.1109/RAIE66699.2025.00013

work page doi:10.1109/raie66699.2025.00013 2025

[59] [60]

Yu He, Haozhe Zhu, Yiming Li, Shuo Shao, Hongwei Yao, Zhihao Liu, and Zhan Qin. 2026. AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations. arXiv:2603.10749 [cs.CR] doi:10.48550/arXiv.2603.10749

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2603.10749 2026

[60] [61]

Manuel Herrador and Johann Rehberger. 2026. SpAIware: Uncovering a novel artificial intelligence attack vector through persistent memory in LLM applications and agents.Future Generation Computer Systems174 (2026), 107994. doi:10.1016/J.FUTURE.2025.107994

work page doi:10.1016/j.future.2025.107994 2026

[61] [62]

Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, and Emre Kiciman. 2024. Defending Against Indirect Prompt Injection Attacks With Spotlighting. InProceedings of the Conference on Applied Machine Learning in Information Security. CEUR-WS.org, 48–62

2024

[62] [63]

Omer Hofman, Jonathan Brokman, Oren Rachmil, Shamik Bose, Vikas Pahuja, Toshiya Shimizu, Trisha Starostina, Kelly Marchisio, Seraphina Goldfarb-Tarrant, and Roman Vainshtein. 2026. MAPS: A Multilingual Benchmark for Agent Performance and Security. InFindings of the Association for Computational Linguistics: EACL 2026. Association for Computational Linguis...

work page doi:10.18653/v1/2026.findings-eacl.42 2026

[63] [64]

Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations

Shahin Honarvar, Amber Gorzynski, James Lee-Jones, Harry Coppock, Marek Rei, Joseph Ryan, and Alastair F. Don- aldson. 2026. Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations. arXiv:2602.05523 [cs.SE] doi:10.48550/arXiv.2602.05523

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.05523 2026

[64] [65]

Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. 2025. Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions. arXiv:2503.23278 [cs.CR] doi:10.48550/arXiv.2503.23278

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2503.23278 2025

[65] [66]

Yuepeng Hu, Yuqi Jia, Mengyuan Li, Dawn Song, and Neil Gong. 2026. MalTool: Malicious Tool Attacks on LLM Agents. arXiv:2602.12194 [cs.CR] doi:10.48550/arXiv.2602.12194

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.12194 2026

[66] [67]

Charoes Huang, Xin Huang, and Amin Milani Fard. 2026. Are AI-assisted Development Tools Immune to Prompt Injection? arXiv:2603.21642 [cs.CR] doi:10.48550/arXiv.2603.21642

work page doi:10.48550/arxiv.2603.21642 2026

[67] [68]

Kung-Hsiang Huang, Akshara Prabhakar, Onkar Thorat, Divyansh Agarwal, Prafulla Kumar Choubey, Yixin Mao, Silvio Savarese, Caiming Xiong, and Chien-Sheng Wu. 2026. CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions.Transactions on Machine Learning Research2026 (2026)

2026

[68] [69]

Umar Iqbal, Tadayoshi Kohno, and Franziska Roesner. 2024. LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI’s ChatGPT Plugins.Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society7 (2024), 611–623. doi:10.1609/AIES.V7I1.31664

work page doi:10.1609/aies.v7i1.31664 2024

[69] [70]

Jafar Isbarov and Murat Kantarcioglu. 2026. Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks. arXiv:2602.05066 [cs.CR] doi:10.48550/arXiv.2602.05066

work page doi:10.48550/arxiv.2602.05066 2026

[70] [71]

Dennis Jacob, Emad Alghamdi, Zhanhao Hu, Basel Alomair, and David A. Wagner. 2025. Preventing Prompt Injection with Type-Directed Privilege Separation. arXiv:2509.25926 [cs.CR] doi:10.48550/arXiv.2509.25926

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2509.25926 2025

[71] [72]

Jha, Harold Triedman, Justin Wagle, and Vitaly Shmatikov

Rishi D. Jha, Harold Triedman, Justin Wagle, and Vitaly Shmatikov. 2025. Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems. arXiv:2510.17276 [cs.LG] doi:10.48550/arXiv.2510.17276

work page doi:10.48550/arxiv.2510.17276 2025

[72] [73]

Zimo Ji, Daoyuan Wu, Wenyuan Jiang, Pingchuan Ma, Zongjie Li, Yudong Gao, Shuai Wang, and Yingjiu Li

[73] [74]

arXiv:2601.11893 [cs.CR] doi:10.48550/arXiv.2601.11893

Taming Various Privilege Escalation in LLM-Based Agent Systems: A Mandatory Access Control Frame- work. arXiv:2601.11893 [cs.CR] doi:10.48550/arXiv.2601.11893

work page doi:10.48550/arxiv.2601.11893

[74] [75]

Feiran Jia, Tong Wu, Xin Qin, and Anna Cinzia Squicciarini. 2025. The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 29680–29697. doi:10.18653/v1/202...

work page doi:10.18653/v1/2025.acl-long.1435 2025

[75] [76]

Xiaojun Jia, Jie Liao, Simeng Qin, Jindong Gu, Wenqi Ren, Xiaochun Cao, Yang Liu, and Philip Torr. 2026. SkillJect: Effectively Automating Skill-Based Prompt Injection for Skill-Enabled Agents. arXiv:2602.14211 [cs.CR] doi:10.48550/ arXiv.2602.14211 ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: January 2026. Toward Secure L...

Pith/arXiv arXiv 2026

[76] [77]

Tanqiu Jiang, Yuhui Wang, Jiacheng Liang, and Ting Wang. 2026. AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks. arXiv:2602.16901 [cs.AI] doi:10.48550/arXiv.2602.16901

work page doi:10.48550/arxiv.2602.16901 2026

[77] [78]

Xiaochong Jiang, Shiqi Yang, Wenting Yang, Yichen Liu, and Cheng Ji. 2026. SOK: A Taxonomy of Attack Vectors and Defense Strategies for Agentic Supply Chain Runtime. arXiv:2602.19555 [cs.CR] doi:10.48550/arXiv.2602.19555

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.19555 2026

[78] [79]

Yanna Jiang, Delong Li, Haiyu Deng, Baihe Ma, Xu Wang, Qin Wang, and Guangsheng Yu. 2026. SoK: Agentic Skills - Beyond Tool Use in LLM Agents. arXiv:2602.20867 [cs.CR] doi:10.48550/arXiv.2602.20867

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.20867 2026

[79] [80]

Ruochen Jiao, Shaoyuan Xie, Justin Yue, Takami Sato, Lixu Wang, Yixuan Wang, Qi Alfred Chen, and Qi Zhu. 2025. Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-Based Decision-Making Systems. InInternational Conference on Learning Representations (ICLR). OpenReview.net

2025

[80] [82]

Daniel Jones, Giorgio Severi, Martin Pouliot, Gary Lopez, Joris de Gruyter, Santiago Zanella-Béguelin, Justin Song, Blake Bullwinkel, Pamela Cortez, and Amanda J. Minnich. 2025. A Systematization of Security Vulnerabilities in Computer Use Agents. arXiv:2507.05445 [cs.CR] doi:10.48550/arXiv.2507.05445

work page doi:10.48550/arxiv.2507.05445 2025