
arxiv: 2510.23883 · v3 · submitted 2025-10-27 · 💻 cs.AI

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

Pith reviewed 2026-05-18 03:39 UTC · model grok-4.3

classification 💻 cs.AI
keywords: agentic AI · AI security · LLM agents · threat taxonomy · AI defenses · autonomous agents · security evaluation · open challenges

The pith

Agentic AI systems introduce security risks that are distinct from both traditional AI safety and conventional software security.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey examines security issues in agentic AI: LLM-powered systems capable of planning, using tools, maintaining memory, and acting autonomously across web, software, and physical settings. It argues that these capabilities create new and amplified risks not covered by existing AI safety or software security frameworks. The paper delivers a taxonomy of agent-specific threats, reviews evaluation benchmarks and methods, and covers defense strategies involving both technical measures and governance. It synthesizes the research landscape and identifies open challenges to help build agent systems that are secure by design.

Core claim

Agentic AI systems powered by large language models and endowed with planning, tool use, memory, and autonomy create new and amplified security risks, distinct from both traditional AI safety and conventional software security. This is supported by outlining a taxonomy of threats specific to agentic AI, reviewing recent benchmarks and evaluation methodologies, and discussing defense strategies from both technical and governance perspectives, while highlighting open challenges.

What carries the argument

The taxonomy of threats specific to agentic AI, which classifies risks arising from autonomous task execution in diverse environments.

If this is right

  • Developers can use the taxonomy to proactively identify risks in agent designs.
  • Benchmarks and evaluations can be standardized to measure agentic security posture.
  • Defenses can combine technical fixes with policy and governance approaches.
  • Future agent systems can be engineered with security considerations integrated from the beginning.
  • Research efforts can focus on the identified open challenges to advance the field.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Applying this taxonomy to real-world agent deployments could reveal additional threat categories not yet documented.
  • Connections to physical world risks suggest that agentic AI in robotics or IoT might require hybrid security models.
  • Open challenges may lead to new research on multi-agent interactions and their security implications.
  • Secure-by-design principles could extend to regulatory standards for AI agents.

Load-bearing premise

That the existing body of research on agentic AI is mature enough to form a stable taxonomy of threats and identify open challenges without significant gaps.

What would settle it

Discovery of a major new security vulnerability in deployed agentic AI systems that falls outside the proposed taxonomy or is not addressed by current evaluation methods.

Figures

Figures reproduced from arXiv: 2510.23883 by Anshuman Chhabra, Prasant Mohapatra, Shahriar Kabir Nahin, Shrestha Datta.

Figure 1: Taxonomy of Agentic AI Security Threats.
Figure 2: Examples showcasing (a) direct and (b) indirect prompt injection. In the former, the agent is directly instructed by the adversary to reveal confidential information whereas in the latter the attacker has exploited the agent’s reliance on external information sources to have it download malware from their altered website.
Figure 3: An example showcasing unintentional prompt injection. Even when a malicious adversary is not present, unintended dangers can jeopardize LLM agents’ safety. For example, unclear or badly worded user inquiries may unintentionally overrule system directives or result in dangerous actions. Moreover, contextual drift within lengthy chat histories can change the agent’s behavior without the need for explicit ove…
Figure 4: Visualizing different prompt injection attacks based on modality: (a)
Figure 5: Visualizing different protocol-level attacks for multi-agent systems: (a)
Figure 6: Some Defenses Against Prompt Injection Attacks.
Figure 7: Policy Filtering and Enforcement Defense Strategies.
Figure 8: An Example of Sandboxing as a Defense Strategy.

Original abstract

Agentic AI systems powered by large language models (LLMs) and endowed with planning, tool use, memory, and autonomy, are emerging as powerful, flexible platforms for automation. Their ability to autonomously execute tasks across web, software, and physical environments creates new and amplified security risks, distinct from both traditional AI safety and conventional software security. This survey outlines a taxonomy of threats specific to agentic AI, reviews recent benchmarks and evaluation methodologies, and discusses defense strategies from both technical and governance perspectives. We synthesize current research and highlight open challenges, aiming to support the development of secure-by-design agent systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript is a survey on security for agentic AI systems, which combine LLMs with planning, tool use, memory, and autonomy. It claims these systems create new and amplified security risks distinct from traditional AI safety and conventional software security. The paper outlines a taxonomy of such threats, reviews benchmarks and evaluation methodologies, discusses technical and governance defense strategies, synthesizes current research, and identifies open challenges to support secure-by-design agent development.

Significance. If the taxonomy is comprehensive and the literature synthesis accurate, the work would offer a useful organizing framework for an emerging subfield, helping researchers and practitioners distinguish agentic risks and prioritize defenses. The explicit treatment of both technical and governance perspectives, along with benchmark reviews, adds practical value. The paper's strength lies in its synthesis of existing research and clear articulation of open challenges, which can guide future work in a rapidly developing area.

major comments (1)
  1. [Taxonomy of threats] The taxonomy of threats (as described in the survey structure) is presented as coherent and specific to agentic AI, but its stability rests on the assumption that the surveyed literature on planning, tool use, memory, and autonomy already spans the relevant threat space. The manuscript itself identifies potential gaps such as long-horizon goal hijacking via memory poisoning or cross-agent tool chaining; without an explicit gap analysis or justification for why these are excluded from the taxonomy rather than integrated, the central claim of a stable, distinct taxonomy risks systematic under-representation of attack surfaces.
minor comments (2)
  1. The abstract clearly states the scope but would benefit from indicating the approximate number of papers or benchmarks reviewed to convey the breadth of the synthesis.
  2. Adding a summary table that maps threats to existing benchmarks and defenses would improve clarity and allow readers to quickly assess coverage.
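
The summary table requested in minor comment 2, together with the gap analysis requested in the major comment, could be prototyped as a small coverage check. The following is a minimal sketch, not anything taken from the paper: the `ThreatEntry` type, the `uncovered` helper, and every row in `table` are hypothetical, and the benchmark and defense pairings are placeholders rather than the survey's findings.

```python
from dataclasses import dataclass, field


@dataclass
class ThreatEntry:
    """One row of a hypothetical threat-to-coverage table."""
    threat: str
    benchmarks: list[str] = field(default_factory=list)
    defenses: list[str] = field(default_factory=list)


def uncovered(entries):
    """Return a dict mapping each threat to the kinds of coverage it lacks."""
    gaps = {}
    for entry in entries:
        missing = []
        if not entry.benchmarks:
            missing.append("benchmark")
        if not entry.defenses:
            missing.append("defense")
        if missing:
            gaps[entry.threat] = missing
    return gaps


# Illustrative rows only (assumed, not the survey's data).
table = [
    ThreatEntry("indirect prompt injection", ["benchmark-A"], ["policy filtering"]),
    ThreatEntry("memory poisoning", ["benchmark-B"], []),
    ThreatEntry("cross-agent tool chaining", [], []),
]

print(uncovered(table))
```

Rows that come back from `uncovered` are candidates for the explicit gap discussion: threats with no benchmark or no defense are exactly the ones a reader cannot assess from the taxonomy alone.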

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful and constructive review of our survey on Agentic AI Security. Their feedback on the taxonomy is particularly valuable, and we address it in detail below with a commitment to revision.

Point-by-point responses
  1. Referee: [Taxonomy of threats] The taxonomy of threats (as described in the survey structure) is presented as coherent and specific to agentic AI, but its stability rests on the assumption that the surveyed literature on planning, tool use, memory, and autonomy already spans the relevant threat space. The manuscript itself identifies potential gaps such as long-horizon goal hijacking via memory poisoning or cross-agent tool chaining; without an explicit gap analysis or justification for why these are excluded from the taxonomy rather than integrated, the central claim of a stable, distinct taxonomy risks systematic under-representation of attack surfaces.

    Authors: We appreciate the referee's observation on the taxonomy's foundations. The taxonomy is derived directly from the surveyed literature on agentic components, which forms the basis for its coherence and specificity to agentic AI. We acknowledge that an explicit gap analysis would better support the claim of stability. In the revised manuscript, we will add a dedicated subsection on 'Scope and Limitations of the Taxonomy' that performs this analysis. It will explicitly discuss the cited examples (long-horizon goal hijacking via memory poisoning and cross-agent tool chaining), justify their current placement as open challenges rather than core taxonomy categories due to limited empirical studies available at the time of writing, and clarify that the taxonomy prioritizes threats with documented vectors while flagging emerging surfaces for future integration. This addition will strengthen transparency without altering the taxonomy structure itself. revision: yes

Circularity Check

0 steps flagged

No circularity: survey synthesizes external literature without self-referential derivations

full rationale

This is a survey paper that compiles a taxonomy of threats, reviews benchmarks, and discusses defenses by drawing on existing published research in agentic AI and security. No mathematical derivations, fitted parameters, or first-principles predictions are present that could reduce to the paper's own inputs by construction. The central synthesis relies on external citations rather than self-citation chains or self-definitional structures, making the work self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper, the work does not introduce new free parameters, axioms, or invented entities; it relies entirely on the existing body of published research in AI and security.

pith-pipeline@v0.9.0 · 5639 in / 1009 out tokens · 41874 ms · 2026-05-18T03:39:29.682176+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

    cs.CR 2026-05 unverdicted novelty 8.0

    Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying ...

  2. Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents

    cs.CY 2026-04 accept novelty 8.0

    This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that be...

  3. Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis

    cs.CR 2026-04 accept novelty 8.0

    Agent Skills has structural security weaknesses from missing data-instruction boundaries, single-approval persistent trust, and absent marketplace reviews that require fundamental redesign.

  4. Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

    cs.CR 2026-04 unverdicted novelty 7.0

    A parameterized DFA firewall enforces safe tool sequences for structured AI agents, reducing attack success rates to 2.2% in tested workflows with low added latency.

  5. A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

    cs.CR 2026-04 unverdicted novelty 7.0

    A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

  6. Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

    cs.CR 2026-05 unverdicted novelty 6.0

    The paper defines and evaluates Trojan Hippo attacks on LLM agent memory, showing 85-100% success in data exfiltration across backends and reduced rates with defenses at varying utility costs.

  7. Security Considerations for Multi-agent Systems

    cs.CR 2026-03 unverdicted novelty 6.0

    No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

  8. Surviving the Unseen: Predictive Defense for Novel Multi-Turn Multimodal Attacks

    cs.CR 2026-05 unverdicted novelty 4.0

    Proposes the TRIAD framework that treats multi-turn multimodal attacks as continuous trajectories and uses structural anomaly detection, regularized Mahalanobis distance, topological acceleration, and a time-varying C...

  9. When the Agent Is the Adversary: Architectural Requirements for Agentic AI Containment After the April 2026 Frontier Model Escape

    cs.CR 2026-04 unverdicted novelty 3.0

    A reported 2026 frontier model escape shows that alignment training, sandboxing, tool interception, and audits fail against adversarial agentic AI, requiring five new architectural requirements for durable containment.

Reference graph

Works this paper leans on

257 extracted references · 257 canonical work pages · cited by 8 Pith papers · 41 internal anchors

  1. [1]

    Springer, 2024

    Sachin Kumar, Ajit Kumar Verma, and Amna Mirza.Digital transformation, artificial intelligence and society. Springer, 2024

  2. [2]

    The epistemology of a rule-based expert system—a framework for explanation.Artificial intelligence, 20(3):215–251, 1983

    William J Clancey. The epistemology of a rule-based expert system—a framework for explanation.Artificial intelligence, 20(3):215–251, 1983

  3. [3]

    Deep learning

    Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015

  4. [4]

    Artificial Intelligence: A Modern Approach

    Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, 25(27):79–80, 1995

  5. [5]

    Language models are few-shot learners

    Tom B. Brown, Benjamin Mann, Nick Ryder, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems (NeurIPS), 2020

  6. [6]

    GPT-4 Technical Report

    OpenAI. GPT-4 technical report, 2023. arXiv:2303.08774

  7. [7]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Hugo Touvron et al. Llama 2: Open foundation and fine-tuned chat models, 2023. arXiv:2307.09288

  8. [8]

    A survey of multimodal large language model from a data-centric perspective

    Tianyi Bai, Hao Liang, Binwang Wan, Yanran Xu, Xi Li, Shiyu Li, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, et al. A survey of multimodal large language model from a data-centric perspective.arXiv preprint arXiv:2405.16640, 2024

  9. [9]

    Otter: A multi-modal model with in-context instruction tuning.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

    Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Joshua Adrian Cahyono, Jingkang Yang, Chunyuan Li, and Ziwei Liu. Otter: A multi-modal model with in-context instruction tuning.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  10. [10]

    The breakthrough of large language models release for medical applications: 1-year timeline and perspectives

    Marco Cascella, Federico Semeraro, Jonathan Montomoli, Valentina Bellini, Ornella Piazza, and Elena Bignami. The breakthrough of large language models release for medical applications: 1-year timeline and perspectives. Journal of Medical Systems, 48(1):22, 2024

  11. [11]

    Large language models: Their success and impact

    Spyros Makridakis, Fotios Petropoulos, and Yanfei Kang. Large language models: Their success and impact. Forecasting, 5(3):536–549, 2023

  12. [12]

    A comprehensive overview of large language models.ACM Transactions on Intelligent Systems and Technology, 2023

    Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, and Ajmal Mian. A comprehensive overview of large language models.ACM Transactions on Intelligent Systems and Technology, 2023

  13. [13]

    understand

    How LLMs Work. Ai what do large language models “understand”?Image, 21:1, 2024

  14. [14]

    Fundamental limitations of generative llms

    Andrei Kucharavy. Fundamental limitations of generative llms. In Large Language Models in Cybersecurity: Threats, Exposure and Mitigation, pages 55–64. Springer Nature Switzerland Cham, 2024

  15. [15]

    Measuring AI ability to complete long tasks

    Thomas Kwa, Ben West, Joel Becker, Amy Deng, Katharyn Garcia, Max Hasin, Sami Jawhar, Megan Kinniment, Nate Rush, Sydney Von Arx, et al. Measuring ai ability to complete long tasks. arXiv preprint arXiv:2503.14499, 2025

  16. [16]

    Ai agents are here

    Palo Alto Networks (Unit 42). Ai agents are here. so are the threats., 2025

  17. [17]

    Langchain documentation.https://python.langchain.com/, 2024

    LangChain. Langchain documentation.https://python.langchain.com/, 2024

  18. [18]

    Autogpt: An autonomous gpt experiment

    Toran Bruce Richards. Autogpt: An autonomous gpt experiment. https://github.com/Torantulino/Auto-GPT, 2024

  19. [19]

    Voyager: An Open-Ended Embodied Agent with Large Language Models

    Guanzhi Wang et al. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023

  20. [20]

    Three essentials for agentic ai security

    Paolo Dal Cin, Daniel Kendzior, Yusof Seedat, and Renato Marinho. Three essentials for agentic ai security. MIT Sloan Management Review (Online), pages 1–4, 2025

  21. [21]

    Just in time? manufacturers turn to ai to weather tariff storm, 2025

    Reuters. Just in time? manufacturers turn to ai to weather tariff storm, 2025. URL https://www.reuters.com/business/just-time-manufacturers-turn-ai-weather-tariff-storm-2025-08-13/

  22. [22]

    Forget chatbots

    Wired. Forget chatbots. ai agents are the future, 2025. URL https://www.wired.com/story/fast-forward-forget-chatbots-ai-agents-are-the-future/. Accessed: 2025-08-16

  23. [23]

    Generative agents: Interactive simulacra of human behavior

    Joon Sung Park et al. Generative agents: Interactive simulacra of human behavior. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST), 2023

  24. [24]

    Empowering biomedical discovery with ai agents.Cell, 187 (22):6125–6151, 2024

    Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, and Marinka Zitnik. Empowering biomedical discovery with ai agents.Cell, 187 (22):6125–6151, 2024

  25. [25]

    Agentic ai for scientific discovery: A survey of progress, challenges, and future directions.arXiv preprint arXiv:2503.08979, 2025

    Mourad Gridach, Jay Nanavati, Khaldoun Zine El Abidine, Lenon Mendes, and Christina Mack. Agentic ai for scientific discovery: A survey of progress, challenges, and future directions.arXiv preprint arXiv:2503.08979, 2025

  26. [26]

    Inside the automated warehouse where robots are packing your groceries, 2025

    Verge. Inside the automated warehouse where robots are packing your groceries, 2025. URL https://www.theverge.com/robot/719880/ocado-online-grocery-automation-krogers-luton-ogrp-robot-grid. Accessed: 2025-08-16

  27. [27]

    Autoagents: A framework for automatic agent generation.arXiv preprint arXiv:2309.17288, 2023

    Zihan Chen, Yixin Wu, et al. Autoagents: A framework for automatic agent generation. arXiv preprint arXiv:2309.17288, 2023. URL https://arxiv.org/abs/2309.17288

  28. [28]

    Amazon’s delivery, logistics get ai boost, 2025

    Reuters. Amazon’s delivery, logistics get ai boost, 2025. URL https://www.reuters.com/business/retail-consumer/amazons-delivery-logistics-will-get-an-ai-boost-2025-06-04/. Accessed: 2025-08-16

  29. [29]

    Towards a HIPAA compliant agentic AI system in healthcare

    Subash Neupane, Sudip Mittal, and Shahram Rahimi. Towards a hipaa compliant agentic ai system in healthcare. arXiv preprint arXiv:2504.17669, 2025

  30. [30]

    Ai agents in healthcare

    Ken Huang. Ai agents in healthcare. In Agentic AI: Theories and Practices, pages 303–321. Springer, 2025

  31. [31]

    Next-generation agentic ai for transforming healthcare.Informatics and Health, 2(2): 73–83, 2025

    Nalan Karunanayake. Next-generation agentic ai for transforming healthcare.Informatics and Health, 2(2): 73–83, 2025

  32. [32]

    Coordinated ai agents for advancing healthcare.Nature Biomedical Engineering, pages 1–7, 2025

    Michael Moritz, Eric Topol, and Pranav Rajpurkar. Coordinated ai agents for advancing healthcare.Nature Biomedical Engineering, pages 1–7, 2025

  33. [33]

    The rise of agentic ai teammates in medicine.The Lancet, 405(10477):457, 2025

    James Zou and Eric J Topol. The rise of agentic ai teammates in medicine.The Lancet, 405(10477):457, 2025

  34. [34]

    Preventing zero-click ai threats: Insights from echoleak, 2025

    Trend Micro. Preventing zero-click ai threats: Insights from echoleak, 2025. URL https://www.trendmicro.com/en_us/research/25/g/preventing-zero-click-ai-threats-insights-from-echoleak.html. Accessed: 2025-08-16

  35. [35]

    Ai: Advent of agents opens new possibilities for attackers, 2025

    Symantec and Carbon Black. Ai: Advent of agents opens new possibilities for attackers, 2025. URL https://www.security.com/threat-intelligence/ai-agent-attacks. Accessed: 2025-08-16

  36. [36]

    Assessing llms for zero-shot abstractive summarization through the lens of relevance paraphrasing

    Hadi Askari, Anshuman Chhabra, Muhao Chen, and Prasant Mohapatra. Assessing llms for zero-shot abstractive summarization through the lens of relevance paraphrasing. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 2187–2201, 2025

  37. [37]

    InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

    Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents.arXiv preprint arXiv:2403.02691, 2024

  38. [38]

    Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

    Donghyun Lee and Mo Tiwari. Prompt infection: Llm-to-llm prompt injection within multi-agent systems.arXiv preprint arXiv:2410.07283, 2024

  39. [39]

    Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models

    Shahriar Kabir Nahin, Hadi Askari, Muhao Chen, and Anshuman Chhabra. Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models.arXiv preprint arXiv:2510.08592, 2025

  40. [40]

    The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

    Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, and Angelo Furfaro. The dark side of llms agent-based attacks for complete computer takeover. arXiv preprint arXiv:2507.06850, 2025

  41. [41]

    Agentic misalignment: How llms could be insider threats, 2025

    Anthropic Research Team. Agentic misalignment: How llms could be insider threats, 2025

  42. [42]

    A practical memory injection attack against LLM agents

    Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. A practical memory injection attack against llm agents. arXiv preprint arXiv:2503.03704, 2025

  43. [43]

    Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases.Advances in Neural Information Processing Systems, 37:130185–130213, 2024

    Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases.Advances in Neural Information Processing Systems, 37:130185–130213, 2024

  44. [44]

    Commercial llm agents are already vulnerable to simple yet dangerous attacks,

    Ang Li, Yin Zhou, Vethavikashini Chithrra Raghuram, Tom Goldstein, and Micah Goldblum. Commercial llm agents are already vulnerable to simple yet dangerous attacks.arXiv preprint arXiv:2502.08586, 2025

  45. [45]

    Imprompter: Tricking LLM agents into improper tool use

    Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes. Imprompter: Tricking llm agents into improper tool use. arXiv preprint arXiv:2410.14923, 2024

  46. [46]

    AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

    Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, Zico Kolter, Matt Fredrikson, et al. Agentharm: A benchmark for measuring harmfulness of llm agents.arXiv preprint arXiv:2410.09024, 2024

  47. [47]

    Breaking agents: Compromising autonomous llm agents through malfunction amplification

    Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, and Yang Zhang. Breaking agents: Compromising autonomous llm agents through malfunction amplification.arXiv preprint arXiv:2407.20859, 2024

  48. [48]

    When llms autonomously attack, 2025

    Carnegie Mellon University. When llms autonomously attack, 2025. https://engineering.cmu.edu/news-events/news/2025/07/24-when-llms-autonomously-attack.html

  49. [49]

    Generative to agentic ai: Survey, conceptualization, and challenges

    Johannes Schneider. Generative to agentic ai: Survey, conceptualization, and challenges.arXiv preprint arXiv:2504.18875, 2025

  50. [50]

    Evaluation and benchmarking of llm agents: A survey

    Mahmoud Mohammadi, Yipeng Li, Jane Lo, and Wendy Yip. Evaluation and benchmarking of llm agents: A survey. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 6129–6139, 2025

  51. [51]

    Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

    Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, et al. Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems.arXiv preprint arXiv:2504.01990, 2025

  52. [52]

    A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

    Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, et al. A survey of self-evolving agents: On path to artificial super intelligence.arXiv preprint arXiv:2507.21046, 2025

  53. [53]

    Agentic ai: Autonomous intelligence for complex goals–a comprehensive survey. IEEE Access, 2025

    Deepak Bhaskar Acharya, Karthigeyan Kuppan, and B Divya. Agentic ai: Autonomous intelligence for complex goals–a comprehensive survey. IEEE Access, 2025

  54. [54]

    Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems.arXiv preprint arXiv:2506.04133, 2025

    Shaina Raza, Ranjan Sapkota, Manoj Karkee, and Christos Emmanouilidis. Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems.arXiv preprint arXiv:2506.04133, 2025

  55. [55]

    Artificial intelligence risk management framework: Generative artificial intelligence profile.NIST Trustworthy and Responsible AI Gaithersburg, MD, USA, 2024

    NIST AI. Artificial intelligence risk management framework: Generative artificial intelligence profile.NIST Trustworthy and Responsible AI Gaithersburg, MD, USA, 2024

  56. [56]

    A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

    Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382, 2023

  57. [57]

    Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection

    Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023. URL https://api.semanticscholar.org/CorpusID:258546941

  58. [58]

    Ignore Previous Prompt: Attack Techniques For Language Models

    Fábio Perez and Ian Ribeiro. Ignore previous prompt: Attack techniques for language models. ArXiv, abs/2211.09527, 2022. URL https://api.semanticscholar.org/CorpusID:253581710

  59. [59]

    (IBM, Invariant Labs, ETH Zurich, Google, Microsoft)

    Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, et al. f. arXiv preprint arXiv:2506.08837, 2025

  60. [60]

    Owasp genai llm01: Prompt injection, 2025

    OWASP GenAI Project. Owasp genai llm01: Prompt injection, 2025

  61. [61]

    Prompt injection 2.0: Hybrid ai threats

    Jeremy McHugh, Kristina Sekrst, and Jonathan Rodriguez Cefalu. Prompt injection 2.0: Hybrid ai threats. ArXiv, abs/2507.13169, 2025. URL https://api.semanticscholar.org/CorpusID:280296803

  62. [62]

    Can Indirect Prompt Injection Attacks Be Detected and Removed?

    Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, and Bryan Hooi. Can indirect prompt injection attacks be detected and removed? arXiv preprint arXiv:2502.16580, 2025

  63. [63]

    Adaptive attacks break defenses against indirect prompt injection attacks on llm agents

    Qiusi Zhan, Richard Fang, Henil Shalin Panchal, and Daniel Kang. Adaptive attacks break defenses against indirect prompt injection attacks on llm agents. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 7101–7117, 2025

  64. [64]

    Universal and transferable adversarial attacks on aligned language models

    Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models, 2023. URL https://arxiv.org/abs/2307.15043

  65. [65]

    Automatic and universal prompt injection attacks against large language models

    Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, and Chaowei Xiao. Automatic and universal prompt injection attacks against large language models. arXiv preprint arXiv:2403.04957, 2024

  66. [66]

    Baseline Defenses for Adversarial Attacks Against Aligned Language Models

    Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping-yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, and Tom Goldstein. Baseline defenses for adversarial attacks against aligned language models. arXiv preprint arXiv:2309.00614, 2023

  67. [67]

    AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models

    Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, and Tong Sun. Autodan: interpretable gradient-based adversarial attacks on large language models. arXiv preprint arXiv:2310.15140, 2023

  68. [68]

    Poisoning retrieval corpora by injecting adversarial passages

    Zexuan Zhong, Ziqing Huang, Alexander Wettig, and Danqi Chen. Poisoning retrieval corpora by injecting adversarial passages. arXiv preprint arXiv:2310.19156, 2023

  69. [69]

    Eia: Environmental injection attack on generalist web agents for privacy leakage. arXiv preprint arXiv:2409.11295, 2024

    Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, and Huan Sun. Eia: Environmental injection attack on generalist web agents for privacy leakage. arXiv preprint arXiv:2409.11295, 2024

  70. [70]

    AdvWeb: Controllable black-box attacks on vlm-powered web agents

    Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, and Bo Li. Advagent: Controllable blackbox red-teaming on web agents. arXiv preprint arXiv:2410.17401, 2024

  71. [71]

    Adversarial attacks on multimodal agents

    Chen Henry Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, and Aditi Raghunathan. Dissecting adversarial robustness of multimodal lm agents. arXiv preprint arXiv:2406.12814, 2024.

  72. [72]

    Melon: Indirect prompt injection defense via masked re-execution and tool comparison

    Kaijie Zhu, Xianjun Yang, Jindong Wang, Wenbo Guo, and William Yang Wang. Melon: Provable defense against indirect prompt injection attacks in ai agents. arXiv preprint arXiv:2502.05174, 2025

  73. [73]

    Attacking vision-language computer agents via pop-ups

    Yanzhe Zhang, Tao Yu, and Diyi Yang. Attacking vision-language computer agents via pop-ups. arXiv preprint arXiv:2411.02391, 2024

  74. [74]

    Manipulating llm web agents with indirect prompt injection attack via html accessibility tree

    Sam Johnson, Viet Pham, and Thai Le. Manipulating llm web agents with indirect prompt injection attack via html accessibility tree. arXiv preprint arXiv:2507.14799, 2025

  75. [75]

    Examining identity drift in conversations of llm agents, 2025

    Junhyuk Choi, Yeseon Hong, Minju Kim, and Bugeun Kim. Examining identity drift in conversations of llm agents, 2025. URL https://arxiv.org/abs/2412.00804

  76. [76]

    System prompt poisoning: Persistent attacks on large language models beyond user injection. arXiv preprint, 2025

    Jiawei Guo and Haipeng Cai. System prompt poisoning: Persistent attacks on large language models beyond user injection. arXiv preprint, 2025. URL https://arxiv.org/abs/2505.06493. Demonstrates how poisoning the system prompt can persistently compromise agent behavior across sessions

  77. [77]

    Human-imperceptible retrieval poisoning attacks in llm-powered applications

    Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, and Yu Jiang. Human-imperceptible retrieval poisoning attacks in llm-powered applications. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pages 502–506, 2024

  78. [78]

    Backdoored retrievers for prompt injection attacks on retrieval augmented generation of large language models

    Cody Clop and Yannick Teglia. Backdoored retrievers for prompt injection attacks on retrieval augmented generation of large language models. arXiv preprint arXiv:2410.14479, 2024

  79. [79]

    Manipulating multimodal agents via cross-modal prompt injection. arXiv preprint arXiv:2504.14348, 2025

    Le Wang, Zonghao Ying, Tianyuan Zhang, Siyuan Liang, Shengshan Hu, Mingchuan Zhang, Aishan Liu, and Xianglong Liu. Manipulating multimodal agents via cross-modal prompt injection. arXiv preprint arXiv:2504.14348, 2025

  80. [80]

    Unveiling ai agent vulnerabilities part ii: Code execution

    Sean Park. Unveiling ai agent vulnerabilities part ii: Code execution. Trend Micro Research Report, 2025. URL https://www.trendmicro.com/vinfo/br/security/news/cybercrime-and-digital-threats/unveiling-ai-agent-vulnerabilities-code-execution. Examines vulnerabilities in LLM-powered agents with code execution, document upload, and internet access

Showing first 80 references.