Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
Pith reviewed 2026-05-18 03:39 UTC · model grok-4.3
The pith
Agentic AI systems introduce security risks that are distinct from both traditional AI safety and conventional software security.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Agentic AI systems powered by large language models and endowed with planning, tool use, memory, and autonomy create new and amplified security risks, distinct from both traditional AI safety and conventional software security. This is supported by outlining a taxonomy of threats specific to agentic AI, reviewing recent benchmarks and evaluation methodologies, and discussing defense strategies from both technical and governance perspectives, while highlighting open challenges.
What carries the argument
The taxonomy of threats specific to agentic AI, which classifies risks arising from autonomous task execution in diverse environments.
If this is right
- Developers can use the taxonomy to proactively identify risks in agent designs.
- Benchmarks and evaluations can be standardized to measure agentic security posture.
- Defenses can combine technical fixes with policy and governance approaches.
- Future agent systems can be engineered with security considerations integrated from the beginning.
- Research efforts can focus on the identified open challenges to advance the field.
Where Pith is reading between the lines
- Applying this taxonomy to real-world agent deployments could reveal additional threat categories not yet documented.
- Connections to physical world risks suggest that agentic AI in robotics or IoT might require hybrid security models.
- Open challenges may lead to new research on multi-agent interactions and their security implications.
- Secure-by-design principles could extend to regulatory standards for AI agents.
Load-bearing premise
That the existing body of research on agentic AI is mature enough to form a stable taxonomy of threats and identify open challenges without significant gaps.
What would settle it
Discovery of a major new security vulnerability in deployed agentic AI systems that falls outside the proposed taxonomy or is not addressed by current evaluation methods.
Figures
read the original abstract
Agentic AI systems powered by large language models (LLMs) and endowed with planning, tool use, memory, and autonomy, are emerging as powerful, flexible platforms for automation. Their ability to autonomously execute tasks across web, software, and physical environments creates new and amplified security risks, distinct from both traditional AI safety and conventional software security. This survey outlines a taxonomy of threats specific to agentic AI, reviews recent benchmarks and evaluation methodologies, and discusses defense strategies from both technical and governance perspectives. We synthesize current research and highlight open challenges, aiming to support the development of secure-by-design agent systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey on security for agentic AI systems, which combine LLMs with planning, tool use, memory, and autonomy. It claims these systems create new and amplified security risks distinct from traditional AI safety and conventional software security. The paper outlines a taxonomy of such threats, reviews benchmarks and evaluation methodologies, discusses technical and governance defense strategies, synthesizes current research, and identifies open challenges to support secure-by-design agent development.
Significance. If the taxonomy is comprehensive and the literature synthesis accurate, the work would offer a useful organizing framework for an emerging subfield, helping researchers and practitioners distinguish agentic risks and prioritize defenses. The explicit treatment of both technical and governance perspectives, along with benchmark reviews, adds practical value. The paper's strength lies in its synthesis of existing research and clear articulation of open challenges, which can guide future work in a rapidly developing area.
major comments (1)
- [Taxonomy of threats] The taxonomy of threats (as described in the survey structure) is presented as coherent and specific to agentic AI, but its stability rests on the assumption that the surveyed literature on planning, tool use, memory, and autonomy already spans the relevant threat space. The manuscript itself identifies potential gaps such as long-horizon goal hijacking via memory poisoning or cross-agent tool chaining; without an explicit gap analysis or justification for why these are excluded from the taxonomy rather than integrated, the central claim of a stable, distinct taxonomy risks systematic under-representation of attack surfaces.
minor comments (2)
- The abstract clearly states the scope but would benefit from indicating the approximate number of papers or benchmarks reviewed to convey the breadth of the synthesis.
- Adding a summary table that maps threats to existing benchmarks and defenses would improve clarity and allow readers to quickly assess coverage.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive review of our survey on Agentic AI Security. Their feedback on the taxonomy is particularly valuable, and we address it in detail below with a commitment to revision.
read point-by-point responses
-
Referee: [Taxonomy of threats] The taxonomy of threats (as described in the survey structure) is presented as coherent and specific to agentic AI, but its stability rests on the assumption that the surveyed literature on planning, tool use, memory, and autonomy already spans the relevant threat space. The manuscript itself identifies potential gaps such as long-horizon goal hijacking via memory poisoning or cross-agent tool chaining; without an explicit gap analysis or justification for why these are excluded from the taxonomy rather than integrated, the central claim of a stable, distinct taxonomy risks systematic under-representation of attack surfaces.
Authors: We appreciate the referee's observation on the taxonomy's foundations. The taxonomy is derived directly from the surveyed literature on agentic components, which forms the basis for its coherence and specificity to agentic AI. We acknowledge that an explicit gap analysis would better support the claim of stability. In the revised manuscript, we will add a dedicated subsection on 'Scope and Limitations of the Taxonomy' that performs this analysis. It will explicitly discuss the cited examples (long-horizon goal hijacking via memory poisoning and cross-agent tool chaining), justify their current placement as open challenges rather than core taxonomy categories due to limited empirical studies available at the time of writing, and clarify that the taxonomy prioritizes threats with documented vectors while flagging emerging surfaces for future integration. This addition will strengthen transparency without altering the taxonomy structure itself. revision: yes
Circularity Check
No circularity: survey synthesizes external literature without self-referential derivations
full rationale
This is a survey paper that compiles a taxonomy of threats, reviews benchmarks, and discusses defenses by drawing on existing published research in agentic AI and security. No mathematical derivations, fitted parameters, or first-principles predictions are present that could reduce to the paper's own inputs by construction. The central synthesis relies on external citations rather than self-citation chains or self-definitional structures, making the work self-contained against external benchmarks with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
This survey outlines a taxonomy of threats specific to agentic AI, reviews recent benchmarks and evaluation methodologies, and discusses defense strategies from both technical and governance perspectives.
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We now propose and discuss a taxonomy of attacks and security vulnerabilities for agentic AI systems... Prompt Injection and Jailbreaks, Autonomous Cyber-Exploitation and Tool Abuse, Multi-Agent and Protocol-Level Threats...
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 9 Pith papers
-
Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying ...
-
Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents
This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that be...
-
Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis
Agent Skills has structural security weaknesses from missing data-instruction boundaries, single-approval persistent trust, and absent marketplace reviews that require fundamental redesign.
-
Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents
A parameterized DFA firewall enforces safe tool sequences for structured AI agents, reducing attack success rates to 2.2% in tested workflows with low added latency.
-
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
-
Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
The paper defines and evaluates Trojan Hippo attacks on LLM agent memory, showing 85-100% success in data exfiltration across backends and reduced rates with defenses at varying utility costs.
-
Security Considerations for Multi-agent Systems
No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
-
Surviving the Unseen: Predictive Defense for Novel Multi-Turn Multimodal Attacks
Proposes the TRIAD framework that treats multi-turn multimodal attacks as continuous trajectories and uses structural anomaly detection, regularized Mahalanobis distance, topological acceleration, and a time-varying C...
-
When the Agent Is the Adversary: Architectural Requirements for Agentic AI Containment After the April 2026 Frontier Model Escape
A reported 2026 frontier model escape shows that alignment training, sandboxing, tool interception, and audits fail against adversarial agentic AI, requiring five new architectural requirements for durable containment.
Reference graph
Works this paper leans on
-
[1]
Sachin Kumar, Ajit Kumar Verma, and Amna Mirza.Digital transformation, artificial intelligence and society. Springer, 2024
work page 2024
-
[2]
William J Clancey. The epistemology of a rule-based expert system—a framework for explanation.Artificial intelligence, 20(3):215–251, 1983
work page 1983
-
[3]
you won’t believe ChatGPT’s response to this prompt!
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning.nature, 521(7553):436–444, 2015. 2Consider an example from [253] where the attacker lures the user by saying:“you won’t believe ChatGPT’s response to this prompt!"followed by the prompt injection text in another language orBase64so the user cannot ascertain its adversarial nature. 21 Agentic AI...
work page 2015
-
[4]
A modern approach.Artificial Intelligence
Stuart Russell, Peter Norvig, and Artificial Intelligence. A modern approach.Artificial Intelligence. Prentice-Hall, Egnlewood Cliffs, 25(27):79–80, 1995
work page 1995
-
[5]
Brown, Benjamin Mann, Nick Ryder, et al
Tom B. Brown, Benjamin Mann, Nick Ryder, et al. Language models are few-shot learners.Advances in Neural Information Processing Systems (NeurIPS), 2020
work page 2020
-
[6]
OpenAI. Gpt-4 technical report, 2023. arXiv:2303.08774
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[7]
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron et al. Llama 2: Open foundation and fine-tuned chat models, 2023. arXiv:2307.09288
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[8]
A survey of multimodal large language model from a data-centric perspective
Tianyi Bai, Hao Liang, Binwang Wan, Yanran Xu, Xi Li, Shiyu Li, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, et al. A survey of multimodal large language model from a data-centric perspective.arXiv preprint arXiv:2405.16640, 2024
-
[9]
Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Joshua Adrian Cahyono, Jingkang Yang, Chunyuan Li, and Ziwei Liu. Otter: A multi-modal model with in-context instruction tuning.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
work page 2025
-
[10]
Marco Cascella, Federico Semeraro, Jonathan Montomoli, Valentina Bellini, Ornella Piazza, and Elena Bignami. The breakthrough of large language models release for medical applications: 1-year timeline and perspectives. Journal of Medical Systems, 48(1):22, 2024
work page 2024
-
[11]
Large language models: Their success and impact
Spyros Makridakis, Fotios Petropoulos, and Yanfei Kang. Large language models: Their success and impact. Forecasting, 5(3):536–549, 2023
work page 2023
-
[12]
Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, and Ajmal Mian. A comprehensive overview of large language models.ACM Transactions on Intelligent Systems and Technology, 2023
work page 2023
-
[13]
How LLMs Work. Ai what do large language models “understand”?Image, 21:1, 2024
work page 2024
-
[14]
Fundamental limitations of generative llms
Andrei Kucharavy. Fundamental limitations of generative llms. InLarge Language Models in Cybersecurity: Threats, Exposure and Mitigation, pages 55–64. Springer Nature Switzerland Cham, 2024
work page 2024
-
[15]
Robert Tjarko Lange, Yuki Imajuku, and Edoardo Cetin
Thomas Kwa, Ben West, Joel Becker, Amy Deng, Katharyn Garcia, Max Hasin, Sami Jawhar, Megan Kinniment, Nate Rush, Sydney V on Arx, et al. Measuring ai ability to complete long tasks.arXiv preprint arXiv:2503.14499, 2025
-
[16]
Palo Alto Networks (Unit 42). Ai agents are here. so are the threats., 2025
work page 2025
-
[17]
Langchain documentation.https://python.langchain.com/, 2024
LangChain. Langchain documentation.https://python.langchain.com/, 2024
work page 2024
-
[18]
Autogpt: An autonomous gpt experiment
Toran Bruce Richards. Autogpt: An autonomous gpt experiment. https://github.com/Torantulino/ Auto-GPT, 2024
work page 2024
-
[19]
Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang et al. V oyager: An open-ended embodied agent with large language models.arXiv preprint arXiv:2305.16291, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[20]
Three essentials for agentic ai security
Paolo Dal Cin, Daniel Kendzior, Yusof Seedat, and Renato Marinho. Three essentials for agentic ai security. MIT Sloan Management Review (Online), pages 1–4, 2025
work page 2025
-
[21]
Just in time? manufacturers turn to ai to weather tariff storm, 2025
Reuters. Just in time? manufacturers turn to ai to weather tariff storm, 2025. URL https://www.reuters. com/business/just-time-manufacturers-turn-ai-weather-tariff-storm-2025-08-13/
work page 2025
-
[22]
Wired. Forget chatbots. ai agents are the future, 2025. URL https://www.wired.com/story/ fast-forward-forget-chatbots-ai-agents-are-the-future/. Accessed: 2025-08-16
work page 2025
-
[23]
Generative agents: Interactive simulacra of human behavior
Joon Sung Park et al. Generative agents: Interactive simulacra of human behavior. InProceedings of the ACM Symposium on User Interface Software and Technology (UIST), 2023
work page 2023
-
[24]
Empowering biomedical discovery with ai agents.Cell, 187 (22):6125–6151, 2024
Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, and Marinka Zitnik. Empowering biomedical discovery with ai agents.Cell, 187 (22):6125–6151, 2024
work page 2024
-
[25]
Mourad Gridach, Jay Nanavati, Khaldoun Zine El Abidine, Lenon Mendes, and Christina Mack. Agentic ai for scientific discovery: A survey of progress, challenges, and future directions.arXiv preprint arXiv:2503.08979, 2025
-
[26]
Inside the automated warehouse where robots are packing your groceries, 2025
Verge. Inside the automated warehouse where robots are packing your groceries, 2025. URL https://www.theverge.com/robot/719880/ ocado-online-grocery-automation-krogers-luton-ogrp-robot-grid. Accessed: 2025-08-16
work page 2025
-
[27]
Autoagents: A framework for automatic agent generation.arXiv preprint arXiv:2309.17288, 2023
Zihan Chen, Yixin Wu, et al. Autoagents: A framework for automatic agent generation.arXiv preprint arXiv:2309.17288, 2023. URLhttps://arxiv.org/abs/2309.17288. 22 Agentic AI Security
-
[28]
Amazon’s delivery, logistics get ai boost, 2025
Reuters. Amazon’s delivery, logistics get ai boost, 2025. URL https://www.reuters.com/business/ retail-consumer/amazons-delivery-logistics-will-get-an-ai-boost-2025-06-04/ . Accessed: 2025-08-16
work page 2025
-
[29]
arXiv preprint arXiv:2504.17669 , year=
Subash Neupane, Sudip Mittal, and Shahram Rahimi. Towards a hipaa compliant agentic ai system in healthcare. arXiv preprint arXiv:2504.17669, 2025
-
[30]
Ken Huang. Ai agents in healthcare. InAgentic AI: Theories and Practices, pages 303–321. Springer, 2025
work page 2025
-
[31]
Next-generation agentic ai for transforming healthcare.Informatics and Health, 2(2): 73–83, 2025
Nalan Karunanayake. Next-generation agentic ai for transforming healthcare.Informatics and Health, 2(2): 73–83, 2025
work page 2025
-
[32]
Coordinated ai agents for advancing healthcare.Nature Biomedical Engineering, pages 1–7, 2025
Michael Moritz, Eric Topol, and Pranav Rajpurkar. Coordinated ai agents for advancing healthcare.Nature Biomedical Engineering, pages 1–7, 2025
work page 2025
-
[33]
The rise of agentic ai teammates in medicine.The Lancet, 405(10477):457, 2025
James Zou and Eric J Topol. The rise of agentic ai teammates in medicine.The Lancet, 405(10477):457, 2025
work page 2025
-
[34]
Preventing zero-click ai threats: Insights from echoleak, 2025
Trend Micro. Preventing zero-click ai threats: Insights from echoleak, 2025. URL https://www.trendmicro. com/en_us/research/25/g/preventing-zero-click-ai-threats-insights-from-echoleak. html. Accessed: 2025-08-16
work page 2025
-
[35]
Ai: Advent of agents opens new possibilities for attackers, 2025
Symantec and Carbon Black. Ai: Advent of agents opens new possibilities for attackers, 2025. URL https: //www.security.com/threat-intelligence/ai-agent-attacks. Accessed: 2025-08-16
work page 2025
-
[36]
Assessing llms for zero-shot abstractive summarization through the lens of relevance paraphrasing
Hadi Askari, Anshuman Chhabra, Muhao Chen, and Prasant Mohapatra. Assessing llms for zero-shot abstractive summarization through the lens of relevance paraphrasing. InFindings of the Association for Computational Linguistics: NAACL 2025, pages 2187–2201, 2025
work page 2025
-
[37]
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents.arXiv preprint arXiv:2403.02691, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[38]
Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems
Donghyun Lee and Mo Tiwari. Prompt infection: Llm-to-llm prompt injection within multi-agent systems.arXiv preprint arXiv:2410.07283, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[39]
Shahriar Kabir Nahin, Hadi Askari, Muhao Chen, and Anshuman Chhabra. Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models.arXiv preprint arXiv:2510.08592, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[40]
The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise
Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, and Angelo Fur- faro. The dark side of llms agent-based attacks for complete computer takeover.arXiv preprint arXiv:2507.06850, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[41]
Agentic misalignment: How llms could be insider threats, 2025
Anthropic Research Team. Agentic misalignment: How llms could be insider threats, 2025
work page 2025
-
[42]
Yann Dubois, Balázs Galambosi, Percy Liang, and Tat- sunori B Hashimoto
Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. A practical memory injection attack against llm agents.arXiv preprint arXiv:2503.03704, 2025
-
[43]
Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases.Advances in Neural Information Processing Systems, 37:130185–130213, 2024
work page 2024
-
[44]
Commercial llm agents are already vulnerable to simple yet dangerous attacks,
Ang Li, Yin Zhou, Vethavikashini Chithrra Raghuram, Tom Goldstein, and Micah Goldblum. Commercial llm agents are already vulnerable to simple yet dangerous attacks.arXiv preprint arXiv:2502.08586, 2025
-
[45]
Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes
Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes. Imprompter: Tricking llm agents into improper tool use.arXiv preprint arXiv:2410.14923, 2024
-
[46]
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, Zico Kolter, Matt Fredrikson, et al. Agentharm: A benchmark for measuring harmfulness of llm agents.arXiv preprint arXiv:2410.09024, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[47]
Breaking agents: Compromising autonomous llm agents through malfunction amplification
Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, and Yang Zhang. Breaking agents: Compromising autonomous llm agents through malfunction amplification.arXiv preprint arXiv:2407.20859, 2024
-
[48]
When llms autonomously attack, 2025
Carnegie Mellon University. When llms autonomously attack, 2025. https://engineering.cmu.edu/ news-events/news/2025/07/24-when-llms-autonomously-attack.html
work page 2025
-
[49]
Generative to agentic ai: Survey, conceptualization, and challenges
Johannes Schneider. Generative to agentic ai: Survey, conceptualization, and challenges.arXiv preprint arXiv:2504.18875, 2025
-
[50]
Evaluation and benchmarking of llm agents: A survey
Mahmoud Mohammadi, Yipeng Li, Jane Lo, and Wendy Yip. Evaluation and benchmarking of llm agents: A survey. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 2, pages 6129–6139, 2025. 23 Agentic AI Security
work page 2025
-
[51]
Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, et al. Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems.arXiv preprint arXiv:2504.01990, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[52]
Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, et al. A survey of self-evolving agents: On path to artificial super intelligence.arXiv preprint arXiv:2507.21046, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[53]
Agentic ai: Autonomous intelligence for complex goals–a comprehensive survey.IEEe Access, 2025
Deepak Bhaskar Acharya, Karthigeyan Kuppan, and B Divya. Agentic ai: Autonomous intelligence for complex goals–a comprehensive survey.IEEe Access, 2025
work page 2025
-
[54]
Shaina Raza, Ranjan Sapkota, Manoj Karkee, and Christos Emmanouilidis. Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems.arXiv preprint arXiv:2506.04133, 2025
-
[55]
NIST AI. Artificial intelligence risk management framework: Generative artificial intelligence profile.NIST Trustworthy and Responsible AI Gaithersburg, MD, USA, 2024
work page 2024
-
[56]
A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT
Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[57]
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023. URL https://api. semanticscholar.org/CorpusID:258546941
work page 2023
-
[58]
Ignore Previous Prompt: Attack Techniques For Language Models
Fábio Perez and Ian Ribeiro. Ignore previous prompt: Attack techniques for language models.ArXiv, abs/2211.09527, 2022. URLhttps://api.semanticscholar.org/CorpusID:253581710
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[59]
(IBM, Invariant Labs, ETH Zurich, Google, Microsoft)
Luca Beurer-Kellner, Beat Buesser, Ana-Maria Cre¸ tu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, et al. f.arXiv preprint arXiv:2506.08837, 2025
-
[60]
Owasp genai llm01: Prompt injection, 2025
OW ASP GenAI Project. Owasp genai llm01: Prompt injection, 2025
work page 2025
-
[61]
Prompt injection 2.0: Hybrid ai threats,
Jeremy McHugh, Kristina Sekrst, and Jonathan Rodriguez Cefalu. Prompt injection 2.0: Hybrid ai threats.ArXiv, abs/2507.13169, 2025. URLhttps://api.semanticscholar.org/CorpusID:280296803
-
[62]
Can Indirect Prompt Injection Attacks Be Detected and Removed?
Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, and Bryan Hooi. Can indirect prompt injection attacks be detected and removed?arXiv preprint arXiv:2502.16580, 2025
-
[63]
Adaptive attacks break defenses against indirect prompt injection attacks on llm agents
Qiusi Zhan, Richard Fang, Henil Shalin Panchal, and Daniel Kang. Adaptive attacks break defenses against indirect prompt injection attacks on llm agents. InFindings of the Association for Computational Linguistics: NAACL 2025, pages 7101–7117, 2025
work page 2025
-
[64]
Zico Kolter, and Matt Fredrikson
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models, 2023. URL https://arxiv.org/abs/2307. 15043
work page 2023
-
[65]
Automatic and universal prompt injection attacks against large language models,
Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, and Chaowei Xiao. Automatic and universal prompt injection attacks against large language models.arXiv preprint arXiv:2403.04957, 2024
-
[66]
Baseline Defenses for Adversarial Attacks Against Aligned Language Models
Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping-yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, and Tom Goldstein. Baseline defenses for adversarial attacks against aligned language models.arXiv preprint arXiv:2309.00614, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[67]
Auto- DAN: Automatic and Interpretable Adversarial Attacks on Large Language Models,
Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, and Tong Sun. Autodan: interpretable gradient-based adversarial attacks on large language models.arXiv preprint arXiv:2310.15140, 2023
-
[68]
Poisoning retrieval corpora by injecting adversarial passages
Zexuan Zhong, Ziqing Huang, Alexander Wettig, and Danqi Chen. Poisoning retrieval corpora by injecting adversarial passages.arXiv preprint arXiv:2310.19156, 2023
-
[69]
Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, and Huan Sun. Eia: Environmental injection attack on generalist web agents for privacy leakage.arXiv preprint arXiv:2409.11295, 2024
-
[70]
Ad- vweb: Controllable black-box attacks on vlm-powered web agents
Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, and Bo Li. Advagent: Controllable blackbox red-teaming on web agents.arXiv preprint arXiv:2410.17401, 2024
-
[71]
Adversarial at- tacks on multimodal agents
Chen Henry Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, and Aditi Raghunathan. Dissecting adversarial robustness of multimodal lm agents.arXiv preprint arXiv:2406.12814, 2024. 24 Agentic AI Security
-
[72]
Melon: Indirect prompt injection defense via masked re-execution and tool comparison
Kaijie Zhu, Xianjun Yang, Jindong Wang, Wenbo Guo, and William Yang Wang. Melon: Provable defense against indirect prompt injection attacks in ai agents.arXiv preprint arXiv:2502.05174, 2025
-
[73]
Attacking vision- language computer agents via pop-ups
Yanzhe Zhang, Tao Yu, and Diyi Yang. Attacking vision-language computer agents via pop-ups.arXiv preprint arXiv:2411.02391, 2024
-
[74]
InConference on Empirical Methods in Natural Language Processing
Sam Johnson, Viet Pham, and Thai Le. Manipulating llm web agents with indirect prompt injection attack via html accessibility tree.arXiv preprint arXiv:2507.14799, 2025
-
[75]
Examining identity drift in conversations of llm agents, 2025
Junhyuk Choi, Yeseon Hong, Minju Kim, and Bugeun Kim. Examining identity drift in conversations of llm agents, 2025. URLhttps://arxiv.org/abs/2412.00804
-
[76]
Jiawei Guo and Haipeng Cai. System prompt poisoning: Persistent attacks on large language models beyond user injection.arXiv preprint, 2025. URL https://arxiv.org/abs/2505.06493. Demonstrates how poisoning the system prompt can persistently compromise agent behavior across sessions
-
[77]
Human-imperceptible retrieval poisoning attacks in llm-powered applications
Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, and Yu Jiang. Human-imperceptible retrieval poisoning attacks in llm-powered applications. InCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pages 502–506, 2024
work page 2024
-
[78]
Cody Clop and Yannick Teglia. Backdoored retrievers for prompt injection attacks on retrieval augmented generation of large language models.arXiv preprint arXiv:2410.14479, 2024
-
[79]
Le Wang, Zonghao Ying, Tianyuan Zhang, Siyuan Liang, Shengshan Hu, Mingchuan Zhang, Aishan Liu, and Xianglong Liu. Manipulating multimodal agents via cross-modal prompt injection.arXiv preprint arXiv:2504.14348, 2025
-
[80]
Unveiling ai agent vulnerabilities part ii: Code execution
Sean Park. Unveiling ai agent vulnerabilities part ii: Code execution. Trend Micro Research Report, 2025. URL https://www.trendmicro.com/vinfo/br/security/news/cybercrime-and-digital-threats/ unveiling-ai-agent-vulnerabilities-code-execution . Examines vulnerabilities in LLM-powered agents with code execution, document upload, and internet access
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.