Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

Anshuman Chhabra; Prasant Mohapatra; Shahriar Kabir Nahin; Shrestha Datta

arxiv: 2510.23883 · v3 · submitted 2025-10-27 · 💻 cs.AI

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

Anshuman Chhabra , Shrestha Datta , Shahriar Kabir Nahin , Prasant Mohapatra This is my paper

Pith reviewed 2026-05-18 03:39 UTC · model grok-4.3

classification 💻 cs.AI

keywords agentic AIAI securityLLM agentsthreat taxonomyAI defensesautonomous agentssecurity evaluationopen challenges

0 comments

The pith

Agentic AI systems introduce security risks that are distinct from both traditional AI safety and conventional software security.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This survey examines security issues in agentic AI, which are LLM-powered systems capable of planning, using tools, maintaining memory, and acting autonomously across web, software, and physical settings. It establishes that these features lead to new and amplified risks not covered by existing AI safety or software security frameworks. The paper delivers a taxonomy of agent-specific threats, reviews evaluation benchmarks and methods, and covers defense strategies involving both technical measures and governance. It synthesizes the research landscape and identifies open challenges to help build agent systems that are secure by design.

Core claim

Agentic AI systems powered by large language models and endowed with planning, tool use, memory, and autonomy create new and amplified security risks, distinct from both traditional AI safety and conventional software security. This is supported by outlining a taxonomy of threats specific to agentic AI, reviewing recent benchmarks and evaluation methodologies, and discussing defense strategies from both technical and governance perspectives, while highlighting open challenges.

What carries the argument

The taxonomy of threats specific to agentic AI, which classifies risks arising from autonomous task execution in diverse environments.

If this is right

Developers can use the taxonomy to proactively identify risks in agent designs.
Benchmarks and evaluations can be standardized to measure agentic security posture.
Defenses can combine technical fixes with policy and governance approaches.
Future agent systems can be engineered with security considerations integrated from the beginning.
Research efforts can focus on the identified open challenges to advance the field.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Applying this taxonomy to real-world agent deployments could reveal additional threat categories not yet documented.
Connections to physical world risks suggest that agentic AI in robotics or IoT might require hybrid security models.
Open challenges may lead to new research on multi-agent interactions and their security implications.
Secure-by-design principles could extend to regulatory standards for AI agents.

Load-bearing premise

That the existing body of research on agentic AI is mature enough to form a stable taxonomy of threats and identify open challenges without significant gaps.

What would settle it

Discovery of a major new security vulnerability in deployed agentic AI systems that falls outside the proposed taxonomy or is not addressed by current evaluation methods.

Figures

Figures reproduced from arXiv: 2510.23883 by Anshuman Chhabra, Prasant Mohapatra, Shahriar Kabir Nahin, Shrestha Datta.

**Figure 2.** Figure 2: Examples showcasing (a) direct and (b) indirect prompt injection. In the former, the agent is directly instructed by the adversary to reveal confidential information whereas in the latter the attacker has exploited the agent’s reliance on external information sources to have it download malware from their altered website. attacker; while for IPI attacks, the owner or supplier of agent-processed third-party… view at source ↗

**Figure 3.** Figure 3: An example showcasing unintentional prompt injection. Even when a malicious adversary is not present, unintended dangers can jeopardize LLM agents’ safety. For example, unclear or badly worded user inquiries may unintentionally overrule system directives or result in dangerous actions. Moreover, contextual drift within lengthy chat histories can change the agent’s behavior without the need for explicit ove… view at source ↗

**Figure 4.** Figure 4: Visualizing different prompt injection attacks based on modality: (a) [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Visualizing different protocol-level attacks for multi-agent systems: (a) [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: Some Defenses Against Prompt Injection Attacks. [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗

**Figure 7.** Figure 7: Policy Filtering and Enforcement Defense Strategies. [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

**Figure 8.** Figure 8: An Example of Sandboxing as a Defense Strategy. [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗

read the original abstract

Agentic AI systems powered by large language models (LLMs) and endowed with planning, tool use, memory, and autonomy, are emerging as powerful, flexible platforms for automation. Their ability to autonomously execute tasks across web, software, and physical environments creates new and amplified security risks, distinct from both traditional AI safety and conventional software security. This survey outlines a taxonomy of threats specific to agentic AI, reviews recent benchmarks and evaluation methodologies, and discusses defense strategies from both technical and governance perspectives. We synthesize current research and highlight open challenges, aiming to support the development of secure-by-design agent systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a useful survey that organizes threats and defenses for agentic AI but its taxonomy may have gaps because the field is still young.

read the letter

This paper is a survey that maps out security threats for agentic AI systems, which are LLMs given planning, tool use, memory, and autonomy. The core thing to take away is that it creates a taxonomy of these risks and reviews defenses and evaluations, but its value rests on the assumption that current research covers the key issues adequately. It does well by synthesizing a lot of prior literature into one place and highlighting how agent autonomy creates distinct problems compared to regular software or non-agent AI. The sections on benchmarks and evaluation methodologies look helpful for assessing these systems. Discussing both technical defenses and governance perspectives adds a practical angle that could influence how teams build these agents securely. A potential weak point is the completeness of the taxonomy. Since agentic AI is still developing quickly, there could be threats not yet studied in the cited papers, such as memory poisoning affecting long-term goals or security issues when agents chain tools across different systems. If those are missing, the open challenges might not fully reflect the real landscape. The citations seem fine, drawing from external sources without any self-referential loops. This work is for AI security researchers and practitioners who want a structured introduction to the topic. A reader focused on building reliable autonomous systems would pick up useful ideas from the challenge list. It should go to a serious referee because surveys that organize emerging fields help the community focus efforts, provided the gaps are addressed in revision. I would recommend engaging with this for background and sending it out for peer review.

Referee Report

1 major / 2 minor

Summary. The manuscript is a survey on security for agentic AI systems, which combine LLMs with planning, tool use, memory, and autonomy. It claims these systems create new and amplified security risks distinct from traditional AI safety and conventional software security. The paper outlines a taxonomy of such threats, reviews benchmarks and evaluation methodologies, discusses technical and governance defense strategies, synthesizes current research, and identifies open challenges to support secure-by-design agent development.

Significance. If the taxonomy is comprehensive and the literature synthesis accurate, the work would offer a useful organizing framework for an emerging subfield, helping researchers and practitioners distinguish agentic risks and prioritize defenses. The explicit treatment of both technical and governance perspectives, along with benchmark reviews, adds practical value. The paper's strength lies in its synthesis of existing research and clear articulation of open challenges, which can guide future work in a rapidly developing area.

major comments (1)

[Taxonomy of threats] The taxonomy of threats (as described in the survey structure) is presented as coherent and specific to agentic AI, but its stability rests on the assumption that the surveyed literature on planning, tool use, memory, and autonomy already spans the relevant threat space. The manuscript itself identifies potential gaps such as long-horizon goal hijacking via memory poisoning or cross-agent tool chaining; without an explicit gap analysis or justification for why these are excluded from the taxonomy rather than integrated, the central claim of a stable, distinct taxonomy risks systematic under-representation of attack surfaces.

minor comments (2)

The abstract clearly states the scope but would benefit from indicating the approximate number of papers or benchmarks reviewed to convey the breadth of the synthesis.
Adding a summary table that maps threats to existing benchmarks and defenses would improve clarity and allow readers to quickly assess coverage.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review of our survey on Agentic AI Security. Their feedback on the taxonomy is particularly valuable, and we address it in detail below with a commitment to revision.

read point-by-point responses

Referee: [Taxonomy of threats] The taxonomy of threats (as described in the survey structure) is presented as coherent and specific to agentic AI, but its stability rests on the assumption that the surveyed literature on planning, tool use, memory, and autonomy already spans the relevant threat space. The manuscript itself identifies potential gaps such as long-horizon goal hijacking via memory poisoning or cross-agent tool chaining; without an explicit gap analysis or justification for why these are excluded from the taxonomy rather than integrated, the central claim of a stable, distinct taxonomy risks systematic under-representation of attack surfaces.

Authors: We appreciate the referee's observation on the taxonomy's foundations. The taxonomy is derived directly from the surveyed literature on agentic components, which forms the basis for its coherence and specificity to agentic AI. We acknowledge that an explicit gap analysis would better support the claim of stability. In the revised manuscript, we will add a dedicated subsection on 'Scope and Limitations of the Taxonomy' that performs this analysis. It will explicitly discuss the cited examples (long-horizon goal hijacking via memory poisoning and cross-agent tool chaining), justify their current placement as open challenges rather than core taxonomy categories due to limited empirical studies available at the time of writing, and clarify that the taxonomy prioritizes threats with documented vectors while flagging emerging surfaces for future integration. This addition will strengthen transparency without altering the taxonomy structure itself. revision: yes

Circularity Check

0 steps flagged

No circularity: survey synthesizes external literature without self-referential derivations

full rationale

This is a survey paper that compiles a taxonomy of threats, reviews benchmarks, and discusses defenses by drawing on existing published research in agentic AI and security. No mathematical derivations, fitted parameters, or first-principles predictions are present that could reduce to the paper's own inputs by construction. The central synthesis relies on external citations rather than self-citation chains or self-definitional structures, making the work self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper, the work does not introduce new free parameters, axioms, or invented entities; it relies entirely on the existing body of published research in AI and security.

pith-pipeline@v0.9.0 · 5639 in / 1009 out tokens · 41874 ms · 2026-05-18T03:39:29.682176+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

This survey outlines a taxonomy of threats specific to agentic AI, reviews recent benchmarks and evaluation methodologies, and discusses defense strategies from both technical and governance perspectives.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We now propose and discuss a taxonomy of attacks and security vulnerabilities for agentic AI systems... Prompt Injection and Jailbreaks, Autonomous Cyber-Exploitation and Tool Abuse, Multi-Agent and Protocol-Level Threats...

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
cs.CR 2026-05 unverdicted novelty 8.0

Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying ...
Taxonomy and Consistency Analysis of Safety Benchmarks for AI Agents
cs.CY 2026-04 accept novelty 8.0

This paper delivers the first systematic taxonomy and cross-benchmark consistency analysis of 40 agent safety benchmarks, finding broad but shallow risk coverage, no ranking concordance across evaluations, and that be...
Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis
cs.CR 2026-04 accept novelty 8.0

Agent Skills has structural security weaknesses from missing data-instruction boundaries, single-approval persistent trust, and absent marketplace reviews that require fundamental redesign.
Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents
cs.CR 2026-04 unverdicted novelty 7.0

A parameterized DFA firewall enforces safe tool sequences for structured AI agents, reducing attack success rates to 2.2% in tested workflows with low added latency.
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
cs.CR 2026-04 unverdicted novelty 7.0

A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
cs.CR 2026-05 unverdicted novelty 6.0

The paper defines and evaluates Trojan Hippo attacks on LLM agent memory, showing 85-100% success in data exfiltration across backends and reduced rates with defenses at varying utility costs.
Security Considerations for Multi-agent Systems
cs.CR 2026-03 unverdicted novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
Surviving the Unseen: Predictive Defense for Novel Multi-Turn Multimodal Attacks
cs.CR 2026-05 unverdicted novelty 4.0

Proposes the TRIAD framework that treats multi-turn multimodal attacks as continuous trajectories and uses structural anomaly detection, regularized Mahalanobis distance, topological acceleration, and a time-varying C...
When the Agent Is the Adversary: Architectural Requirements for Agentic AI Containment After the April 2026 Frontier Model Escape
cs.CR 2026-04 unverdicted novelty 3.0

A reported 2026 frontier model escape shows that alignment training, sandboxing, tool interception, and audits fail against adversarial agentic AI, requiring five new architectural requirements for durable containment.

Reference graph

Works this paper leans on

257 extracted references · 257 canonical work pages · cited by 8 Pith papers · 41 internal anchors

[1]

Springer, 2024

Sachin Kumar, Ajit Kumar Verma, and Amna Mirza.Digital transformation, artificial intelligence and society. Springer, 2024

work page 2024
[2]

The epistemology of a rule-based expert system—a framework for explanation.Artificial intelligence, 20(3):215–251, 1983

William J Clancey. The epistemology of a rule-based expert system—a framework for explanation.Artificial intelligence, 20(3):215–251, 1983

work page 1983
[3]

you won’t believe ChatGPT’s response to this prompt!

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning.nature, 521(7553):436–444, 2015. 2Consider an example from [253] where the attacker lures the user by saying:“you won’t believe ChatGPT’s response to this prompt!"followed by the prompt injection text in another language orBase64so the user cannot ascertain its adversarial nature. 21 Agentic AI...

work page 2015
[4]

A modern approach.Artificial Intelligence

Stuart Russell, Peter Norvig, and Artificial Intelligence. A modern approach.Artificial Intelligence. Prentice-Hall, Egnlewood Cliffs, 25(27):79–80, 1995

work page 1995
[5]

Brown, Benjamin Mann, Nick Ryder, et al

Tom B. Brown, Benjamin Mann, Nick Ryder, et al. Language models are few-shot learners.Advances in Neural Information Processing Systems (NeurIPS), 2020

work page 2020
[6]

GPT-4 Technical Report

OpenAI. Gpt-4 technical report, 2023. arXiv:2303.08774

work page internal anchor Pith review Pith/arXiv arXiv 2023
[7]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron et al. Llama 2: Open foundation and fine-tuned chat models, 2023. arXiv:2307.09288

work page internal anchor Pith review Pith/arXiv arXiv 2023
[8]

A survey of multimodal large language model from a data-centric perspective

Tianyi Bai, Hao Liang, Binwang Wan, Yanran Xu, Xi Li, Shiyu Li, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, et al. A survey of multimodal large language model from a data-centric perspective.arXiv preprint arXiv:2405.16640, 2024

work page arXiv 2024
[9]

Otter: A multi-modal model with in-context instruction tuning.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Joshua Adrian Cahyono, Jingkang Yang, Chunyuan Li, and Ziwei Liu. Otter: A multi-modal model with in-context instruction tuning.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

work page 2025
[10]

The breakthrough of large language models release for medical applications: 1-year timeline and perspectives

Marco Cascella, Federico Semeraro, Jonathan Montomoli, Valentina Bellini, Ornella Piazza, and Elena Bignami. The breakthrough of large language models release for medical applications: 1-year timeline and perspectives. Journal of Medical Systems, 48(1):22, 2024

work page 2024
[11]

Large language models: Their success and impact

Spyros Makridakis, Fotios Petropoulos, and Yanfei Kang. Large language models: Their success and impact. Forecasting, 5(3):536–549, 2023

work page 2023
[12]

A comprehensive overview of large language models.ACM Transactions on Intelligent Systems and Technology, 2023

Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, and Ajmal Mian. A comprehensive overview of large language models.ACM Transactions on Intelligent Systems and Technology, 2023

work page 2023
[13]

understand

How LLMs Work. Ai what do large language models “understand”?Image, 21:1, 2024

work page 2024
[14]

Fundamental limitations of generative llms

Andrei Kucharavy. Fundamental limitations of generative llms. InLarge Language Models in Cybersecurity: Threats, Exposure and Mitigation, pages 55–64. Springer Nature Switzerland Cham, 2024

work page 2024
[15]

Robert Tjarko Lange, Yuki Imajuku, and Edoardo Cetin

Thomas Kwa, Ben West, Joel Becker, Amy Deng, Katharyn Garcia, Max Hasin, Sami Jawhar, Megan Kinniment, Nate Rush, Sydney V on Arx, et al. Measuring ai ability to complete long tasks.arXiv preprint arXiv:2503.14499, 2025

work page arXiv 2025
[16]

Ai agents are here

Palo Alto Networks (Unit 42). Ai agents are here. so are the threats., 2025

work page 2025
[17]

Langchain documentation.https://python.langchain.com/, 2024

LangChain. Langchain documentation.https://python.langchain.com/, 2024

work page 2024
[18]

Autogpt: An autonomous gpt experiment

Toran Bruce Richards. Autogpt: An autonomous gpt experiment. https://github.com/Torantulino/ Auto-GPT, 2024

work page 2024
[19]

Voyager: An Open-Ended Embodied Agent with Large Language Models

Guanzhi Wang et al. V oyager: An open-ended embodied agent with large language models.arXiv preprint arXiv:2305.16291, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[20]

Three essentials for agentic ai security

Paolo Dal Cin, Daniel Kendzior, Yusof Seedat, and Renato Marinho. Three essentials for agentic ai security. MIT Sloan Management Review (Online), pages 1–4, 2025

work page 2025
[21]

Just in time? manufacturers turn to ai to weather tariff storm, 2025

Reuters. Just in time? manufacturers turn to ai to weather tariff storm, 2025. URL https://www.reuters. com/business/just-time-manufacturers-turn-ai-weather-tariff-storm-2025-08-13/

work page 2025
[22]

Forget chatbots

Wired. Forget chatbots. ai agents are the future, 2025. URL https://www.wired.com/story/ fast-forward-forget-chatbots-ai-agents-are-the-future/. Accessed: 2025-08-16

work page 2025
[23]

Generative agents: Interactive simulacra of human behavior

Joon Sung Park et al. Generative agents: Interactive simulacra of human behavior. InProceedings of the ACM Symposium on User Interface Software and Technology (UIST), 2023

work page 2023
[24]

Empowering biomedical discovery with ai agents.Cell, 187 (22):6125–6151, 2024

Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, and Marinka Zitnik. Empowering biomedical discovery with ai agents.Cell, 187 (22):6125–6151, 2024

work page 2024
[25]

Agentic ai for scientific discovery: A survey of progress, challenges, and future directions.arXiv preprint arXiv:2503.08979, 2025

Mourad Gridach, Jay Nanavati, Khaldoun Zine El Abidine, Lenon Mendes, and Christina Mack. Agentic ai for scientific discovery: A survey of progress, challenges, and future directions.arXiv preprint arXiv:2503.08979, 2025

work page arXiv 2025
[26]

Inside the automated warehouse where robots are packing your groceries, 2025

Verge. Inside the automated warehouse where robots are packing your groceries, 2025. URL https://www.theverge.com/robot/719880/ ocado-online-grocery-automation-krogers-luton-ogrp-robot-grid. Accessed: 2025-08-16

work page 2025
[27]

Autoagents: A framework for automatic agent generation.arXiv preprint arXiv:2309.17288, 2023

Zihan Chen, Yixin Wu, et al. Autoagents: A framework for automatic agent generation.arXiv preprint arXiv:2309.17288, 2023. URLhttps://arxiv.org/abs/2309.17288. 22 Agentic AI Security

work page arXiv 2023
[28]

Amazon’s delivery, logistics get ai boost, 2025

Reuters. Amazon’s delivery, logistics get ai boost, 2025. URL https://www.reuters.com/business/ retail-consumer/amazons-delivery-logistics-will-get-an-ai-boost-2025-06-04/ . Accessed: 2025-08-16

work page 2025
[29]

arXiv preprint arXiv:2504.17669 , year=

Subash Neupane, Sudip Mittal, and Shahram Rahimi. Towards a hipaa compliant agentic ai system in healthcare. arXiv preprint arXiv:2504.17669, 2025

work page arXiv 2025
[30]

Ai agents in healthcare

Ken Huang. Ai agents in healthcare. InAgentic AI: Theories and Practices, pages 303–321. Springer, 2025

work page 2025
[31]

Next-generation agentic ai for transforming healthcare.Informatics and Health, 2(2): 73–83, 2025

Nalan Karunanayake. Next-generation agentic ai for transforming healthcare.Informatics and Health, 2(2): 73–83, 2025

work page 2025
[32]

Coordinated ai agents for advancing healthcare.Nature Biomedical Engineering, pages 1–7, 2025

Michael Moritz, Eric Topol, and Pranav Rajpurkar. Coordinated ai agents for advancing healthcare.Nature Biomedical Engineering, pages 1–7, 2025

work page 2025
[33]

The rise of agentic ai teammates in medicine.The Lancet, 405(10477):457, 2025

James Zou and Eric J Topol. The rise of agentic ai teammates in medicine.The Lancet, 405(10477):457, 2025

work page 2025
[34]

Preventing zero-click ai threats: Insights from echoleak, 2025

Trend Micro. Preventing zero-click ai threats: Insights from echoleak, 2025. URL https://www.trendmicro. com/en_us/research/25/g/preventing-zero-click-ai-threats-insights-from-echoleak. html. Accessed: 2025-08-16

work page 2025
[35]

Ai: Advent of agents opens new possibilities for attackers, 2025

Symantec and Carbon Black. Ai: Advent of agents opens new possibilities for attackers, 2025. URL https: //www.security.com/threat-intelligence/ai-agent-attacks. Accessed: 2025-08-16

work page 2025
[36]

Assessing llms for zero-shot abstractive summarization through the lens of relevance paraphrasing

Hadi Askari, Anshuman Chhabra, Muhao Chen, and Prasant Mohapatra. Assessing llms for zero-shot abstractive summarization through the lens of relevance paraphrasing. InFindings of the Association for Computational Linguistics: NAACL 2025, pages 2187–2201, 2025

work page 2025
[37]

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents.arXiv preprint arXiv:2403.02691, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[38]

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

Donghyun Lee and Mo Tiwari. Prompt infection: Llm-to-llm prompt injection within multi-agent systems.arXiv preprint arXiv:2410.07283, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[39]

Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models

Shahriar Kabir Nahin, Hadi Askari, Muhao Chen, and Anshuman Chhabra. Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models.arXiv preprint arXiv:2510.08592, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[40]

The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, and Angelo Fur- faro. The dark side of llms agent-based attacks for complete computer takeover.arXiv preprint arXiv:2507.06850, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[41]

Agentic misalignment: How llms could be insider threats, 2025

Anthropic Research Team. Agentic misalignment: How llms could be insider threats, 2025

work page 2025
[42]

Yann Dubois, Balázs Galambosi, Percy Liang, and Tat- sunori B Hashimoto

Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. A practical memory injection attack against llm agents.arXiv preprint arXiv:2503.03704, 2025

work page arXiv 2025
[43]

Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases.Advances in Neural Information Processing Systems, 37:130185–130213, 2024

Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases.Advances in Neural Information Processing Systems, 37:130185–130213, 2024

work page 2024
[44]

Commercial llm agents are already vulnerable to simple yet dangerous attacks,

Ang Li, Yin Zhou, Vethavikashini Chithrra Raghuram, Tom Goldstein, and Micah Goldblum. Commercial llm agents are already vulnerable to simple yet dangerous attacks.arXiv preprint arXiv:2502.08586, 2025

work page arXiv 2025
[45]

Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes

Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes. Imprompter: Tricking llm agents into improper tool use.arXiv preprint arXiv:2410.14923, 2024

work page arXiv 2024
[46]

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, Zico Kolter, Matt Fredrikson, et al. Agentharm: A benchmark for measuring harmfulness of llm agents.arXiv preprint arXiv:2410.09024, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[47]

Breaking agents: Compromising autonomous llm agents through malfunction amplification

Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, and Yang Zhang. Breaking agents: Compromising autonomous llm agents through malfunction amplification.arXiv preprint arXiv:2407.20859, 2024

work page arXiv 2024
[48]

When llms autonomously attack, 2025

Carnegie Mellon University. When llms autonomously attack, 2025. https://engineering.cmu.edu/ news-events/news/2025/07/24-when-llms-autonomously-attack.html

work page 2025
[49]

Generative to agentic ai: Survey, conceptualization, and challenges

Johannes Schneider. Generative to agentic ai: Survey, conceptualization, and challenges.arXiv preprint arXiv:2504.18875, 2025

work page arXiv 2025
[50]

Evaluation and benchmarking of llm agents: A survey

Mahmoud Mohammadi, Yipeng Li, Jane Lo, and Wendy Yip. Evaluation and benchmarking of llm agents: A survey. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 2, pages 6129–6139, 2025. 23 Agentic AI Security

work page 2025
[51]

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, et al. Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems.arXiv preprint arXiv:2504.01990, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[52]

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, et al. A survey of self-evolving agents: On path to artificial super intelligence.arXiv preprint arXiv:2507.21046, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[53]

Agentic ai: Autonomous intelligence for complex goals–a comprehensive survey.IEEe Access, 2025

Deepak Bhaskar Acharya, Karthigeyan Kuppan, and B Divya. Agentic ai: Autonomous intelligence for complex goals–a comprehensive survey.IEEe Access, 2025

work page 2025
[54]

Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems.arXiv preprint arXiv:2506.04133, 2025

Shaina Raza, Ranjan Sapkota, Manoj Karkee, and Christos Emmanouilidis. Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems.arXiv preprint arXiv:2506.04133, 2025

work page arXiv 2025
[55]

Artificial intelligence risk management framework: Generative artificial intelligence profile.NIST Trustworthy and Responsible AI Gaithersburg, MD, USA, 2024

NIST AI. Artificial intelligence risk management framework: Generative artificial intelligence profile.NIST Trustworthy and Responsible AI Gaithersburg, MD, USA, 2024

work page 2024
[56]

A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[57]

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023. URL https://api. semanticscholar.org/CorpusID:258546941

work page 2023
[58]

Ignore Previous Prompt: Attack Techniques For Language Models

Fábio Perez and Ian Ribeiro. Ignore previous prompt: Attack techniques for language models.ArXiv, abs/2211.09527, 2022. URLhttps://api.semanticscholar.org/CorpusID:253581710

work page internal anchor Pith review Pith/arXiv arXiv 2022
[59]

(IBM, Invariant Labs, ETH Zurich, Google, Microsoft)

Luca Beurer-Kellner, Beat Buesser, Ana-Maria Cre¸ tu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, et al. f.arXiv preprint arXiv:2506.08837, 2025

work page arXiv 2025
[60]

Owasp genai llm01: Prompt injection, 2025

OW ASP GenAI Project. Owasp genai llm01: Prompt injection, 2025

work page 2025
[61]

Prompt injection 2.0: Hybrid ai threats,

Jeremy McHugh, Kristina Sekrst, and Jonathan Rodriguez Cefalu. Prompt injection 2.0: Hybrid ai threats.ArXiv, abs/2507.13169, 2025. URLhttps://api.semanticscholar.org/CorpusID:280296803

work page arXiv 2025
[62]

Can Indirect Prompt Injection Attacks Be Detected and Removed?

Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, and Bryan Hooi. Can indirect prompt injection attacks be detected and removed?arXiv preprint arXiv:2502.16580, 2025

work page arXiv 2025
[63]

Adaptive attacks break defenses against indirect prompt injection attacks on llm agents

Qiusi Zhan, Richard Fang, Henil Shalin Panchal, and Daniel Kang. Adaptive attacks break defenses against indirect prompt injection attacks on llm agents. InFindings of the Association for Computational Linguistics: NAACL 2025, pages 7101–7117, 2025

work page 2025
[64]

Zico Kolter, and Matt Fredrikson

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models, 2023. URL https://arxiv.org/abs/2307. 15043

work page 2023
[65]

Automatic and universal prompt injection attacks against large language models,

Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, and Chaowei Xiao. Automatic and universal prompt injection attacks against large language models.arXiv preprint arXiv:2403.04957, 2024

work page arXiv 2024
[66]

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping-yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, and Tom Goldstein. Baseline defenses for adversarial attacks against aligned language models.arXiv preprint arXiv:2309.00614, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[67]

Auto- DAN: Automatic and Interpretable Adversarial Attacks on Large Language Models,

Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, and Tong Sun. Autodan: interpretable gradient-based adversarial attacks on large language models.arXiv preprint arXiv:2310.15140, 2023

work page arXiv 2023
[68]

Poisoning retrieval corpora by injecting adversarial passages

Zexuan Zhong, Ziqing Huang, Alexander Wettig, and Danqi Chen. Poisoning retrieval corpora by injecting adversarial passages.arXiv preprint arXiv:2310.19156, 2023

work page arXiv 2023
[69]

Eia: Environmental injection attack on generalist web agents for privacy leakage.arXiv preprint arXiv:2409.11295, 2024

Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, and Huan Sun. Eia: Environmental injection attack on generalist web agents for privacy leakage.arXiv preprint arXiv:2409.11295, 2024

work page arXiv 2024
[70]

Ad- vweb: Controllable black-box attacks on vlm-powered web agents

Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, and Bo Li. Advagent: Controllable blackbox red-teaming on web agents.arXiv preprint arXiv:2410.17401, 2024

work page arXiv 2024
[71]

Adversarial at- tacks on multimodal agents

Chen Henry Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, and Aditi Raghunathan. Dissecting adversarial robustness of multimodal lm agents.arXiv preprint arXiv:2406.12814, 2024. 24 Agentic AI Security

work page arXiv 2024
[72]

Melon: Indirect prompt injection defense via masked re-execution and tool comparison

Kaijie Zhu, Xianjun Yang, Jindong Wang, Wenbo Guo, and William Yang Wang. Melon: Provable defense against indirect prompt injection attacks in ai agents.arXiv preprint arXiv:2502.05174, 2025

work page arXiv 2025
[73]

Attacking vision- language computer agents via pop-ups

Yanzhe Zhang, Tao Yu, and Diyi Yang. Attacking vision-language computer agents via pop-ups.arXiv preprint arXiv:2411.02391, 2024

work page arXiv 2024
[74]

InConference on Empirical Methods in Natural Language Processing

Sam Johnson, Viet Pham, and Thai Le. Manipulating llm web agents with indirect prompt injection attack via html accessibility tree.arXiv preprint arXiv:2507.14799, 2025

work page arXiv 2025
[75]

Examining identity drift in conversations of llm agents, 2025

Junhyuk Choi, Yeseon Hong, Minju Kim, and Bugeun Kim. Examining identity drift in conversations of llm agents, 2025. URLhttps://arxiv.org/abs/2412.00804

work page arXiv 2025
[76]

System prompt poisoning: Persistent attacks on large language models beyond user injection.arXiv preprint, 2025

Jiawei Guo and Haipeng Cai. System prompt poisoning: Persistent attacks on large language models beyond user injection.arXiv preprint, 2025. URL https://arxiv.org/abs/2505.06493. Demonstrates how poisoning the system prompt can persistently compromise agent behavior across sessions

work page arXiv 2025
[77]

Human-imperceptible retrieval poisoning attacks in llm-powered applications

Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, and Yu Jiang. Human-imperceptible retrieval poisoning attacks in llm-powered applications. InCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pages 502–506, 2024

work page 2024
[78]

and Teglia, Y

Cody Clop and Yannick Teglia. Backdoored retrievers for prompt injection attacks on retrieval augmented generation of large language models.arXiv preprint arXiv:2410.14479, 2024

work page arXiv 2024
[79]

Manipulating multimodal agents via cross-modal prompt injection.arXiv preprint arXiv:2504.14348, 2025

Le Wang, Zonghao Ying, Tianyuan Zhang, Siyuan Liang, Shengshan Hu, Mingchuan Zhang, Aishan Liu, and Xianglong Liu. Manipulating multimodal agents via cross-modal prompt injection.arXiv preprint arXiv:2504.14348, 2025

work page arXiv 2025
[80]

Unveiling ai agent vulnerabilities part ii: Code execution

Sean Park. Unveiling ai agent vulnerabilities part ii: Code execution. Trend Micro Research Report, 2025. URL https://www.trendmicro.com/vinfo/br/security/news/cybercrime-and-digital-threats/ unveiling-ai-agent-vulnerabilities-code-execution . Examines vulnerabilities in LLM-powered agents with code execution, document upload, and internet access

work page 2025

Showing first 80 references.

[1] [1]

Springer, 2024

Sachin Kumar, Ajit Kumar Verma, and Amna Mirza.Digital transformation, artificial intelligence and society. Springer, 2024

work page 2024

[2] [2]

The epistemology of a rule-based expert system—a framework for explanation.Artificial intelligence, 20(3):215–251, 1983

William J Clancey. The epistemology of a rule-based expert system—a framework for explanation.Artificial intelligence, 20(3):215–251, 1983

work page 1983

[3] [3]

you won’t believe ChatGPT’s response to this prompt!

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning.nature, 521(7553):436–444, 2015. 2Consider an example from [253] where the attacker lures the user by saying:“you won’t believe ChatGPT’s response to this prompt!"followed by the prompt injection text in another language orBase64so the user cannot ascertain its adversarial nature. 21 Agentic AI...

work page 2015

[4] [4]

A modern approach.Artificial Intelligence

Stuart Russell, Peter Norvig, and Artificial Intelligence. A modern approach.Artificial Intelligence. Prentice-Hall, Egnlewood Cliffs, 25(27):79–80, 1995

work page 1995

[5] [5]

Brown, Benjamin Mann, Nick Ryder, et al

Tom B. Brown, Benjamin Mann, Nick Ryder, et al. Language models are few-shot learners.Advances in Neural Information Processing Systems (NeurIPS), 2020

work page 2020

[6] [6]

GPT-4 Technical Report

OpenAI. Gpt-4 technical report, 2023. arXiv:2303.08774

work page internal anchor Pith review Pith/arXiv arXiv 2023

[7] [7]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron et al. Llama 2: Open foundation and fine-tuned chat models, 2023. arXiv:2307.09288

work page internal anchor Pith review Pith/arXiv arXiv 2023

[8] [8]

A survey of multimodal large language model from a data-centric perspective

Tianyi Bai, Hao Liang, Binwang Wan, Yanran Xu, Xi Li, Shiyu Li, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, et al. A survey of multimodal large language model from a data-centric perspective.arXiv preprint arXiv:2405.16640, 2024

work page arXiv 2024

[9] [9]

Otter: A multi-modal model with in-context instruction tuning.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Joshua Adrian Cahyono, Jingkang Yang, Chunyuan Li, and Ziwei Liu. Otter: A multi-modal model with in-context instruction tuning.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

work page 2025

[10] [10]

The breakthrough of large language models release for medical applications: 1-year timeline and perspectives

Marco Cascella, Federico Semeraro, Jonathan Montomoli, Valentina Bellini, Ornella Piazza, and Elena Bignami. The breakthrough of large language models release for medical applications: 1-year timeline and perspectives. Journal of Medical Systems, 48(1):22, 2024

work page 2024

[11] [11]

Large language models: Their success and impact

Spyros Makridakis, Fotios Petropoulos, and Yanfei Kang. Large language models: Their success and impact. Forecasting, 5(3):536–549, 2023

work page 2023

[12] [12]

A comprehensive overview of large language models.ACM Transactions on Intelligent Systems and Technology, 2023

Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, and Ajmal Mian. A comprehensive overview of large language models.ACM Transactions on Intelligent Systems and Technology, 2023

work page 2023

[13] [13]

understand

How LLMs Work. Ai what do large language models “understand”?Image, 21:1, 2024

work page 2024

[14] [14]

Fundamental limitations of generative llms

Andrei Kucharavy. Fundamental limitations of generative llms. InLarge Language Models in Cybersecurity: Threats, Exposure and Mitigation, pages 55–64. Springer Nature Switzerland Cham, 2024

work page 2024

[15] [15]

Robert Tjarko Lange, Yuki Imajuku, and Edoardo Cetin

Thomas Kwa, Ben West, Joel Becker, Amy Deng, Katharyn Garcia, Max Hasin, Sami Jawhar, Megan Kinniment, Nate Rush, Sydney V on Arx, et al. Measuring ai ability to complete long tasks.arXiv preprint arXiv:2503.14499, 2025

work page arXiv 2025

[16] [16]

Ai agents are here

Palo Alto Networks (Unit 42). Ai agents are here. so are the threats., 2025

work page 2025

[17] [17]

Langchain documentation.https://python.langchain.com/, 2024

LangChain. Langchain documentation.https://python.langchain.com/, 2024

work page 2024

[18] [18]

Autogpt: An autonomous gpt experiment

Toran Bruce Richards. Autogpt: An autonomous gpt experiment. https://github.com/Torantulino/ Auto-GPT, 2024

work page 2024

[19] [19]

Voyager: An Open-Ended Embodied Agent with Large Language Models

Guanzhi Wang et al. V oyager: An open-ended embodied agent with large language models.arXiv preprint arXiv:2305.16291, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[20] [20]

Three essentials for agentic ai security

Paolo Dal Cin, Daniel Kendzior, Yusof Seedat, and Renato Marinho. Three essentials for agentic ai security. MIT Sloan Management Review (Online), pages 1–4, 2025

work page 2025

[21] [21]

Just in time? manufacturers turn to ai to weather tariff storm, 2025

Reuters. Just in time? manufacturers turn to ai to weather tariff storm, 2025. URL https://www.reuters. com/business/just-time-manufacturers-turn-ai-weather-tariff-storm-2025-08-13/

work page 2025

[22] [22]

Forget chatbots

Wired. Forget chatbots. ai agents are the future, 2025. URL https://www.wired.com/story/ fast-forward-forget-chatbots-ai-agents-are-the-future/. Accessed: 2025-08-16

work page 2025

[23] [23]

Generative agents: Interactive simulacra of human behavior

Joon Sung Park et al. Generative agents: Interactive simulacra of human behavior. InProceedings of the ACM Symposium on User Interface Software and Technology (UIST), 2023

work page 2023

[24] [24]

Empowering biomedical discovery with ai agents.Cell, 187 (22):6125–6151, 2024

Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, and Marinka Zitnik. Empowering biomedical discovery with ai agents.Cell, 187 (22):6125–6151, 2024

work page 2024

[25] [25]

Agentic ai for scientific discovery: A survey of progress, challenges, and future directions.arXiv preprint arXiv:2503.08979, 2025

Mourad Gridach, Jay Nanavati, Khaldoun Zine El Abidine, Lenon Mendes, and Christina Mack. Agentic ai for scientific discovery: A survey of progress, challenges, and future directions.arXiv preprint arXiv:2503.08979, 2025

work page arXiv 2025

[26] [26]

Inside the automated warehouse where robots are packing your groceries, 2025

Verge. Inside the automated warehouse where robots are packing your groceries, 2025. URL https://www.theverge.com/robot/719880/ ocado-online-grocery-automation-krogers-luton-ogrp-robot-grid. Accessed: 2025-08-16

work page 2025

[27] [27]

Autoagents: A framework for automatic agent generation.arXiv preprint arXiv:2309.17288, 2023

Zihan Chen, Yixin Wu, et al. Autoagents: A framework for automatic agent generation.arXiv preprint arXiv:2309.17288, 2023. URLhttps://arxiv.org/abs/2309.17288. 22 Agentic AI Security

work page arXiv 2023

[28] [28]

Amazon’s delivery, logistics get ai boost, 2025

Reuters. Amazon’s delivery, logistics get ai boost, 2025. URL https://www.reuters.com/business/ retail-consumer/amazons-delivery-logistics-will-get-an-ai-boost-2025-06-04/ . Accessed: 2025-08-16

work page 2025

[29] [29]

arXiv preprint arXiv:2504.17669 , year=

Subash Neupane, Sudip Mittal, and Shahram Rahimi. Towards a hipaa compliant agentic ai system in healthcare. arXiv preprint arXiv:2504.17669, 2025

work page arXiv 2025

[30] [30]

Ai agents in healthcare

Ken Huang. Ai agents in healthcare. InAgentic AI: Theories and Practices, pages 303–321. Springer, 2025

work page 2025

[31] [31]

Next-generation agentic ai for transforming healthcare.Informatics and Health, 2(2): 73–83, 2025

Nalan Karunanayake. Next-generation agentic ai for transforming healthcare.Informatics and Health, 2(2): 73–83, 2025

work page 2025

[32] [32]

Coordinated ai agents for advancing healthcare.Nature Biomedical Engineering, pages 1–7, 2025

Michael Moritz, Eric Topol, and Pranav Rajpurkar. Coordinated ai agents for advancing healthcare.Nature Biomedical Engineering, pages 1–7, 2025

work page 2025

[33] [33]

The rise of agentic ai teammates in medicine.The Lancet, 405(10477):457, 2025

James Zou and Eric J Topol. The rise of agentic ai teammates in medicine.The Lancet, 405(10477):457, 2025

work page 2025

[34] [34]

Preventing zero-click ai threats: Insights from echoleak, 2025

Trend Micro. Preventing zero-click ai threats: Insights from echoleak, 2025. URL https://www.trendmicro. com/en_us/research/25/g/preventing-zero-click-ai-threats-insights-from-echoleak. html. Accessed: 2025-08-16

work page 2025

[35] [35]

Ai: Advent of agents opens new possibilities for attackers, 2025

Symantec and Carbon Black. Ai: Advent of agents opens new possibilities for attackers, 2025. URL https: //www.security.com/threat-intelligence/ai-agent-attacks. Accessed: 2025-08-16

work page 2025

[36] [36]

Assessing llms for zero-shot abstractive summarization through the lens of relevance paraphrasing

Hadi Askari, Anshuman Chhabra, Muhao Chen, and Prasant Mohapatra. Assessing llms for zero-shot abstractive summarization through the lens of relevance paraphrasing. InFindings of the Association for Computational Linguistics: NAACL 2025, pages 2187–2201, 2025

work page 2025

[37] [37]

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents.arXiv preprint arXiv:2403.02691, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[38] [38]

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

Donghyun Lee and Mo Tiwari. Prompt infection: Llm-to-llm prompt injection within multi-agent systems.arXiv preprint arXiv:2410.07283, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[39] [39]

Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models

Shahriar Kabir Nahin, Hadi Askari, Muhao Chen, and Anshuman Chhabra. Less Diverse, Less Safe: The Indirect But Pervasive Risk of Test-Time Scaling in Large Language Models.arXiv preprint arXiv:2510.08592, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[40] [40]

The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, and Angelo Fur- faro. The dark side of llms agent-based attacks for complete computer takeover.arXiv preprint arXiv:2507.06850, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[41] [41]

Agentic misalignment: How llms could be insider threats, 2025

Anthropic Research Team. Agentic misalignment: How llms could be insider threats, 2025

work page 2025

[42] [42]

Yann Dubois, Balázs Galambosi, Percy Liang, and Tat- sunori B Hashimoto

Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. A practical memory injection attack against llm agents.arXiv preprint arXiv:2503.03704, 2025

work page arXiv 2025

[43] [43]

Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases.Advances in Neural Information Processing Systems, 37:130185–130213, 2024

Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases.Advances in Neural Information Processing Systems, 37:130185–130213, 2024

work page 2024

[44] [44]

Commercial llm agents are already vulnerable to simple yet dangerous attacks,

Ang Li, Yin Zhou, Vethavikashini Chithrra Raghuram, Tom Goldstein, and Micah Goldblum. Commercial llm agents are already vulnerable to simple yet dangerous attacks.arXiv preprint arXiv:2502.08586, 2025

work page arXiv 2025

[45] [45]

Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes

Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes. Imprompter: Tricking llm agents into improper tool use.arXiv preprint arXiv:2410.14923, 2024

work page arXiv 2024

[46] [46]

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Zou, Zico Kolter, Matt Fredrikson, et al. Agentharm: A benchmark for measuring harmfulness of llm agents.arXiv preprint arXiv:2410.09024, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[47] [47]

Breaking agents: Compromising autonomous llm agents through malfunction amplification

Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, and Yang Zhang. Breaking agents: Compromising autonomous llm agents through malfunction amplification.arXiv preprint arXiv:2407.20859, 2024

work page arXiv 2024

[48] [48]

When llms autonomously attack, 2025

Carnegie Mellon University. When llms autonomously attack, 2025. https://engineering.cmu.edu/ news-events/news/2025/07/24-when-llms-autonomously-attack.html

work page 2025

[49] [49]

Generative to agentic ai: Survey, conceptualization, and challenges

Johannes Schneider. Generative to agentic ai: Survey, conceptualization, and challenges.arXiv preprint arXiv:2504.18875, 2025

work page arXiv 2025

[50] [50]

Evaluation and benchmarking of llm agents: A survey

Mahmoud Mohammadi, Yipeng Li, Jane Lo, and Wendy Yip. Evaluation and benchmarking of llm agents: A survey. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V . 2, pages 6129–6139, 2025. 23 Agentic AI Security

work page 2025

[51] [51]

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, et al. Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems.arXiv preprint arXiv:2504.01990, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[52] [52]

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, et al. A survey of self-evolving agents: On path to artificial super intelligence.arXiv preprint arXiv:2507.21046, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[53] [53]

Agentic ai: Autonomous intelligence for complex goals–a comprehensive survey.IEEe Access, 2025

Deepak Bhaskar Acharya, Karthigeyan Kuppan, and B Divya. Agentic ai: Autonomous intelligence for complex goals–a comprehensive survey.IEEe Access, 2025

work page 2025

[54] [54]

Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems.arXiv preprint arXiv:2506.04133, 2025

Shaina Raza, Ranjan Sapkota, Manoj Karkee, and Christos Emmanouilidis. Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems.arXiv preprint arXiv:2506.04133, 2025

work page arXiv 2025

[55] [55]

Artificial intelligence risk management framework: Generative artificial intelligence profile.NIST Trustworthy and Responsible AI Gaithersburg, MD, USA, 2024

NIST AI. Artificial intelligence risk management framework: Generative artificial intelligence profile.NIST Trustworthy and Responsible AI Gaithersburg, MD, USA, 2024

work page 2024

[56] [56]

A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

Jules White, Quchen Fu, Sam Hays, Michael Sandborn, Carlos Olea, Henry Gilbert, Ashraf Elnashar, Jesse Spencer-Smith, and Douglas C Schmidt. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[57] [57]

Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023. URL https://api. semanticscholar.org/CorpusID:258546941

work page 2023

[58] [58]

Ignore Previous Prompt: Attack Techniques For Language Models

Fábio Perez and Ian Ribeiro. Ignore previous prompt: Attack techniques for language models.ArXiv, abs/2211.09527, 2022. URLhttps://api.semanticscholar.org/CorpusID:253581710

work page internal anchor Pith review Pith/arXiv arXiv 2022

[59] [59]

(IBM, Invariant Labs, ETH Zurich, Google, Microsoft)

Luca Beurer-Kellner, Beat Buesser, Ana-Maria Cre¸ tu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, et al. f.arXiv preprint arXiv:2506.08837, 2025

work page arXiv 2025

[60] [60]

Owasp genai llm01: Prompt injection, 2025

OW ASP GenAI Project. Owasp genai llm01: Prompt injection, 2025

work page 2025

[61] [61]

Prompt injection 2.0: Hybrid ai threats,

Jeremy McHugh, Kristina Sekrst, and Jonathan Rodriguez Cefalu. Prompt injection 2.0: Hybrid ai threats.ArXiv, abs/2507.13169, 2025. URLhttps://api.semanticscholar.org/CorpusID:280296803

work page arXiv 2025

[62] [62]

Can Indirect Prompt Injection Attacks Be Detected and Removed?

Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, and Bryan Hooi. Can indirect prompt injection attacks be detected and removed?arXiv preprint arXiv:2502.16580, 2025

work page arXiv 2025

[63] [63]

Adaptive attacks break defenses against indirect prompt injection attacks on llm agents

Qiusi Zhan, Richard Fang, Henil Shalin Panchal, and Daniel Kang. Adaptive attacks break defenses against indirect prompt injection attacks on llm agents. InFindings of the Association for Computational Linguistics: NAACL 2025, pages 7101–7117, 2025

work page 2025

[64] [64]

Zico Kolter, and Matt Fredrikson

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models, 2023. URL https://arxiv.org/abs/2307. 15043

work page 2023

[65] [65]

Automatic and universal prompt injection attacks against large language models,

Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, and Chaowei Xiao. Automatic and universal prompt injection attacks against large language models.arXiv preprint arXiv:2403.04957, 2024

work page arXiv 2024

[66] [66]

Baseline Defenses for Adversarial Attacks Against Aligned Language Models

Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping-yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, and Tom Goldstein. Baseline defenses for adversarial attacks against aligned language models.arXiv preprint arXiv:2309.00614, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[67] [67]

Auto- DAN: Automatic and Interpretable Adversarial Attacks on Large Language Models,

Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, and Tong Sun. Autodan: interpretable gradient-based adversarial attacks on large language models.arXiv preprint arXiv:2310.15140, 2023

work page arXiv 2023

[68] [68]

Poisoning retrieval corpora by injecting adversarial passages

Zexuan Zhong, Ziqing Huang, Alexander Wettig, and Danqi Chen. Poisoning retrieval corpora by injecting adversarial passages.arXiv preprint arXiv:2310.19156, 2023

work page arXiv 2023

[69] [69]

Eia: Environmental injection attack on generalist web agents for privacy leakage.arXiv preprint arXiv:2409.11295, 2024

Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, and Huan Sun. Eia: Environmental injection attack on generalist web agents for privacy leakage.arXiv preprint arXiv:2409.11295, 2024

work page arXiv 2024

[70] [70]

Ad- vweb: Controllable black-box attacks on vlm-powered web agents

Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, and Bo Li. Advagent: Controllable blackbox red-teaming on web agents.arXiv preprint arXiv:2410.17401, 2024

work page arXiv 2024

[71] [71]

Adversarial at- tacks on multimodal agents

Chen Henry Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, and Aditi Raghunathan. Dissecting adversarial robustness of multimodal lm agents.arXiv preprint arXiv:2406.12814, 2024. 24 Agentic AI Security

work page arXiv 2024

[72] [72]

Melon: Indirect prompt injection defense via masked re-execution and tool comparison

Kaijie Zhu, Xianjun Yang, Jindong Wang, Wenbo Guo, and William Yang Wang. Melon: Provable defense against indirect prompt injection attacks in ai agents.arXiv preprint arXiv:2502.05174, 2025

work page arXiv 2025

[73] [73]

Attacking vision- language computer agents via pop-ups

Yanzhe Zhang, Tao Yu, and Diyi Yang. Attacking vision-language computer agents via pop-ups.arXiv preprint arXiv:2411.02391, 2024

work page arXiv 2024

[74] [74]

InConference on Empirical Methods in Natural Language Processing

Sam Johnson, Viet Pham, and Thai Le. Manipulating llm web agents with indirect prompt injection attack via html accessibility tree.arXiv preprint arXiv:2507.14799, 2025

work page arXiv 2025

[75] [75]

Examining identity drift in conversations of llm agents, 2025

Junhyuk Choi, Yeseon Hong, Minju Kim, and Bugeun Kim. Examining identity drift in conversations of llm agents, 2025. URLhttps://arxiv.org/abs/2412.00804

work page arXiv 2025

[76] [76]

System prompt poisoning: Persistent attacks on large language models beyond user injection.arXiv preprint, 2025

Jiawei Guo and Haipeng Cai. System prompt poisoning: Persistent attacks on large language models beyond user injection.arXiv preprint, 2025. URL https://arxiv.org/abs/2505.06493. Demonstrates how poisoning the system prompt can persistently compromise agent behavior across sessions

work page arXiv 2025

[77] [77]

Human-imperceptible retrieval poisoning attacks in llm-powered applications

Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, and Yu Jiang. Human-imperceptible retrieval poisoning attacks in llm-powered applications. InCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pages 502–506, 2024

work page 2024

[78] [78]

and Teglia, Y

Cody Clop and Yannick Teglia. Backdoored retrievers for prompt injection attacks on retrieval augmented generation of large language models.arXiv preprint arXiv:2410.14479, 2024

work page arXiv 2024

[79] [79]

Manipulating multimodal agents via cross-modal prompt injection.arXiv preprint arXiv:2504.14348, 2025

Le Wang, Zonghao Ying, Tianyuan Zhang, Siyuan Liang, Shengshan Hu, Mingchuan Zhang, Aishan Liu, and Xianglong Liu. Manipulating multimodal agents via cross-modal prompt injection.arXiv preprint arXiv:2504.14348, 2025

work page arXiv 2025

[80] [80]

Unveiling ai agent vulnerabilities part ii: Code execution

Sean Park. Unveiling ai agent vulnerabilities part ii: Code execution. Trend Micro Research Report, 2025. URL https://www.trendmicro.com/vinfo/br/security/news/cybercrime-and-digital-threats/ unveiling-ai-agent-vulnerabilities-code-execution . Examines vulnerabilities in LLM-powered agents with code execution, document upload, and internet access

work page 2025