Recognition: no theorem link
Security Considerations for Artificial Intelligence Agents
Pith reviewed 2026-05-15 12:34 UTC · model grok-4.3
The pith
AI agent architectures create new security failure modes by changing code-data separation and authority boundaries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. Principal attack surfaces are mapped across tools, connectors, hosting boundaries, and multi-agent coordination, with emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. Defenses are assessed as a layered stack of input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions.
What carries the argument
The mapping of attack surfaces together with the layered defense stack that addresses indirect prompt injection and confused-deputy behavior through input mitigations, sandboxing, and policy enforcement.
If this is right
- Confidentiality risks increase when agents connect to external tools and data sources without clear separation.
- Integrity can be compromised through confused-deputy attacks that cause agents to perform unauthorized actions.
- Availability problems can cascade across long-running multi-agent workflows.
- Layered defenses must combine input sanitization, sandboxing, and deterministic policy enforcement for critical steps.
- Standards are needed for policy models that handle delegation and privilege control in agent systems.
Where Pith is reading between the lines
- Traditional software security models may need revision to cover the integrated reasoning and tool-use loop in agents.
- Open-world testing could surface coordination vulnerabilities not visible in controlled settings.
- The layered stack approach could inform security practices for other AI systems that combine planning and execution.
Load-bearing premise
Experience operating general-purpose agentic systems generalizes to frontier AI agents in both controlled and open environments.
What would settle it
A production deployment of frontier agents that shows no measurable rise in incidents tied to code-data mixing, authority violations, or cascading workflow failures.
read the original abstract
This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic systems used by millions of users and thousands of enterprises in both controlled and open-world environments. Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. We map principal attack surfaces across tools, connectors, hosting boundaries, and multi-agent coordination, with particular emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. We then assess current defenses as a layered stack: input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions. Finally, we identify standards and research gaps, including adaptive security benchmarks, policy models for delegation and privilege control, and guidance for secure multi-agent system design aligned with NIST risk management principles.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper, adapted from Perplexity's response to a NIST/CAISI RFI, claims that AI agent architectures alter core security assumptions around code-data separation, authority boundaries, and execution predictability, thereby introducing new confidentiality, integrity, and availability failure modes. Drawing on operational experience with general-purpose agentic systems serving millions of users, it maps principal attack surfaces (tools, connectors, hosting boundaries, multi-agent coordination) with emphasis on indirect prompt injection and confused-deputy behavior, evaluates a layered defense stack (input/model mitigations, sandboxing, deterministic policy enforcement), and identifies gaps in adaptive benchmarks, policy models for delegation, and secure multi-agent design aligned with NIST principles.
Significance. If the observations hold, the work offers timely practitioner-derived insights into emerging risks for frontier AI agents, grounded in large-scale real-world deployment rather than purely theoretical analysis. This could usefully inform standards development and research priorities, particularly the call for policy models and multi-agent security guidance, though its impact hinges on the transferability of Perplexity-specific experience.
major comments (1)
- The central mapping of changed assumptions and new failure modes (e.g., cascading failures in long-running workflows) rests entirely on qualitative operational experience without quantitative data, error bars, or reproducible measurements to substantiate prevalence or severity; this weakens the load-bearing claim that these modes are distinctly new relative to prior systems.
minor comments (2)
- The discussion of attack surfaces would benefit from a summary table or diagram to improve clarity and allow readers to quickly compare surfaces across tools, connectors, and multi-agent coordination.
- Add citations to prior work on prompt injection and confused-deputy problems in AI systems to better situate the observations within the existing literature.
Simulated Author's Rebuttal
We thank the referee for their positive evaluation of the manuscript's practitioner perspective and for recommending minor revision. We address the major comment below.
read point-by-point responses
-
Referee: The central mapping of changed assumptions and new failure modes (e.g., cascading failures in long-running workflows) rests entirely on qualitative operational experience without quantitative data, error bars, or reproducible measurements to substantiate prevalence or severity; this weakens the load-bearing claim that these modes are distinctly new relative to prior systems.
Authors: We acknowledge that the analysis is qualitative and drawn from operational experience with production agentic systems. Quantitative data on security incidents, prevalence, or severity is not available in a form that can be shared or reproduced, owing to the proprietary and sensitive nature of real-world deployments. We maintain that the failure modes are architecturally distinct because they arise directly from the new assumptions around code-data separation, authority delegation, and long-running tool-using workflows that were not present in prior non-agentic systems; the manuscript grounds this distinction in concrete examples rather than statistical claims. We have added a new paragraph in the introduction explicitly discussing the observational basis and limitations of the analysis to address this point. revision: partial
- Quantitative data, error bars, or reproducible measurements on the prevalence or severity of the described failure modes
Circularity Check
No significant circularity detected
full rationale
The paper is an observational discussion of security considerations for AI agents, drawing on Perplexity's deployed experience with general-purpose agentic systems. It maps attack surfaces and defenses without any mathematical derivations, equations, fitted parameters, or formal predictions. No load-bearing step reduces by construction to self-citations, ansatzes, or renamed inputs; claims about changed assumptions and failure modes are presented as direct mappings from operational observations rather than internally derived results.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 5 Pith papers
-
Parallax: Why AI Agents That Think Must Never Act
Parallax enforces structural separation between AI thinking and acting via independent multi-tier validation, information flow control, and state rollback, blocking 98.9% of 280 adversarial attacks with zero false pos...
-
Security Considerations for Multi-agent Systems
No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
-
Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation
A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.
-
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
-
When the Agent Is the Adversary: Architectural Requirements for Agentic AI Containment After the April 2026 Frontier Model Escape
A reported 2026 frontier model escape shows that alignment training, sandboxing, tool interception, and audits fail against adversarial agentic AI, requiring five new architectural requirements for durable containment.
Reference graph
Works this paper leans on
-
[1]
S. Abdelnabi, A. Fay, G. Cherubin, A. Salem, M. Fritz, and A. Paverd. Get my drift? catching LLM task drift with activation deltas, 2025. URLhttps://arxiv.org/abs/2406.00799
-
[2]
Agent skills open standard specification.https://agentskills.io, October 2025
Agent Skills. Agent skills open standard specification.https://agentskills.io, October 2025. Open standard for portable agent skills
work page 2025
-
[3]
H. An, J. Zhang, T. Du, C. Zhou, Q. Li, T. Lin, and S. Ji. IPIGuard: A novel tool dependency graph-based defense against indirect prompt injection in LLM agents. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, 2025. Association for Computational Linguistics. URLhttps://aclanthology.org/2025.em...
work page 2025
-
[4]
Code execution with MCP: Building more efficient agents.https://www.anthropic
Anthropic. Code execution with MCP: Building more efficient agents.https://www.anthropic. com/engineering/code-execution-with-mcp, Feb. 2025
work page 2025
-
[5]
Anthropic. Computer use tool.https://platform.claude.com/docs/en/agents-and-tools/t ool-use/computer-use-tool, Feb. 2025
work page 2025
-
[6]
S. Axelsson. The base-rate fallacy and the difficulty of intrusion detection.ACM Transactions on Information and System Security, 3(3):186–205, Aug 2000. doi: 10.1145/357830.357849. URL https://dl.acm.org/doi/10.1145/357830.357849
-
[7]
S. Chen, J. Piet, C. Sitawarin, and D. A. Wagner. StruQ: Defending against prompt injection with structured queries. InProceedings of the 34th USENIX Security Symposium, USENIX Security’25, pages 2383–2400. USENIX Association, 2025
work page 2025
-
[8]
P.-C. Cheng, P. Rohatgi, C. Keser, P. A. Karger, G. M. Wagner, and A. S. Reninger. Fuzzy multi- level security: An experiment on quantified risk-adaptive access control. InProceedings of the IEEE Symposium on Security and Privacy (S&P), pages 222–230. IEEE, 2007. doi: 10.1109/SP.2007.21
-
[9]
Defeating Prompt Injections by Design
E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, A. Terzis, and F. Tramèr. Defeating prompt injections by design.arXiv preprint arXiv:2503.18813, 2025. 13 Security Considerations for Artificial Intelligence Agents (Perplexity Response to NIST/CAISI Request for Information 2025-0035)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[10]
D. F. Ferraiolo and R. Kuhn. Role-based access controls. In15th National Computer Security Conference, pages 554–563. NIST, 1992
work page 1992
-
[11]
D. F. Ferraiolo, R. Sandhu, S. Gavrila, R. Kuhn, and R. Chandramouli. Proposed NIST standard for role-based access control.ACM Transactions on Information and System Security, 4(3):224– 274, 2001. doi: 10.1145/501978.501980
-
[12]
T. Geng, Z. Xu, Y. Qu, and W. E. Wong. Prompt injection attacks on large language models: A survey of attack methods, root causes, and defense strategies.Computers, Materials & Continua, 87(1):4, 2026. doi: 10.32604/cmc.2025.074081. URLhttps://doi.org/10.32604/cmc.2025.07 4081
- [14]
-
[15]
K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023
work page 2023
-
[16]
N. Hardy. The confused deputy: (or why capabilities might have been invented). InProceedings of the USENIX Summer Conference, pages 36–38. USENIX Association, 1988
work page 1988
-
[17]
Defending Against Indirect Prompt Injection Attacks With Spotlighting
K. Hines, G. Lopez, M. Hall, F. Zarfati, Y. Zunger, and E. Kiciman. Defending against indirect prompt injection attacks with spotlighting.arXiv preprint arXiv:2403.14720, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[18]
K.-H. Hung, C.-Y. Ko, A. Rawat, I.-H. Chung, W. H. Hsu, and P.-Y. Chen. Attention tracker: Detecting prompt injection attacks in LLMs. InFindings of the Association for Computational Linguistics: NAACL 2025, pages 2309–2322, Albuquerque, New Mexico, 2025. Association for Computational Linguistics. doi: 10.18653/v1/2025.findings-naacl.123. URLhttps://aclan...
-
[19]
Horizontal integration: Broader access models for realizing information dominance
JASON Program Office. Horizontal integration: Broader access models for realizing information dominance. Technical Report JSR-04-132, MITRE Corporation, McLean, VA, Dec. 2004. URL https://irp.fas.org/agency/dod/jason/classpol.pdf. Prepared for the U.S. Department of Defense
work page 2004
-
[20]
H. Li, X. Liu, H.-C. Chiu, D. Li, N. Zhang, and C. Xiao. DRIFT: Dynamic rule-based defense with injection isolation for securing LLM agents. InAdvances in Neural Information Processing Systems (NeurIPS), 2025. URLhttps://neurips.cc/virtual/2025/poster/116028
work page 2025
- [21]
-
[22]
Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong. Formalizing and benchmarking prompt injection attacks and defenses. In33rd USENIX Security Symposium (USENIX Security 24), pages 1831– 1847, Philadelphia, PA, aug 2024. USENIX Association. ISBN 978-1-939133-44-1. URLhttps: //www.usenix.org/conference/usenixsecurity24/presentation/liu-yupei
work page 2024
-
[23]
N. Maloyan and D. Namiot. Prompt injection attacks on agentic coding assistants: A systematic analysis of vulnerabilities in skills, tools, and protocol ecosystems, 2026. URLhttps://arxiv.or g/abs/2601.17548. 14 Security Considerations for Artificial Intelligence Agents (Perplexity Response to NIST/CAISI Request for Information 2025-0035)
-
[24]
G. McGraw. Risk-adaptable access control (RAdAC).IEEE Security & Privacy, 7(2):80–83, 2009. doi: 10.1109/MSP.2009.47
-
[25]
Kimi Agent Swarm.https://kimi.com/blog/agent-swarm.html, Feb
Moonshot AI. Kimi Agent Swarm.https://kimi.com/blog/agent-swarm.html, Feb. 2026
work page 2026
-
[26]
National Institute of Standards and Technology. Request for Information Regarding Security Considerations for Artificial Intelligence Agents.https://www.federalregister.gov/document s/2026/01/08/2026-00206/request-for-information-regarding-security-consideration s-for-artificial-intelligence-agents, Jan. 2026. 91 FR 698, Document No. 2026-00206
work page 2026
-
[27]
NIST National Vulnerability Database. CVE-2026-25253: One-click remote code execution in openclaw via token leakage and websocket abuse.https://nvd.nist.gov/vuln/detail/CVE-2 026-25253, Feb. 2026
work page 2026
-
[28]
NIST National Vulnerability Database. CVE-2026-26327: Insufficient verification of data authen- ticity.https://nvd.nist.gov/vuln/detail/CVE-2026-26327, Feb. 2026
work page 2026
-
[29]
Introducing AgentKit.https://openai.com/index/introducing-agentkit/, Feb
OpenAI. Introducing AgentKit.https://openai.com/index/introducing-agentkit/, Feb. 2025
work page 2025
-
[30]
Tools.https://openai.github.io/openai-agents-python/tools/, Feb
OpenAI. Tools.https://openai.github.io/openai-agents-python/tools/, Feb. 2025
work page 2025
-
[31]
New tools for building agents.https://openai.com/index/new-tools-for-buildin g-agents/, Feb
OpenAI. New tools for building agents.https://openai.com/index/new-tools-for-buildin g-agents/, Feb. 2025
work page 2025
-
[32]
Docs.https://docs.openclaw.ai/, Feb
OpenClaw. Docs.https://docs.openclaw.ai/, Feb. 2026
work page 2026
-
[33]
Ignore Previous Prompt: Attack Techniques For Language Models
F. Perez and I. Ribeiro. Ignore previous prompt: Attack techniques for language models, 2022. URLhttps://arxiv.org/abs/2211.09527. Preprint
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[34]
Agent API.https://docs.perplexity.ai/docs/agent-api/quickstart, Feb
Perplexity. Agent API.https://docs.perplexity.ai/docs/agent-api/quickstart, Feb. 2026
work page 2026
-
[35]
Perplexity API Platform.https://docs.perplexity.ai/docs/getting-started/o verview, Feb
Perplexity. Perplexity API Platform.https://docs.perplexity.ai/docs/getting-started/o verview, Feb. 2026
work page 2026
-
[36]
Perplexity MCP Server.https://docs.perplexity.ai/docs/getting-started/i ntegrations/mcp-server, Feb
Perplexity. Perplexity MCP Server.https://docs.perplexity.ai/docs/getting-started/i ntegrations/mcp-server, Feb. 2026
work page 2026
-
[37]
Introducing model council.https://www.perplexity.ai/hub/blog/introducing-m odel-council, Feb
Perplexity. Introducing model council.https://www.perplexity.ai/hub/blog/introducing-m odel-council, Feb. 2026
work page 2026
-
[38]
Perplexity research.https://research.perplexity.ai/, Feb
Perplexity. Perplexity research.https://research.perplexity.ai/, Feb. 2026
work page 2026
-
[39]
Sonar API.https://docs.perplexity.ai/docs/sonar/quickstart, Feb
Perplexity. Sonar API.https://docs.perplexity.ai/docs/sonar/quickstart, Feb. 2026
work page 2026
-
[40]
Tools overview.https://docs.perplexity.ai/docs/agent-api/tools/overview, Feb
Perplexity. Tools overview.https://docs.perplexity.ai/docs/agent-api/tools/overview, Feb. 2026
work page 2026
-
[41]
Perplexity AI. Introducing Comet: An AI-Native Browser.https://www.perplexity.ai/hub/ blog/introducing-comet, July 2025. 15 Security Considerations for Artificial Intelligence Agents (Perplexity Response to NIST/CAISI Request for Information 2025-0035)
work page 2025
-
[42]
Perplexity AI. Introducing Perplexity Computer.https://www.perplexity.ai/hub/blog/int roducing-perplexity-computer, Feb. 2026
work page 2026
-
[43]
Y. Qin, K. Song, Y. Hu, W. Yao, S. Cho, X. Wang, X. Wu, F. Liu, P. Liu, and D. Yu. InFoBench: Evaluating instruction following ability in large language models. InFindings of the Association for Computational Linguistics: ACL 2024, 2024
work page 2024
- [44]
-
[45]
B. Rababah, S. T. Wu, M. Kwiatkowski, C. K. Leung, and C. G. Akcora. SoK: Prompt hacking of large language models. InProceedings of the IEEE International Conference on Big Data (Big Data 2024), pages 5392–5401, New York, NY, USA, 2024. IEEE. doi: 10.1109/BIGDATA62323.2 024.10825103
-
[46]
A. RoyChowdhury, M. Luo, P. Sahu, S. Banerjee, and M. Tiwari. ConfusedPilot: Confused deputy risks in RAG-based llms, 2024. URLhttps://arxiv.org/abs/2408.04870
-
[47]
J. H. Saltzer and M. D. Schroeder. The protection of information in computer systems.Proceedings of the IEEE, 63(9):1278–1308, 1975. doi: 10.1109/PROC.1975.9939
-
[48]
R. S. Sandhu, E. J. Coyne, H. L. Feinstein, and C. E. Youman. Role-based access control models. IEEE Computer, 29(2):38–47, 1996. doi: 10.1109/2.485845
-
[49]
L. Tsai and E. Bagdasarian. Contextual agent security: A policy for every purpose. InProceedings of the 2025 Workshop on Hot Topics in Operating Systems, pages 8–17, 2025
work page 2025
-
[50]
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
E. Wallace, K. Xiao, R. Leike, L. Weng, J. Heidecke, and A. Beutel. The instruction hierarchy: Training LLMs to prioritize privileged instructions.arXiv preprint arXiv:2404.13208, 2024. URL https://arxiv.org/abs/2404.13208
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[51]
T. Wu, S. Zhang, K. Song, S. Xu, S. Zhao, R. Agrawal, S. R. Indurthi, C. Xiang, P. Mittal, and W. Zhou. Instructional segment embedding: Improving LLM safety with instruction hierarchy. InProceedings of the 13th International Conference on Learning Representations (ICLR 2025), Singapore, 2025. URLhttps://arxiv.org/abs/2410.09102
-
[52]
Y. Wu, F. Roesner, T. Kohno, N. Zhang, and U. Iqbal. IsolateGPT: An execution isolation architecture for llm-based agentic systems. InNetwork and Distributed System Security (NDSS) Symposium, 2025
work page 2025
-
[53]
Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward
R. Xu and Y. Yan. Agent Skills for large language models: Architecture, acquisition, security, and the path forward.arXiv preprint arXiv:2602.12430, 2026. URLhttps://arxiv.org/abs/2602.1 2430
work page internal anchor Pith review Pith/arXiv arXiv 2026
- [54]
-
[55]
Browsesafe: Understanding and preventing prompt injection within ai browser agents,
K. Zhang, M. Tenenholtz, K. Polley, J. Ma, D. Yarats, and N. Li. BrowseSafe: Understanding and preventing prompt injection within AI browser agents.arXiv preprint arXiv:2511.20597, 2025. 16 Security Considerations for Artificial Intelligence Agents (Perplexity Response to NIST/CAISI Request for Information 2025-0035)
-
[56]
Z. Zhang, S. Li, Z. Zhang, X. Liu, H. Jiang, X. Tang, Y. Gao, Z. Li, H. Wang, Z. Tan, Y. Li, Q. Yin, B. Yin, and M. Jiang. IHEval: Evaluating language models on following the instruction hierarchy. InProceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2025), ...
work page 2025
- [57]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.