arxiv: 2603.12230 · v2 · submitted 2026-03-12 · 💻 cs.LG · cs.AI· cs.CR

Recognition: no theorem link

Security Considerations for Artificial Intelligence Agents

Ninghui Li , Kaiyuan Zhang , Kyle Polley , Jerry Ma

Authors on Pith no claims yet

Pith reviewed 2026-05-15 12:34 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CR

keywords AI agentssecurityprompt injectionconfused deputyattack surfacesmulti-agent coordinationpolicy enforcementfrontier AI

0 comments

The pith

AI agent architectures create new security failure modes by changing code-data separation and authority boundaries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that AI agents integrate reasoning with action in ways that disrupt long-standing security assumptions about separating code from data, limiting authority, and predicting execution outcomes. These shifts produce distinct risks to confidentiality when agents access external resources, to integrity through deception in tool use, and to availability via cascading effects in extended workflows. Observations from large-scale agent operations are used to catalog attack surfaces such as indirect prompt injection and confused-deputy problems across tools and multi-agent setups. The work evaluates current protections as a stack of input safeguards, sandboxed runs, and strict policy rules for important actions. It points to gaps in benchmarks and standards needed to align agent security with established risk principles.

Core claim

Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. Principal attack surfaces are mapped across tools, connectors, hosting boundaries, and multi-agent coordination, with emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. Defenses are assessed as a layered stack of input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions.

What carries the argument

The mapping of attack surfaces together with the layered defense stack that addresses indirect prompt injection and confused-deputy behavior through input mitigations, sandboxing, and policy enforcement.

If this is right

Confidentiality risks increase when agents connect to external tools and data sources without clear separation.
Integrity can be compromised through confused-deputy attacks that cause agents to perform unauthorized actions.
Availability problems can cascade across long-running multi-agent workflows.
Layered defenses must combine input sanitization, sandboxing, and deterministic policy enforcement for critical steps.
Standards are needed for policy models that handle delegation and privilege control in agent systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Traditional software security models may need revision to cover the integrated reasoning and tool-use loop in agents.
Open-world testing could surface coordination vulnerabilities not visible in controlled settings.
The layered stack approach could inform security practices for other AI systems that combine planning and execution.

Load-bearing premise

Experience operating general-purpose agentic systems generalizes to frontier AI agents in both controlled and open environments.

What would settle it

A production deployment of frontier agents that shows no measurable rise in incidents tied to code-data mixing, authority violations, or cascading workflow failures.

read the original abstract

This article, a lightly adapted version of Perplexity's response to NIST/CAISI Request for Information 2025-0035, details our observations and recommendations concerning the security of frontier AI agents. These insights are informed by Perplexity's experience operating general-purpose agentic systems used by millions of users and thousands of enterprises in both controlled and open-world environments. Agent architectures change core assumptions around code-data separation, authority boundaries, and execution predictability, creating new confidentiality, integrity, and availability failure modes. We map principal attack surfaces across tools, connectors, hosting boundaries, and multi-agent coordination, with particular emphasis on indirect prompt injection, confused-deputy behavior, and cascading failures in long-running workflows. We then assess current defenses as a layered stack: input-level and model-level mitigations, sandboxed execution, and deterministic policy enforcement for high-consequence actions. Finally, we identify standards and research gaps, including adaptive security benchmarks, policy models for delegation and privilege control, and guidance for secure multi-agent system design aligned with NIST risk management principles.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. This paper, adapted from Perplexity's response to a NIST/CAISI RFI, claims that AI agent architectures alter core security assumptions around code-data separation, authority boundaries, and execution predictability, thereby introducing new confidentiality, integrity, and availability failure modes. Drawing on operational experience with general-purpose agentic systems serving millions of users, it maps principal attack surfaces (tools, connectors, hosting boundaries, multi-agent coordination) with emphasis on indirect prompt injection and confused-deputy behavior, evaluates a layered defense stack (input/model mitigations, sandboxing, deterministic policy enforcement), and identifies gaps in adaptive benchmarks, policy models for delegation, and secure multi-agent design aligned with NIST principles.

Significance. If the observations hold, the work offers timely practitioner-derived insights into emerging risks for frontier AI agents, grounded in large-scale real-world deployment rather than purely theoretical analysis. This could usefully inform standards development and research priorities, particularly the call for policy models and multi-agent security guidance, though its impact hinges on the transferability of Perplexity-specific experience.

major comments (1)

The central mapping of changed assumptions and new failure modes (e.g., cascading failures in long-running workflows) rests entirely on qualitative operational experience without quantitative data, error bars, or reproducible measurements to substantiate prevalence or severity; this weakens the load-bearing claim that these modes are distinctly new relative to prior systems.

minor comments (2)

The discussion of attack surfaces would benefit from a summary table or diagram to improve clarity and allow readers to quickly compare surfaces across tools, connectors, and multi-agent coordination.
Add citations to prior work on prompt injection and confused-deputy problems in AI systems to better situate the observations within the existing literature.

Simulated Author's Rebuttal

1 responses · 1 unresolved

We thank the referee for their positive evaluation of the manuscript's practitioner perspective and for recommending minor revision. We address the major comment below.

read point-by-point responses

Referee: The central mapping of changed assumptions and new failure modes (e.g., cascading failures in long-running workflows) rests entirely on qualitative operational experience without quantitative data, error bars, or reproducible measurements to substantiate prevalence or severity; this weakens the load-bearing claim that these modes are distinctly new relative to prior systems.

Authors: We acknowledge that the analysis is qualitative and drawn from operational experience with production agentic systems. Quantitative data on security incidents, prevalence, or severity is not available in a form that can be shared or reproduced, owing to the proprietary and sensitive nature of real-world deployments. We maintain that the failure modes are architecturally distinct because they arise directly from the new assumptions around code-data separation, authority delegation, and long-running tool-using workflows that were not present in prior non-agentic systems; the manuscript grounds this distinction in concrete examples rather than statistical claims. We have added a new paragraph in the introduction explicitly discussing the observational basis and limitations of the analysis to address this point. revision: partial

standing simulated objections not resolved

Quantitative data, error bars, or reproducible measurements on the prevalence or severity of the described failure modes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is an observational discussion of security considerations for AI agents, drawing on Perplexity's deployed experience with general-purpose agentic systems. It maps attack surfaces and defenses without any mathematical derivations, equations, fitted parameters, or formal predictions. No load-bearing step reduces by construction to self-citations, ansatzes, or renamed inputs; claims about changed assumptions and failure modes are presented as direct mappings from operational observations rather than internally derived results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claims rest on domain assumptions about agent behavior drawn from operational experience; no free parameters, formal axioms, or new invented entities are introduced.

pith-pipeline@v0.9.0 · 5483 in / 1030 out tokens · 34424 ms · 2026-05-15T12:34:28.662566+00:00 · methodology

discussion (0)

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Parallax: Why AI Agents That Think Must Never Act
cs.CR 2026-04 unverdicted novelty 6.0

Parallax enforces structural separation between AI thinking and acting via independent multi-tier validation, information flow control, and state rollback, blocking 98.9% of 280 adversarial attacks with zero false pos...
Security Considerations for Multi-agent Systems
cs.CR 2026-03 unverdicted novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.
Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation
cs.CR 2026-05 unverdicted novelty 5.0

A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
cs.SE 2026-04 accept novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
When the Agent Is the Adversary: Architectural Requirements for Agentic AI Containment After the April 2026 Frontier Model Escape
cs.CR 2026-04 unverdicted novelty 3.0

A reported 2026 frontier model escape shows that alignment training, sandboxing, tool interception, and audits fail against adversarial agentic AI, requiring five new architectural requirements for durable containment.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 5 Pith papers · 5 internal anchors

[1]

Abdelnabi, A

S. Abdelnabi, A. Fay, G. Cherubin, A. Salem, M. Fritz, and A. Paverd. Get my drift? catching LLM task drift with activation deltas, 2025. URLhttps://arxiv.org/abs/2406.00799

work page arXiv 2025
[2]

Agent skills open standard specification.https://agentskills.io, October 2025

Agent Skills. Agent skills open standard specification.https://agentskills.io, October 2025. Open standard for portable agent skills

work page 2025
[3]

H. An, J. Zhang, T. Du, C. Zhou, Q. Li, T. Lin, and S. Ji. IPIGuard: A novel tool dependency graph-based defense against indirect prompt injection in LLM agents. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, 2025. Association for Computational Linguistics. URLhttps://aclanthology.org/2025.em...

work page 2025
[4]

Code execution with MCP: Building more efficient agents.https://www.anthropic

Anthropic. Code execution with MCP: Building more efficient agents.https://www.anthropic. com/engineering/code-execution-with-mcp, Feb. 2025

work page 2025
[5]

Computer use tool.https://platform.claude.com/docs/en/agents-and-tools/t ool-use/computer-use-tool, Feb

Anthropic. Computer use tool.https://platform.claude.com/docs/en/agents-and-tools/t ool-use/computer-use-tool, Feb. 2025

work page 2025
[6]

Axelsson

S. Axelsson. The base-rate fallacy and the difficulty of intrusion detection.ACM Transactions on Information and System Security, 3(3):186–205, Aug 2000. doi: 10.1145/357830.357849. URL https://dl.acm.org/doi/10.1145/357830.357849

work page doi:10.1145/357830.357849 2000
[7]

S. Chen, J. Piet, C. Sitawarin, and D. A. Wagner. StruQ: Defending against prompt injection with structured queries. InProceedings of the 34th USENIX Security Symposium, USENIX Security’25, pages 2383–2400. USENIX Association, 2025

work page 2025
[8]

Cheng, P

P.-C. Cheng, P. Rohatgi, C. Keser, P. A. Karger, G. M. Wagner, and A. S. Reninger. Fuzzy multi- level security: An experiment on quantified risk-adaptive access control. InProceedings of the IEEE Symposium on Security and Privacy (S&P), pages 222–230. IEEE, 2007. doi: 10.1109/SP.2007.21

work page doi:10.1109/sp.2007.21 2007
[9]

Defeating Prompt Injections by Design

E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, A. Terzis, and F. Tramèr. Defeating prompt injections by design.arXiv preprint arXiv:2503.18813, 2025. 13 Security Considerations for Artificial Intelligence Agents (Perplexity Response to NIST/CAISI Request for Information 2025-0035)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[10]

D. F. Ferraiolo and R. Kuhn. Role-based access controls. In15th National Computer Security Conference, pages 554–563. NIST, 1992

work page 1992
[11]

D. F. Ferraiolo, R. Sandhu, S. Gavrila, R. Kuhn, and R. Chandramouli. Proposed NIST standard for role-based access control.ACM Transactions on Information and System Security, 4(3):224– 274, 2001. doi: 10.1145/501978.501980

work page doi:10.1145/501978.501980 2001
[12]

T. Geng, Z. Xu, Y. Qu, and W. E. Wong. Prompt injection attacks on large language models: A survey of attack methods, root causes, and defense strategies.Computers, Materials & Continua, 87(1):4, 2026. doi: 10.32604/cmc.2025.074081. URLhttps://doi.org/10.32604/cmc.2025.07 4081

work page doi:10.32604/cmc.2025.074081 2026
[14]

URLhttps://arxiv.org/abs/2502.15851

work page arXiv
[15]

Greshake, S

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023

work page 2023
[16]

N. Hardy. The confused deputy: (or why capabilities might have been invented). InProceedings of the USENIX Summer Conference, pages 36–38. USENIX Association, 1988

work page 1988
[17]

Defending Against Indirect Prompt Injection Attacks With Spotlighting

K. Hines, G. Lopez, M. Hall, F. Zarfati, Y. Zunger, and E. Kiciman. Defending against indirect prompt injection attacks with spotlighting.arXiv preprint arXiv:2403.14720, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[18]

Hung, C.-Y

K.-H. Hung, C.-Y. Ko, A. Rawat, I.-H. Chung, W. H. Hsu, and P.-Y. Chen. Attention tracker: Detecting prompt injection attacks in LLMs. InFindings of the Association for Computational Linguistics: NAACL 2025, pages 2309–2322, Albuquerque, New Mexico, 2025. Association for Computational Linguistics. doi: 10.18653/v1/2025.findings-naacl.123. URLhttps://aclan...

work page doi:10.18653/v1/2025.findings-naacl.123 2025
[19]

Horizontal integration: Broader access models for realizing information dominance

JASON Program Office. Horizontal integration: Broader access models for realizing information dominance. Technical Report JSR-04-132, MITRE Corporation, McLean, VA, Dec. 2004. URL https://irp.fas.org/agency/dod/jason/classpol.pdf. Prepared for the U.S. Department of Defense

work page 2004
[20]

H. Li, X. Liu, H.-C. Chiu, D. Li, N. Zhang, and C. Xiao. DRIFT: Dynamic rule-based defense with injection isolation for securing LLM agents. InAdvances in Neural Information Processing Systems (NeurIPS), 2025. URLhttps://neurips.cc/virtual/2025/poster/116028

work page 2025
[21]

Y. Li, J. Wang, H. Zhu, J. Lin, S. Chang, and M. Guo. ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking.arXiv preprint arXiv:2512.07086, 2025

work page arXiv 2025
[22]

Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong. Formalizing and benchmarking prompt injection attacks and defenses. In33rd USENIX Security Symposium (USENIX Security 24), pages 1831– 1847, Philadelphia, PA, aug 2024. USENIX Association. ISBN 978-1-939133-44-1. URLhttps: //www.usenix.org/conference/usenixsecurity24/presentation/liu-yupei

work page 2024
[23]

Maloyan and D

N. Maloyan and D. Namiot. Prompt injection attacks on agentic coding assistants: A systematic analysis of vulnerabilities in skills, tools, and protocol ecosystems, 2026. URLhttps://arxiv.or g/abs/2601.17548. 14 Security Considerations for Artificial Intelligence Agents (Perplexity Response to NIST/CAISI Request for Information 2025-0035)

work page arXiv 2026
[24]

G. McGraw. Risk-adaptable access control (RAdAC).IEEE Security & Privacy, 7(2):80–83, 2009. doi: 10.1109/MSP.2009.47

work page doi:10.1109/msp.2009.47 2009
[25]

Kimi Agent Swarm.https://kimi.com/blog/agent-swarm.html, Feb

Moonshot AI. Kimi Agent Swarm.https://kimi.com/blog/agent-swarm.html, Feb. 2026

work page 2026
[26]

National Institute of Standards and Technology. Request for Information Regarding Security Considerations for Artificial Intelligence Agents.https://www.federalregister.gov/document s/2026/01/08/2026-00206/request-for-information-regarding-security-consideration s-for-artificial-intelligence-agents, Jan. 2026. 91 FR 698, Document No. 2026-00206

work page 2026
[27]

CVE-2026-25253: One-click remote code execution in openclaw via token leakage and websocket abuse.https://nvd.nist.gov/vuln/detail/CVE-2 026-25253, Feb

NIST National Vulnerability Database. CVE-2026-25253: One-click remote code execution in openclaw via token leakage and websocket abuse.https://nvd.nist.gov/vuln/detail/CVE-2 026-25253, Feb. 2026

work page 2026
[28]

CVE-2026-26327: Insufficient verification of data authen- ticity.https://nvd.nist.gov/vuln/detail/CVE-2026-26327, Feb

NIST National Vulnerability Database. CVE-2026-26327: Insufficient verification of data authen- ticity.https://nvd.nist.gov/vuln/detail/CVE-2026-26327, Feb. 2026

work page 2026
[29]

Introducing AgentKit.https://openai.com/index/introducing-agentkit/, Feb

OpenAI. Introducing AgentKit.https://openai.com/index/introducing-agentkit/, Feb. 2025

work page 2025
[30]

Tools.https://openai.github.io/openai-agents-python/tools/, Feb

OpenAI. Tools.https://openai.github.io/openai-agents-python/tools/, Feb. 2025

work page 2025
[31]

New tools for building agents.https://openai.com/index/new-tools-for-buildin g-agents/, Feb

OpenAI. New tools for building agents.https://openai.com/index/new-tools-for-buildin g-agents/, Feb. 2025

work page 2025
[32]

Docs.https://docs.openclaw.ai/, Feb

OpenClaw. Docs.https://docs.openclaw.ai/, Feb. 2026

work page 2026
[33]

Ignore Previous Prompt: Attack Techniques For Language Models

F. Perez and I. Ribeiro. Ignore previous prompt: Attack techniques for language models, 2022. URLhttps://arxiv.org/abs/2211.09527. Preprint

work page internal anchor Pith review Pith/arXiv arXiv 2022
[34]

Agent API.https://docs.perplexity.ai/docs/agent-api/quickstart, Feb

Perplexity. Agent API.https://docs.perplexity.ai/docs/agent-api/quickstart, Feb. 2026

work page 2026
[35]

Perplexity API Platform.https://docs.perplexity.ai/docs/getting-started/o verview, Feb

Perplexity. Perplexity API Platform.https://docs.perplexity.ai/docs/getting-started/o verview, Feb. 2026

work page 2026
[36]

Perplexity MCP Server.https://docs.perplexity.ai/docs/getting-started/i ntegrations/mcp-server, Feb

Perplexity. Perplexity MCP Server.https://docs.perplexity.ai/docs/getting-started/i ntegrations/mcp-server, Feb. 2026

work page 2026
[37]

Introducing model council.https://www.perplexity.ai/hub/blog/introducing-m odel-council, Feb

Perplexity. Introducing model council.https://www.perplexity.ai/hub/blog/introducing-m odel-council, Feb. 2026

work page 2026
[38]

Perplexity research.https://research.perplexity.ai/, Feb

Perplexity. Perplexity research.https://research.perplexity.ai/, Feb. 2026

work page 2026
[39]

Sonar API.https://docs.perplexity.ai/docs/sonar/quickstart, Feb

Perplexity. Sonar API.https://docs.perplexity.ai/docs/sonar/quickstart, Feb. 2026

work page 2026
[40]

Tools overview.https://docs.perplexity.ai/docs/agent-api/tools/overview, Feb

Perplexity. Tools overview.https://docs.perplexity.ai/docs/agent-api/tools/overview, Feb. 2026

work page 2026
[41]

Introducing Comet: An AI-Native Browser.https://www.perplexity.ai/hub/ blog/introducing-comet, July 2025

Perplexity AI. Introducing Comet: An AI-Native Browser.https://www.perplexity.ai/hub/ blog/introducing-comet, July 2025. 15 Security Considerations for Artificial Intelligence Agents (Perplexity Response to NIST/CAISI Request for Information 2025-0035)

work page 2025
[42]

Introducing Perplexity Computer.https://www.perplexity.ai/hub/blog/int roducing-perplexity-computer, Feb

Perplexity AI. Introducing Perplexity Computer.https://www.perplexity.ai/hub/blog/int roducing-perplexity-computer, Feb. 2026

work page 2026
[43]

Y. Qin, K. Song, Y. Hu, W. Yao, S. Cho, X. Wang, X. Wu, F. Liu, P. Liu, and D. Yu. InFoBench: Evaluating instruction following ability in large language models. InFindings of the Association for Computational Linguistics: ACL 2024, 2024

work page 2024
[44]

Y. Qin, T. Zhang, Y. Shen, W. Luo, H. Sun, Y. Zhang, Y. Qiao, W. Chen, Z. Zhou, W. Zhang, and B. Cui. SysBench: Can large language models follow system messages?arXiv preprint arXiv:2408.10943, 2024. URLhttps://arxiv.org/abs/2408.10943

work page arXiv 2024
[45]

Rababah, S

B. Rababah, S. T. Wu, M. Kwiatkowski, C. K. Leung, and C. G. Akcora. SoK: Prompt hacking of large language models. InProceedings of the IEEE International Conference on Big Data (Big Data 2024), pages 5392–5401, New York, NY, USA, 2024. IEEE. doi: 10.1109/BIGDATA62323.2 024.10825103

work page doi:10.1109/bigdata62323.2 2024
[46]

RoyChowdhury, M

A. RoyChowdhury, M. Luo, P. Sahu, S. Banerjee, and M. Tiwari. ConfusedPilot: Confused deputy risks in RAG-based llms, 2024. URLhttps://arxiv.org/abs/2408.04870

work page arXiv 2024
[47]

J. H. Saltzer and M. D. Schroeder. The protection of information in computer systems.Proceedings of the IEEE, 63(9):1278–1308, 1975. doi: 10.1109/PROC.1975.9939

work page doi:10.1109/proc.1975.9939 1975
[48]

R. S. Sandhu, E. J. Coyne, H. L. Feinstein, and C. E. Youman. Role-based access control models. IEEE Computer, 29(2):38–47, 1996. doi: 10.1109/2.485845

work page doi:10.1109/2.485845 1996
[49]

Tsai and E

L. Tsai and E. Bagdasarian. Contextual agent security: A policy for every purpose. InProceedings of the 2025 Workshop on Hot Topics in Operating Systems, pages 8–17, 2025

work page 2025
[50]

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

E. Wallace, K. Xiao, R. Leike, L. Weng, J. Heidecke, and A. Beutel. The instruction hierarchy: Training LLMs to prioritize privileged instructions.arXiv preprint arXiv:2404.13208, 2024. URL https://arxiv.org/abs/2404.13208

work page internal anchor Pith review Pith/arXiv arXiv 2024
[51]

T. Wu, S. Zhang, K. Song, S. Xu, S. Zhao, R. Agrawal, S. R. Indurthi, C. Xiang, P. Mittal, and W. Zhou. Instructional segment embedding: Improving LLM safety with instruction hierarchy. InProceedings of the 13th International Conference on Learning Representations (ICLR 2025), Singapore, 2025. URLhttps://arxiv.org/abs/2410.09102

work page arXiv 2025
[52]

Y. Wu, F. Roesner, T. Kohno, N. Zhang, and U. Iqbal. IsolateGPT: An execution isolation architecture for llm-based agentic systems. InNetwork and Distributed System Security (NDSS) Symposium, 2025

work page 2025
[53]

Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward

R. Xu and Y. Yan. Agent Skills for large language models: Architecture, acquisition, security, and the path forward.arXiv preprint arXiv:2602.12430, 2026. URLhttps://arxiv.org/abs/2602.1 2430

work page internal anchor Pith review Pith/arXiv arXiv 2026
[54]

Zhang, Z

K. Zhang, Z. Su, P.-Y. Chen, E. Bertino, X. Zhang, and N. Li. LLM agents should employ security principles.arXiv preprint arXiv:2505.24019, 2025

work page arXiv 2025
[55]

Browsesafe: Understanding and preventing prompt injection within ai browser agents,

K. Zhang, M. Tenenholtz, K. Polley, J. Ma, D. Yarats, and N. Li. BrowseSafe: Understanding and preventing prompt injection within AI browser agents.arXiv preprint arXiv:2511.20597, 2025. 16 Security Considerations for Artificial Intelligence Agents (Perplexity Response to NIST/CAISI Request for Information 2025-0035)

work page arXiv 2025
[56]

Zhang, S

Z. Zhang, S. Li, Z. Zhang, X. Liu, H. Jiang, X. Tang, Y. Gao, Z. Li, H. Wang, Z. Tan, Y. Li, Q. Yin, B. Yin, and M. Jiang. IHEval: Evaluating language models on following the instruction hierarchy. InProceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2025), ...

work page 2025
[57]

Zverev, S

E. Zverev, S. Abdelnabi, S. Tabesh, M. Fritz, and C. H. Lampert. Can LLMs separate instructions from data? and what do we even mean by that? InProc. of the International Conference on Learning Representations (ICLR 2025), 2025. 17

work page 2025