Security, Privacy, and Ethical Risks in OpenClaw

Jianbing Ni; Yutong Jin; Zelin Zhang; Zhijin Lyu

arxiv: 2605.23330 · v1 · pith:QKOY7RXMnew · submitted 2026-05-22 · 💻 cs.CR

Yutong Jin, Zelin Zhang, Zhijin Lyu, Jianbing Ni

Pith reviewed 2026-05-25 04:24 UTC · model grok-4.3

classification 💻 cs.CR

keywords: security risks, privacy risks, ethical risks, AI agent, OpenClaw, traceability, persistent storage, tool invocation

The pith

OpenClaw's architecture introduces security, privacy, and ethical risks that form major barriers to its trustworthy deployment and adoption.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper investigates the risks in OpenClaw, a locally executable AI agent for natural language tasks and real-world actions. It analyzes the system's architecture, functionalities, and scenarios to highlight issues with persistent local storage, tool invocation, cross-context aggregation, multi-user interactions, and plugin integration. The authors argue these create significant concerns for security, privacy, ethics, and traceability. A sympathetic reader would care because OpenClaw aims at personal assistance and automation but could undermine trust in digital environments if these risks go unaddressed.

Core claim

The central claim is that OpenClaw, as a highly privileged agent integrated into personal and organizational environments, raises risks through five features: persistent local storage, tool invocation, cross-context information aggregation, multi-user interaction, and the integration of plugins and external services. The authors argue that these issues constitute major barriers to the trustworthy deployment and widespread adoption of this technology.

What carries the argument

The system architecture and representative application scenarios of OpenClaw, which enable persistent local storage, tool invocation, cross-context aggregation, multi-user interaction, and plugin integration.

If this is right

Persistent local storage may expose sensitive user data to unauthorized access.
Tool invocation could enable unintended or malicious actions on connected systems.
Cross-context information aggregation risks linking private data across domains.
Multi-user interactions may lead to access conflicts or unauthorized sharing.
Plugin and external service integration opens additional attack surfaces.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Comparable local AI agent systems would likely require similar risk analyses to ensure safe use.
Standardized traceability requirements could emerge as a practical response to the identified gaps.
Developers of agent platforms might prioritize built-in controls for storage and invocation as a direct follow-on.

Load-bearing premise

That the listed risks are inherent to the system and remain unmitigated, derived solely from architectural analysis without evidence of implemented defenses or assessments of their actual likelihood and impact.

What would settle it

Empirical testing of OpenClaw in real deployments that shows effective mitigation of the risks through existing or added security measures, or quantitative risk assessments indicating negligible impact.
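
One way to make "quantitative risk assessments" concrete is a likelihood-times-impact score discounted by the effectiveness of whatever controls are in place. The sketch below is illustrative only: the `RiskEstimate` class, the multiplicative scoring rule, the 0.25 threshold, and every number in the usage lines are assumptions introduced here, not measurements or methods from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskEstimate:
    """Hypothetical per-feature risk estimate; all scores lie in [0, 1]."""
    feature: str
    likelihood: float   # how likely the risk is to materialize
    impact: float       # how severe the consequences would be
    mitigation: float   # fraction of the risk removed by deployed controls

    def __post_init__(self) -> None:
        # Reject scores outside the assumed [0, 1] scale.
        for name in ("likelihood", "impact", "mitigation"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    def inherent(self) -> float:
        """Risk before any controls are applied."""
        return self.likelihood * self.impact

    def residual(self) -> float:
        """Risk that remains after controls are applied."""
        return self.inherent() * (1.0 - self.mitigation)


def is_major_barrier(estimate: RiskEstimate, threshold: float = 0.25) -> bool:
    """Assumed decision rule: a risk is a major barrier if its residual score
    stays at or above the threshold."""
    return estimate.residual() >= threshold


# Placeholder numbers for illustration only; they are NOT data from the paper.
unmitigated = RiskEstimate("tool invocation", likelihood=0.5, impact=0.8, mitigation=0.0)
sandboxed = RiskEstimate("tool invocation", likelihood=0.5, impact=0.8, mitigation=0.5)
print(is_major_barrier(unmitigated), is_major_barrier(sandboxed))
```

Under this framing, the paper's conclusion would be challenged if measured residual scores for the five features fell below whatever threshold a deployer considers acceptable; the scoring rule itself is a common simplification and other aggregation schemes are equally defensible.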

Figures

Figures reproduced from arXiv: 2605.23330 by Jianbing Ni, Yutong Jin, Zelin Zhang, Zhijin Lyu.

**Figure 2.** System architecture and main functions of OpenClaw. Redrawn based on OpenClaw’s official architecture

**Figure 3.** Representative application scenarios of OpenClaw: communication, retrieval, and technical assistance

**Figure 4.** Privacy risks in OpenClaw task execution: unintended expansion of data access across user-connected

**Figure 5.** Reliability and Traceability of the OpenClaw Execution Chain.

Original abstract

This paper systematically investigates the security, privacy, and ethical risks, as well as the traceability challenges of OpenClaw, a locally executable AI agent system for natural language interaction and real-world task completion. While OpenClaw shows strong potential for personal assistance, office automation, cross-platform task management, and information integration, it also raises serious security, privacy, and ethical concerns. By analyzing its system architecture, core functionalities, deployment model, and representative application scenarios, this paper aims to reveal the risks that may arise when such a highly privileged agent is integrated into personal and organizational digital environments. We focus in particular on the challenges associated with persistent local storage, tool invocation, cross-context information aggregation, multi-user interaction, and the integration of plugins and external services. We argue that these issues constitute major barriers to the trustworthy deployment and widespread adoption of this technology. Finally, we summarize the open challenges in security defenses, privacy protection, ethical governance, and traceability in agent use, and call for joint efforts from researchers, developers, deployers, and regulators to build AI agent systems that are safer, more reliable, and more trustworthy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

OpenClaw risk paper applies known concerns to one system but overstates them as major barriers without mitigation checks.

The main point from this paper is that OpenClaw's local AI agent setup carries security, privacy, and ethical risks that the authors see as major barriers to adoption. They reach this by reviewing the architecture and some application scenarios.

The paper does a solid job breaking down the specific features that create exposure. Persistent local storage, the ability to invoke tools, pulling information across different contexts, handling multiple users, and adding plugins all get attention. The scenarios for personal assistance and office automation help make the concerns concrete rather than abstract. This kind of targeted analysis can be useful when thinking about how to secure similar systems.

Where it falls short is in supporting the strong conclusion. The risks are listed based on how the system works, but there is no examination of whether common security techniques could reduce them to acceptable levels. Sandboxing for local execution, audit logs for tool use, or user approval steps for data aggregation are not discussed. Without that, or any numbers on likelihood and impact, the idea that these are major barriers stays more like an opinion than a demonstrated result. The work stays within standard risk categories from the agent security literature and does not propose new ways to think about the problems or provide any measurements. The final call for more research on defenses and governance is reasonable but expected.

This kind of paper fits best for specialists in AI security and privacy who are already following local agent developments. A reader looking for new methods or hard data will not find much here, but the feature-specific risk list might serve as a starting point for their own assessments. I would send it out for peer review. The area is active, and getting comments on the risk scenarios could improve the paper and help the field, even though the current evidence for the main claim is limited.
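
The audit logs for tool use mentioned in the letter could, for instance, take the form of a hash-chained, append-only record, so that later edits to earlier tool calls become detectable. What follows is a minimal sketch, assuming a hypothetical `AuditLog` class and made-up tool names; nothing here reflects OpenClaw's actual logging.

```python
import hashlib
import json
from typing import Dict, List


class AuditLog:
    """Hypothetical append-only log in which each entry commits to the
    hash of the previous entry (a simple hash chain)."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: List[Dict] = []

    @staticmethod
    def _digest(body: Dict) -> str:
        # Canonical JSON (sorted keys) so the same body always hashes the same.
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, tool: str, arguments: Dict) -> Dict:
        """Record one tool call and link it to the previous entry."""
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {
            "index": len(self.entries),
            "tool": tool,
            "arguments": arguments,
            "prev": prev,
        }
        entry = dict(body, hash=self._digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev = self.GENESIS
        for index, entry in enumerate(self.entries):
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["index"] != index or entry["prev"] != prev:
                return False
            if entry["hash"] != self._digest(body):
                return False
            prev = entry["hash"]
        return True


# Illustrative usage with made-up tool names.
log = AuditLog()
log.append("read_file", {"path": "notes.txt"})
log.append("send_email", {"to": "alice@example.com"})
print(log.verify())
```

A chain like this is tamper-evident rather than tamper-proof: an attacker who can rewrite the whole log can also recompute every hash, so the most recent hash would need to be anchored somewhere the agent cannot modify.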

Referee Report

1 major / 1 minor

Summary. The manuscript systematically investigates security, privacy, and ethical risks, as well as traceability challenges, in OpenClaw, a locally executable AI agent system. Through analysis of its system architecture, core functionalities, deployment model, and representative application scenarios, it identifies risks associated with persistent local storage, tool invocation, cross-context information aggregation, multi-user interaction, and plugin integration with external services. The paper argues that these issues constitute major barriers to trustworthy deployment and widespread adoption, summarizes open challenges in defenses, protection, governance, and traceability, and calls for joint efforts by researchers, developers, deployers, and regulators.

Significance. If the qualitative analysis holds and the enumerated risks are shown to be both inherent and resistant to standard controls, the work would usefully contribute to the literature on AI agent security by providing a structured enumeration of concerns specific to locally executable, highly privileged agents. It could inform design choices and regulatory discussions around personal assistance and office automation tools. The paper's strength lies in its focus on a concrete system and its explicit call for multi-stakeholder collaboration, though its impact is constrained by the lack of quantitative severity assessment or mitigation evaluation.

major comments (1)

[Abstract] Abstract and § on risk analysis (architecture and scenarios): the central claim that the five listed features 'constitute major barriers to the trustworthy deployment and widespread adoption' is load-bearing but unsupported. The text enumerates potential risks from persistent local storage, tool invocation, cross-context aggregation, multi-user interaction, and plugin integration, yet provides no demonstration that standard mechanisms (sandboxing, capability-based access control, audit logging, or consent flows) would be inadequate, nor any attack traces, probability/impact estimates, or residual-risk evaluation after mitigations. This leaves the 'major barriers' conclusion dependent on the untested premise that the risks are both inherent and effectively unmitigable.
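
For readers unfamiliar with the mechanisms the referee names, the following is a minimal sketch of capability-based access control combined with a consent flow for tool invocation. `Capability`, `ToolGate`, `PermissionDenied`, the tool names, and the paths are all hypothetical and are not part of OpenClaw or the paper; the sketch shows only the general technique.

```python
import posixpath
from dataclasses import dataclass, field
from typing import Callable, Set


class PermissionDenied(Exception):
    """Raised when a tool call has neither a matching capability nor consent."""


@dataclass(frozen=True)
class Capability:
    tool: str   # hypothetical tool name, e.g. "read_file"
    scope: str  # resource (or path prefix) the tool is allowed to touch


def deny_all(tool: str, resource: str) -> bool:
    """Default consent callback: refuse anything not granted in advance."""
    return False


@dataclass
class ToolGate:
    granted: Set[Capability] = field(default_factory=set)
    ask_user: Callable[[str, str], bool] = deny_all

    @staticmethod
    def _covers(cap: Capability, tool: str, resource: str) -> bool:
        # Match the exact resource or anything strictly below the scope.
        scope = cap.scope.rstrip("/")
        return cap.tool == tool and (
            resource == scope or resource.startswith(scope + "/")
        )

    def invoke(self, tool: str, resource: str, action: Callable[[str], object]):
        # Collapse ".." segments so a path cannot escape its granted scope.
        resource = posixpath.normpath(resource)
        if not any(self._covers(c, tool, resource) for c in self.granted):
            # No capability matches: fall back to an explicit consent step.
            if not self.ask_user(tool, resource):
                raise PermissionDenied(f"{tool} on {resource}")
            # Consent is remembered for this exact resource only.
            self.granted.add(Capability(tool, resource))
        return action(resource)


# Illustrative usage: reads under one directory are pre-approved.
gate = ToolGate(granted={Capability("read_file", "/home/user/docs")})
print(gate.invoke("read_file", "/home/user/docs/a.txt", lambda p: "read " + p))
```

The design choice worth noting is deny by default: a call succeeds only through an explicit grant or an explicit approval. Whether gates of this kind would preserve enough of the agent's utility is exactly the question the referee says the manuscript leaves open.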

minor comments (1)

[Abstract] The abstract and conclusion sections could more explicitly distinguish risks that are unique to OpenClaw from those common to other agent frameworks, to strengthen the novelty of the contribution.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive critique, which highlights an important gap in supporting our central claim. We address the comment below and will revise the manuscript accordingly.

Referee: [Abstract] Abstract and § on risk analysis (architecture and scenarios): the central claim that the five listed features 'constitute major barriers to the trustworthy deployment and widespread adoption' is load-bearing but unsupported. The text enumerates potential risks from persistent local storage, tool invocation, cross-context aggregation, multi-user interaction, and plugin integration, yet provides no demonstration that standard mechanisms (sandboxing, capability-based access control, audit logging, or consent flows) would be inadequate, nor any attack traces, probability/impact estimates, or residual-risk evaluation after mitigations. This leaves the 'major barriers' conclusion dependent on the untested premise that the risks are both inherent and effectively unmitigable.

Authors: We agree that the manuscript's qualitative enumeration of risks does not include quantitative severity assessments, explicit attack traces, or systematic evaluation of residual risk after applying standard controls such as sandboxing or capability-based access. The paper's analysis is grounded in the architectural requirements of OpenClaw (local execution with persistent storage and broad tool privileges needed for its intended functionality), which we argue create tensions that standard mechanisms cannot fully resolve without reducing utility. However, we acknowledge this reasoning is presented at a high level without detailed mitigation analysis. We will revise the abstract and the risk-analysis sections to (1) explicitly state the qualitative basis of the 'major barriers' claim, (2) add a short discussion of why certain standard controls are likely to be insufficient given the agent's design goals, and (3) temper the language to reflect the absence of quantitative evidence while preserving the call for further multi-stakeholder work.

Revision: yes

Circularity Check

0 steps flagged

No circularity: qualitative risk enumeration from architecture is self-contained

The paper performs a direct architectural analysis to enumerate risks (persistent storage, tool invocation, cross-context aggregation, etc.) and concludes they form major barriers. No equations, fitted parameters, predictions, or self-citations appear in the provided text. The central claim follows from describing the system features without any reduction of outputs to inputs by construction, self-definition, or load-bearing self-reference. This is a standard qualitative review with no derivation chain that collapses into its own premises.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a qualitative risk assessment relying on standard assumptions about system behavior in AI agents; no free parameters, mathematical axioms, or new entities are introduced.

pith-pipeline@v0.9.0 · 5731 in / 1053 out tokens · 29608 ms · 2026-05-25T04:24:04.620472+00:00 · methodology

[1] [1]

R. Shu, N. Das, M. Yuan, M. Sunkara, Y. Zhang, Towards effective genai multi-agent collaboration: design and evaluation for enterprise applications, arXiv preprint arXiv:2412.05449 (2024)

work page arXiv 2024

[2] [2]

Schick, J

T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, T. Scialom, Toolformer: Language models can teach themselves to use tools, Advances in neural information processing systems 36 (2023) 68539–68551

work page 2023

[3] [3]

Chase, Langchain,https://github.com/langchain-ai/langchain, software, accessed 2026-03-24 (2022)

H. Chase, Langchain,https://github.com/langchain-ai/langchain, software, accessed 2026-03-24 (2022)

work page 2026

[4] [4]

P.Steinberger,theOpenClawCommunity,Openclaw,https://github.com/openclaw/openclaw,software,accessed2026-03-24(2026)

work page 2026

[5] [5]

H. Su, J. Luo, C. Liu, X. Yang, Y. Zhang, Y. Dong, J. Zhu, A survey on autonomy-induced security risks in large model-based agents, arXiv preprint arXiv:2506.23844 (2025). First Author et al.:Preprint submitted to ElsevierPage 19 of 21 Security, Privacy, and Ethical Risks in OpenClaw

work page arXiv 2025

[6] [6]

Z.Deng,Y.Guo,C.Han,W.Ma,J.Xiong,S.Wen,Y.Xiang,Aiagentsunderthreat:Asurveyofkeysecuritychallengesandfuturepathways, ACM Computing Surveys 57 (7) (2025) 1–36

work page 2025

[7] [7]

V. S. Narajala, O. Narayan, Securing agentic ai: A comprehensive threat model and mitigation framework for generative ai agents, arXiv preprint arXiv:2504.19956 (2025)

work page arXiv 2025

[8] [8]

The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

M. Lupinacci, F. A. Pironti, F. Blefari, F. Romeo, L. Arena, A. Furfaro, The dark side of llms: Agent-based attacks for complete computer takeover, arXiv preprint arXiv:2507.06850 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[9] [9]

ReAct: Synergizing Reasoning and Acting in Language Models

S.Yao,J.Zhao,D.Yu,N.Du,I.Shafran,K.Narasimhan,Y.Cao,React:Synergizingreasoningandactinginlanguagemodels,in:International Conference on Learning Representations (ICLR), 2023. URLhttps://arxiv.org/abs/2210.03629

work page internal anchor Pith review Pith/arXiv arXiv 2023

[10] [10]

Generative Agents: Interactive Simulacra of Human Behavior

J.S.Park,J.C.O’Brien,C.J.Cai,M.R.Morris,P.Liang,M.S.Bernstein,Generativeagents:Interactivesimulacraofhumanbehavior,arXiv preprint arXiv:2304.03442 (2023). URLhttps://arxiv.org/abs/2304.03442

work page internal anchor Pith review Pith/arXiv arXiv 2023

[11] [11]

L.Weng,Llmpoweredautonomousagents,https://lilianweng.github.io/posts/2023-06-23-agent/,lil’Log,accessed2026-03- 24 (Jun. 2023)

work page 2023

[12] [12]

K.Greshake,S.Abdelnabi,S.Mishra,C.Endres,T.Holz,M.Fritz,Notwhatyou’vesignedupfor:Compromisingreal-worldLLM-integrated applicationswithindirectpromptinjection,in:Proceedingsofthe16thACMWorkshoponArtificialIntelligenceandSecurity(AISec@CCS), 2023

work page 2023

[13] [13]

Defeating Prompt Injections by Design

E. Debenedetti, et al., CaMeL: Capability-based security for LLM agents, arXiv preprint arXiv:2503.18813Google DeepMind/ETH Zurich (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[14] [14]

Q. Zhan, Z. Liang, Z. Ying, D. Kang, InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents, in: Findings of the Association for Computational Linguistics: ACL 2024, 2024, pp. 10471–10506

work page 2024

[15] [15]

Debenedetti, J

E. Debenedetti, J. Zhang, M. Balunović, L. Beurer-Kellner, M. Fischer, F. Tramèr, AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents, in: NeurIPS 2024, Datasets and Benchmarks Track, 2024

work page 2024

[16] [16]

M.Nasr,N.Carlini,etal.,Theattackermovessecond,arXivpreprintarXiv:2510.09023JointOpenAI/Anthropic/GoogleDeepMindevaluation (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[17] [17]

T. Chen, D. Liu, X. Hu, J. Yu, W. Wang, A trajectory-based safety audit of clawdbot (openclaw), arXiv preprint arXiv:2602.14364 (2026)

work page arXiv 2026

[18] [18]

X. Deng, Y. Zhang, J. Wu, J. Bai, S. Yi, Z. Zou, Y. Xiao, R. Qiu, J. Ma, J. Chen, et al., Taming openclaw: Security analysis and mitigation of autonomous llm agent threats, arXiv preprint arXiv:2603.11619 (2026)

work page arXiv 2026

[19] [19]

Z. Ying, X. Yang, S. Wu, Y. Song, Y. Qu, H. Li, T. Li, J. Wang, A. Liu, X. Liu, Uncovering security threats and architecting defenses in autonomous agents: A case study of openclaw, arXiv preprint arXiv:2603.12644 (2026)

work page arXiv 2026

[20] [20]

Wang, MCPTox: A systematic benchmark for MCP server security, arXiv preprint arXiv:2508.14925 (2025)

o. Wang, MCPTox: A systematic benchmark for MCP server security, arXiv preprint arXiv:2508.14925 (2025)

work page arXiv 2025

[21] [21]

X. Gu, X. Zheng, T. Pang, C. Du, Q. Liu, Y. Wang, J. Jiang, M. Lin, Agent smith: A single image can jailbreak one million multimodal LLM agents exponentially fast, in: International Conference on Machine Learning (ICML), 2024

work page 2024

[22] [22]

Cohen, R

S. Cohen, R. Bitton, B. Nassi, RAGworm: Self-replicating AI worms that spread through interconnected GenAI applications, in: ACM Conference on Computer and Communications Security (CCS), 2025, arXiv:2403.02817, 2024

[23]

T. Peigné-Lefebvre, et al., The multi-agent security tax, in: AAAI Conference on Artificial Intelligence, 2025, arXiv:2502.19145

[24]

o. Cheng, o. Tsao, Privilege separation for OpenClaw, arXiv preprint arXiv:2603.13424 (2026)

[25]

OpenClaw, Openclaw trust center, https://trust.openclaw.ai/, accessed: 2026-03-19 (2026)

[26]

OpenClaw Documentation, Gateway security, https://docs.openclaw.ai/gateway/security, accessed: 2026-03-19 (2026)

[27]

Microsoft Security, Running openclaw safely: Identity, isolation, and runtime risk, https://www.microsoft.com/en-us/security/blog/2026/02/19/running-openclaw-safely-identity-isolation-runtime-risk/, accessed: 2026-03-19 (2026)

[28]

OpenClaw Documentation, Memory - openclaw, https://docs.openclaw.ai/concepts/memory, accessed: 2026-03-19 (2026)

[29]

OpenClaw Documentation, Plugin architecture - openclaw, https://docs.openclaw.ai/plugins/architecture, accessed: 2026-03-19 (2026)

[30]

OpenClaw Documentation, Multi-agent routing - openclaw, https://docs.openclaw.ai/concepts/multi-agent, accessed: 2026-03-19 (2026)

[31]

OpenClaw Documentation, Context - openclaw, https://docs.openclaw.ai/concepts/context, accessed: 2026-03-19 (2026)

[32]

Y. Zheng, Y. Hu, Audagent: Automated auditing of privacy policy compliance in ai agents (2025). arXiv:2511.07441. URL https://arxiv.org/abs/2511.07441

[33]

A. Ukani, H. Haddadi, A. S. Shamsabadi, P. Snyder, Privacy practices of browser agents (2025). arXiv:2512.07725. URL https://arxiv.org/abs/2512.07725

[34]

J. Zhou, N. Mireshghallah, T. Li, Operationalizing data minimization for privacy-preserving llm prompting (2025). arXiv:2510.03662. URL https://arxiv.org/abs/2510.03662

[35]

H. Karthikeyan, Y. Guo, L. de Castro, A. Polychroniadou, L. Ardon, U. M. Sehwag, S. Ganesh, M. Veloso, Agentcrypt: Advancing privacy and (secure) computation in ai agent collaboration (2025). arXiv:2512.08104. URL https://arxiv.org/abs/2512.08104

[36]

B. Wang, W. He, P. He, S. Zeng, Z. Xiang, Y. Xing, J. Tang, Unveiling privacy risks in llm agent memory, in: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, 2025, pp. 25013–25030. doi:10.18653/v1/2025.acl-long.1227. URL https://aclanthology.org/2025.acl-long.1227/

[37]

B. D. Sunil, I. Sinha, P. Maheshwari, S. Todmal, S. Mallik, S. Mishra, Memory poisoning attack and defense on retrieval-augmented generation based llm agents (2026). arXiv:2601.05504. URL https://arxiv.org/abs/2601.05504

[38]

K. Zhu, X. Yang, J. Wang, W. Guo, W. Y. Wang, Melon: Indirect prompt injection defense via masked re-execution and tool comparison, in: Proceedings of the 42nd International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=gt1MmGaKdZ

[39]

H. Li, X. Liu, H.-C. Chiu, D. Li, N. Zhang, C. Xiao, Drift: Dynamic rule-based defense with injection isolation for securing llm agents (2025). arXiv:2506.12104. URL https://arxiv.org/abs/2506.12104

[40]

World Economic Forum, Ai agents in action: Foundations for evaluation and governance, Tech. rep., World Economic Forum (Nov. 2025). URL https://reports.weforum.org/docs/WEF_AI_Agents_in_Action_Foundations_for_Evaluation_and_Governance_2025.pdf

[41]

M. Hahn, M. Tretter, P. Dabrock, Ethical perspectives on AI agents and agentic AI, AI and Ethics 6 (2026) 218. doi:10.1007/s43681-026-01027-0. URL https://link.springer.com/article/10.1007/s43681-026-01027-0

[42]

N. Kosmyna, E. Hauptmann, Y. T. Yuan, J. Situ, X.-H. Liao, A. V. Beresnitzky, I. Braunstein, P. Maes, Your brain on chatgpt: Accumulation of cognitive debt when using an ai assistant for essay writing task (2025). arXiv:2506.08872. URL https://arxiv.org/abs/2506.08872

[43]

M. Constantinescu, M. Kaptein, Responsibility gaps, LLMs & organisations: Many agents, many levels, and many interactions, Science and Engineering Ethics 31 (2025) 36. doi:10.1007/s11948-025-00560-1. URL https://doi.org/10.1007/s11948-025-00560-1

[44]

B. Lange, G. Keeling, A. Manzini, A. McCroskery, We need accountability in human–AI agent relationships, npj Artificial Intelligence 1 (2025) 38. doi:10.1038/s44387-025-00041-7. URL https://doi.org/10.1038/s44387-025-00041-7

[45]

Organisation for Economic Co-operation and Development, The agentic ai landscape and its conceptual foundations, Tech. Rep. 56, OECD Publishing, Paris (Feb. 2026). doi:10.1787/396cf758-en. URL https://doi.org/10.1787/396cf758-en

[46]

D. Blaise, Openclaw agents can be guilt-tripped into self-sabotage, WIRED (Mar. 2026). URL https://www.wired.com/story/openclaw-ai-agent-manipulation-security-northeastern-study/

[47]

E. C. Cheng, J. Cheng, A. Siu, Toward safe and responsible ai agents: A three-pillar model for transparency, accountability, and trustworthiness (2026). arXiv:2601.06223. URL https://arxiv.org/abs/2601.06223

[48]

o. Gupta, ReliabilityBench: Evaluating LLM agent reliability under production-like stress conditions, arXiv preprint arXiv:2601.06112 (2026)

[49]

S. Rabanser, S. Kapoor, P. Kirgis, K. Liu, S. Utpala, A. Narayanan, Towards a science of ai agent reliability (2026). arXiv:2602.16666, doi:10.48550/arXiv.2602.16666. URL https://arxiv.org/abs/2602.16666

[50]

R. Souza, A. Gueroudji, S. DeWitt, D. Rosendo, T. Ghosal, R. Ross, P. Balaprakash, R. F. da Silva, Prov-agent: Unified provenance for tracking ai agent interactions in agentic workflows, in: 2025 IEEE International Conference on eScience (eScience), 2025. doi:10.1109/eScience65000.2025.00093. URL https://arxiv.org/abs/2508.02866

[51]

R. A. Rasheed, S. Banerjee, A. Mukherjee, R. Hazra, From fluent to verifiable: Claim-level auditability for deep research agents (2026). arXiv:2602.13855. URL https://arxiv.org/abs/2602.13855

[52]

OpenTelemetry, Semantic conventions for genai agent and framework spans, https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/, OpenTelemetry specification, accessed 2026-03-31 (2025).
