Understanding and Evaluating Claw-like Agent Security Through a Computer-Systems Lens

Ahmad Tawaha; Basel Alomair; Boyi Li; Chenguang Wang; Chi Wang; Costas Spanos; Dawn Song; Hend Alzahrani; Jiaheng Zhang; Ming Jin

arxiv: 2606.30755 · v1 · pith:JUCKZ2DYnew · submitted 2026-06-29 · 💻 cs.CR · cs.AI


Peizhi Niu, Wenjie Qu, Shangding Gu, Tianneng Shi, Yuankai Li, Ahmad Tawaha, Hend Alzahrani, Vincent Siu, Boyi Li, Chenguang Wang, Jiaheng Zhang, Basel Alomair, Ming Jin, Muhao Chen, Chi Wang, Costas Spanos, Dawn Song


Pith reviewed 2026-07-01 01:53 UTC · model grok-4.3


keywords Claw-like agent security · adversarial benchmark · plugin attacks · persistent state exploitation · prompt injection · runtime mediation · taint tracking · agent platforms


The pith

Claw-like agents allow attack success rates up to 70 percent because they lack standard computer-system protections for their runtime, skills, and plugins.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Claw-like agents run as persistent processes with broad access to credentials, files, and services, so security failures affect the entire environment rather than isolated responses. The paper maps these agents to classical computer systems by treating the gateway runtime as an OS mediator, skills as installed applications, and plugins as privileged extensions, then checks which long-established safeguards are missing. It introduces SafeClawArena, a benchmark of 406 tasks across four attack surfaces, executed in containerized replicas with taint tracking to measure real cross-component failures. Results show that malicious plugins succeed in every case regardless of the underlying model, that SeClaw reduces GPT-5.4's attack success rate from 70 percent to 22 percent, and that Claude-Opus-4.6 already sits near that 22 percent floor on every platform. The work therefore claims that current agent platforms expose the same classes of vulnerability that operating systems addressed decades ago.

Core claim

Treating a Claw-like agent as an agentic computer system whose gateway runtime mediates access like an OS, whose Skills act like user applications, and whose Plugins act like loadable extensions reveals four unprotected attack surfaces. SafeClawArena measures them with 406 tasks and finds attack success reaching 70 percent, with malicious Plugins succeeding in 100 percent of cases, while SeClaw lowers GPT-5.4's attack success rate from 70 to 22 percent and Claude-Opus-4.6 already sits near that floor on every platform.
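To make the headline numbers concrete, here is a minimal sketch of how an attack success rate (ASR) could be aggregated per (platform, model) configuration, and of what the reported drop from 70 to 22 percent means in absolute and relative terms. The record layout and the function names `attack_success_rate` and `reduction` are assumptions for illustration, not the SafeClawArena code; only the 70 and 22 percent figures come from the paper.

```python
from collections import defaultdict

def attack_success_rate(records):
    """Return ASR in percent per (platform, model) from (platform, model, succeeded) tuples."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for platform, model, succeeded in records:
        key = (platform, model)
        totals[key] += 1
        hits[key] += 1 if succeeded else 0
    return {key: 100.0 * hits[key] / totals[key] for key in totals}

def reduction(before_pct, after_pct):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before_pct - after_pct
    relative = 100.0 * absolute / before_pct
    return absolute, relative

# Toy records with hypothetical outcomes (not benchmark data).
toy = [
    ("PlatformA", "model-x", True),
    ("PlatformA", "model-x", False),
    ("PlatformB", "model-x", False),
    ("PlatformB", "model-x", False),
]
print(attack_success_rate(toy))

# Figures reported in the paper: 70% down to 22%.
absolute, relative = reduction(70.0, 22.0)
print(f"{absolute:.0f} points, {relative:.1f}% relative")  # 48 points, 68.6% relative
```

Reporting both numbers matters when comparing platforms: a 48-point absolute drop is also a roughly two-thirds relative reduction, yet about one attack in five still succeeds.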

What carries the argument

The computer-systems analogy that maps gateway runtime to OS, Skills to applications, and Plugins to extensions, which directly determines the four attack surfaces and the design of the SafeClawArena benchmark with automated taint tracking.

If this is right

Platform designers must add runtime mediation checks comparable to an OS kernel before skills or plugins execute.
Plugin distribution channels require supply-chain verification because no model defense stops them once loaded.
Benchmarks limited to single tool calls or prompt responses will miss the persistent-state and cross-boundary failures measured here.
Model-level prompt filtering alone cannot close the gap when the platform itself grants elevated privileges.
A 22 percent floor appears across both hardened and unhardened platforms for at least one frontier model, indicating limits to current mitigation strategies.
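One classical supply-chain control is to pin each plugin to a known digest and refuse to load anything else. The sketch below assumes a hypothetical allow-list `PINNED_DIGESTS` and a hypothetical loader hook `verify_plugin`; neither is an OpenClaw API, and the paper does not prescribe this specific mechanism.

```python
import hashlib
import hmac

# Hypothetical allow-list: plugin name -> expected SHA-256 hex digest.
PINNED_DIGESTS = {
    "calendar-sync": hashlib.sha256(b"trusted plugin bytes").hexdigest(),
}

def verify_plugin(name: str, payload: bytes) -> bool:
    """Return True only if the plugin is pinned and its digest matches."""
    expected = PINNED_DIGESTS.get(name)
    if expected is None:
        return False  # unknown plugins are rejected by default
    actual = hashlib.sha256(payload).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(actual, expected)

print(verify_plugin("calendar-sync", b"trusted plugin bytes"))   # True
print(verify_plugin("calendar-sync", b"tampered plugin bytes"))  # False
print(verify_plugin("unknown-plugin", b"anything"))              # False
```

Digest pinning only establishes that the loaded bytes match what was reviewed. It does not show that the reviewed code is benign, so it complements rather than replaces runtime mediation.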

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same mapping could be applied to other persistent agent frameworks to predict which attack surfaces will appear first.
If the analogy holds, existing OS hardening techniques such as capability-based access or mandatory access control could be ported to agent runtimes with measurable effect.
The benchmark could be extended to measure utility loss when the platform adds the missing mediation layers, testing whether the observed tradeoffs are inherent or implementation-specific.
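As a rough illustration of what porting capability-based access to an agent runtime might look like, the sketch below routes every component action through a single default-deny mediation function backed by an explicit grant table. The names `Gateway`, `grant`, and `mediate`, along with the capability strings, are hypothetical and not taken from any Claw-like platform.

```python
from dataclasses import dataclass, field

class CapabilityError(PermissionError):
    """Raised when a component requests an action it was never granted."""

@dataclass
class Gateway:
    # component name -> set of capability strings such as "fs.read:/workspace"
    grants: dict[str, set[str]] = field(default_factory=dict)
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def grant(self, component: str, capability: str) -> None:
        self.grants.setdefault(component, set()).add(capability)

    def mediate(self, component: str, capability: str) -> None:
        """Default-deny check performed before any action executes."""
        allowed = capability in self.grants.get(component, set())
        self.audit_log.append((component, capability, allowed))
        if not allowed:
            raise CapabilityError(f"{component} lacks {capability}")

gateway = Gateway()
gateway.grant("notes-skill", "fs.read:/workspace")

gateway.mediate("notes-skill", "fs.read:/workspace")  # permitted
try:
    gateway.mediate("notes-skill", "net.send:example.org")
except CapabilityError as err:
    print("denied:", err)
```

Logging every decision, including denials, is what would let a benchmark measure the utility cost of the added layer alongside its security benefit.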

Load-bearing premise

The computer-systems analogy produces attack surfaces and failure modes that match those present in real Claw-like agent deployments.

What would settle it

Executing the 406 SafeClawArena tasks against a live production Claw-like agent and finding attack success rates that differ substantially from the reported 70 percent maximum.

Figures

Figures reproduced from arXiv: 2606.30755 by Ahmad Tawaha, Basel Alomair, Boyi Li, Chenguang Wang, Chi Wang, Costas Spanos, Dawn Song, Hend Alzahrani, Jiaheng Zhang, Ming Jin, Muhao Chen, Peizhi Niu, Shangding Gu, Tianneng Shi, Vincent Siu, Wenjie Qu, Yuankai Li.

**Figure 1.** Architecture of a Claw-like agent and its attack surfaces. The Gateway daemon hosts six core components (LLM core, Skill loader, Plugin loader, Memory, Tool executor, Config), each annotated with the primary security risk it carries. Colored markers indicate the four dimensions SafeClawArena evaluates: red for Skill Supply-Chain Integrity (SSI), violet for Persistent State Exploitation (PSE), green for Cro…

**Figure 2.** Overview of SafeClawArena. Left: 406 tasks across four architectural attack surfaces. Middle: each task runs in a fresh Docker container replicating one of three OpenClaw-family platforms (OpenClaw, NemoClaw, SeClaw) with the LLM backend, canary-tagged credential files, deployed Skills/Plugins/content, and the simulated Sim-Google CLI. Right: the evaluator captures nine output channels and applies dimensio…

**Figure 3.** Defense coverage matrix: 11 system-level defenses (columns, grouped by dimension…

**Figure 4.** Per-category attack success rate (%) for each of the 15 (platform, model) configurations

Original abstract

Claw-like AI agents (e.g., OpenClaw) are always-on processes with persistent access to credentials, files, tools, and external services. They take on system-level responsibilities -- installing packages, maintaining state, scheduling subtasks, and mediating I/O -- making security failures far more severe than in other agents. Yet existing benchmarks focus on model responses and tool calls, leaving cross-component failure modes largely unmeasured. We adopt a computer-system analogy: treating a Claw-like agent as an agentic computer system whose gateway runtime plays an OS-like mediation role, whose Skills resemble user-installed applications, and whose Plugins resemble loadable extensions with runtime privileges. Each component has a classical counterpart whose protection mechanisms -- refined over decades of cybersecurity research -- are absent on the agent side. From this perspective, we develop SafeClawArena, a benchmark of 406 adversarial tasks across four attack surfaces (Skill Supply-Chain Integrity, Persistent State Exploitation, Cross-Boundary Data Flow, and Indirect Prompt Injection), executed in containerized replicas of real agent platforms with canary-marked credentials and evaluated via automated taint tracking across nine output channels. We evaluate three platforms (OpenClaw, NemoClaw, SeClaw) and five frontier LLMs. The highest attack success rate reaches 70%; malicious Plugins succeed in 100% of cases regardless of the LLM. SeClaw cuts GPT-5.4's attack success rate from 70% to 22%, partly through utility-security tradeoffs rather than active defenses, while Claude-Opus-4.6 already sits near a 22% floor on every platform. These results expose the inadequacy of current defenses and suggest directions for future hardening. Code and data: https://github.com/sunblaze-ucb/SafeClawArena.
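The canary-based evaluation can be pictured with a minimal sketch: plant a unique marker in a credential file, then scan each captured output channel for it. The channel names, the `CANARY` value, and `find_leaks` are assumptions made for illustration. The abstract does not enumerate the nine channels or describe how the tracker is implemented, and a real tracker would need to handle many more encodings and transformations of the marker than the single base64 case shown here.

```python
import base64

CANARY = "CANARY-7f3a9c"  # hypothetical marker embedded in a credential file

def find_leaks(channels: dict[str, str], canary: str = CANARY) -> list[str]:
    """Return the names of channels containing the canary, raw or base64-encoded."""
    encoded = base64.b64encode(canary.encode()).decode()
    leaked = []
    for name, content in channels.items():
        if canary in content or encoded in content:
            leaked.append(name)
    return sorted(leaked)

# Hypothetical capture from one task run.
captured = {
    "chat_reply": "Your meeting is scheduled for Monday.",
    "outbound_http": "POST /collect body=token:CANARY-7f3a9c",
    "memory_store": "note: " + base64.b64encode(CANARY.encode()).decode(),
}
print(find_leaks(captured))  # ['memory_store', 'outbound_http']
```

Under this kind of scoring, a task counts as a successful attack when the marker reaches any channel it should not, which is why the number and choice of channels directly affect the measured rate.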

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces SafeClawArena, a benchmark that applies a systems analogy to attacks on persistent agents and reports attack success rates (ASR) of up to 70%, but the abstract gives no details on how the tasks or the taint tracking were validated.


The main takeaway is that this work applies a computer-systems framing to always-on agents and measures four attack surfaces with automated tracking, producing concrete failure rates such as a peak attack success rate of 70% and a 100% success rate for malicious plugins.

They do a solid job defining the analogy (runtime as OS, skills as apps, plugins as extensions) and building 406 tasks across supply-chain integrity, state exploitation, cross-boundary flows, and indirect injection. Running them in containerized replicas of three platforms with canary credentials and taint tracking across nine channels is a clear step past single-turn model tests. The platform comparison, including SeClaw's reduction for one model, adds usable data points.

The soft spot is the complete absence of method details. The abstract states the results but says nothing about how the adversarial tasks were validated, how taint tracking was implemented without false positives, or what statistical controls were used. Without that, the 70% and 100% figures cannot be assessed. The claim that the analogy reflects real deployments also sits untested in what is shown.

This is for people working on agent security and systems-style protections for long-running AI. A reader focused on benchmarks would find the new surfaces useful even if the numbers need more backing.

It deserves peer review to check the missing implementation and validation steps.

Referee Report

0 major / 0 minor

Summary. The paper claims that Claw-like agents (e.g., OpenClaw) pose severe security risks due to persistent credentials, files, tools, and system-level responsibilities. Adopting a computer-systems analogy (gateway runtime as OS, Skills as applications, Plugins as loadable extensions), the authors introduce SafeClawArena: a benchmark of 406 adversarial tasks across four attack surfaces (Skill Supply-Chain Integrity, Persistent State Exploitation, Cross-Boundary Data Flow, Indirect Prompt Injection). Tasks are executed in containerized replicas of real platforms with canary-marked credentials and evaluated via automated taint tracking across nine output channels. On three platforms and five frontier LLMs, the highest attack success rate is 70%, malicious Plugins succeed in 100% of cases regardless of LLM, SeClaw reduces GPT-5.4 ASR from 70% to 22% (partly via utility-security tradeoffs), and Claude-Opus-4.6 sits near a 22% floor on all platforms. Code and data are released.

Significance. If the benchmark construction, task validation, and taint-tracking results hold, the work is significant for importing decades of systems-security insight into agent evaluation and exposing cross-component failure modes that model-only benchmarks miss. Strengths include the containerized replicas, automated taint tracking, release of code/data, and concrete quantitative findings (70% ASR, 100% plugin success, SeClaw reduction) that are falsifiable and reproducible.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, accurate summary of the contributions, and recommendation to accept. We appreciate the recognition of the benchmark's design, reproducibility, and the value of the systems-security lens for agent evaluation.

Circularity Check

0 steps flagged

No significant circularity


The paper frames an empirical security benchmark (SafeClawArena) via a computer-systems analogy but does not derive quantitative results, attack success rates, or platform comparisons from that analogy by construction. The 406 tasks, four attack surfaces, taint-tracking evaluation, and reported metrics (70% ASR, 100% plugin success, SeClaw reduction to 22%) are obtained through direct execution on external containerized platforms and automated measurement, not from fitted parameters, self-definitions, or self-citation chains. No equations, uniqueness theorems, or ansatzes appear; the central claims rest on experimental outcomes rather than reduction to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the validity of mapping agent components to classical OS roles; no free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: The gateway runtime, Skills, and Plugins in Claw-like agents have direct classical counterparts whose protection mechanisms are absent on the agent side. Invoked in the abstract when developing the benchmark from the computer-systems analogy.



