SafeClawBench supplies 600 staged adversarial tasks and three separate endpoints, showing that semantic acceptance, audit evidence, and sandbox-observed harm are distinct failure modes in tool-using LLM agents.
arXiv preprint arXiv:2602.14364
9 Pith papers cite this work, all published in 2026.

Citing papers:
- Understanding and Evaluating Claw-like Agent Security Through a Computer-Systems Lens
  The paper introduces SafeClawArena, a 406-task benchmark that evaluates security failures in three Claw-like agent platforms across four attack surfaces: skill supply chain, state exploitation, data flow, and prompt injection.
- Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents
  Memory-equipped LLM agents show rising safety-violation rates as memory accumulates across independent tasks, a phenomenon the authors term temporal memory contamination and detect with a new trigger-probe protocol.
- Quantifying Trust: Financial Risk Management for Trustworthy AI Agents
  The paper introduces the Agentic Risk Standard (ARS), a payment-settlement framework that delivers predefined compensation for AI agent execution failures, misalignment, or unintended outcomes.
- Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution
  The heartbeat background execution in Claw AI agents shares memory context with user sessions, letting ordinary social misinformation silently pollute long-term memory and shape behavior across sessions at rates of up to 76%.
- BraveGuard: From Open-World Threats to Safer Computer-Use Agents
  BraveGuard trains guard models on realistic agent trajectories derived from open-world threats, raising detection accuracy on AgentHazard from 38.79% to 82.38%.
- Security, Privacy, and Ethical Risks in OpenClaw
  The paper analyzes security, privacy, and ethical risks in the OpenClaw AI agent system that arise from its architecture, storage, tool use, and integrations, and argues that these risks are major barriers to trustworthy adoption.
- Understanding and mitigating the risks of OpenClaw for non-technical users: A practical guide with Skill
  This work categorizes seven risks that OpenClaw poses to non-technical users, provides plain-language mitigations, and supplies a companion Skill that automates security configuration.
- Security of OpenClaw Agents: Fundamentals, Attacks, and Countermeasures
  A survey that categorizes threats to OpenClaw agents, including skill poisoning and cognitive manipulation, and reviews defense mechanisms.