Don’t let the claw grip your hand: A security analysis and defense framework for OpenClaw

Author: Shan, Z. · 2026 · arXiv:2603.10387

17 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2 · baseline 1

citation-polarity summary

background 1 · baseline 1 · unclear 1

representative citing papers

Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values

cs.AI · 2026-05-11 · unverdicted · novelty 8.0

Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.

AgentCanary: A Security Evaluation Framework for Autonomous AI Agents in Real Executable Environments

cs.CR · 2026-06-09 · unverdicted · novelty 7.0

AgentCanary introduces an Entry × Impact risk taxonomy, high-fidelity real tool environments with persistent state, and multi-dimensional trajectory evaluation to assess AI agent security across models and attacks.

From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents

cs.CR · 2026-06-03 · unverdicted · novelty 7.0

The study identifies four memory write channels and nine structural vulnerabilities in LLM agents, proposes a taxonomy of six attack classes, introduces MPBench, and finds that aggressive memory use increases exploitability while existing defenses fail.

AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents

cs.AI · 2026-04-03 · unverdicted · novelty 7.0

AgentHazard benchmark shows computer-use agents remain highly vulnerable, with attack success rates reaching 73.63% on models like Qwen3-Coder powering Claude Code.

SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

cs.CR · 2026-06-01 · unverdicted · novelty 6.0

SeClaw provides spec-driven synthesis of security tasks and an execution-based Docker testbed for evaluating unsafe behaviors in autonomous LLM agents.

Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions

cs.CR · 2026-05-21 · unverdicted · novelty 6.0

A3S-Bench evaluates LLM agents against temporal, spatial, and semantic evasions, raising average risk trigger rates from 28.3% to 52.6% across 2,254 trajectories and 20 scenarios.

LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection

cs.CR · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

LivePI benchmark reports indirect prompt injection success rates of 10.7-29.6% across five models on seven input surfaces and shows a two-layer defense blocking all malicious completions while preserving utility.

A Systematic Security Evaluation of OpenClaw and Its Variants

cs.CR · 2026-04-03 · unverdicted · novelty 6.0

All six evaluated OpenClaw agent frameworks exhibit substantial security vulnerabilities, with reconnaissance behaviors as the most common weakness and agent systems proving significantly riskier than isolated backbone models.

Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution

cs.CR · 2026-03-24 · unverdicted · novelty 6.0

Claw AI agents' heartbeat background execution shares memory context with user sessions, allowing ordinary social misinformation to silently pollute long-term memory and shape behavior at rates up to 76% across sessions.

Overlaying Governance: A Compositional Authorization Framework for Delegation and Scope in Agentic AI

cs.AI · 2026-06-02 · unverdicted · novelty 5.0

Introduces a compositional governance framework defining delegation types, resource scope attenuation, and an overlay operator for agentic AI authorization policies.

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

cs.CR · 2026-05-31 · unverdicted · novelty 5.0

BraveGuard trains guard models on realistic agent trajectories derived from open-world threats, raising detection accuracy on AgentHazard from 38.79% to 82.38%.

Reframing LLM Agent Security as an Agent-Human Interaction Problem

cs.CR · 2026-05-23 · unverdicted · novelty 5.0

LLM agent security is reframed as an agent-human interaction issue, supported by a survey showing industry preference for human-centric mechanisms over academic favorites and proposing a new research agenda.

Generative AI Advertising as a Problem of Trustworthy Commercial Intervention

cs.CY · 2026-05-18 · unverdicted · novelty 5.0

Generative AI advertising is reframed as a problem of trustworthy commercial intervention on the generative process, with a taxonomy of influence tiers from product mentions to long-term preference shaping.

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

cs.CR · 2026-05-07 · unverdicted · novelty 5.0

A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.

AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent

cs.LG · 2026-04-07 · unverdicted · novelty 5.0

AgentOpt introduces a framework-agnostic package that uses algorithms like UCB-E to find cost-effective model assignments in multi-step LLM agent pipelines, cutting evaluation budgets by 62-76% while maintaining near-optimal accuracy on benchmarks.

Understanding and mitigating the risks of OpenClaw for non-technical users: A practical guide with Skill

cs.CR · 2026-06-09 · unverdicted · novelty 2.0

This work categorizes seven risks of OpenClaw for non-technical users, provides plain-language mitigations, and supplies a companion Skill to automate security configurations.

Security of OpenClaw Agents: Fundamentals, Attacks, and Countermeasures

cs.AI · 2026-05-25 · unverdicted · novelty 2.0

A survey that categorizes threats to OpenClaw agents including skill poisoning and cognitive manipulation and reviews defense mechanisms.

citing papers explorer

Showing 17 of 17 citing papers.

Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values · cs.AI · 2026-05-11 · unverdicted · none · ref 15
AgentCanary: A Security Evaluation Framework for Autonomous AI Agents in Real Executable Environments · cs.CR · 2026-06-09 · unverdicted · none · ref 12
From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents · cs.CR · 2026-06-03 · unverdicted · none · ref 1
AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents · cs.AI · 2026-04-03 · unverdicted · none · ref 23
SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents · cs.CR · 2026-06-01 · unverdicted · none · ref 12
Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions · cs.CR · 2026-05-21 · unverdicted · none · ref 24
LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection · cs.CR · 2026-05-18 · unverdicted · none · ref 14 · 2 links
A Systematic Security Evaluation of OpenClaw and Its Variants · cs.CR · 2026-04-03 · unverdicted · none · ref 4
Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution · cs.CR · 2026-03-24 · unverdicted · none · ref 11
Overlaying Governance: A Compositional Authorization Framework for Delegation and Scope in Agentic AI · cs.AI · 2026-06-02 · unverdicted · none · ref 33
BraveGuard: From Open-World Threats to Safer Computer-Use Agents · cs.CR · 2026-05-31 · unverdicted · none · ref 31
Reframing LLM Agent Security as an Agent-Human Interaction Problem · cs.CR · 2026-05-23 · unverdicted · none · ref 46
Generative AI Advertising as a Problem of Trustworthy Commercial Intervention · cs.CY · 2026-05-18 · unverdicted · none · ref 69
Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation · cs.CR · 2026-05-07 · unverdicted · none · ref 19
AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent · cs.LG · 2026-04-07 · unverdicted · none · ref 22
Understanding and mitigating the risks of OpenClaw for non-technical users: A practical guide with Skill · cs.CR · 2026-06-09 · unverdicted · none · ref 18
Security of OpenClaw Agents: Fundamentals, Attacks, and Countermeasures · cs.AI · 2026-05-25 · unverdicted · none · ref 43
