From assistant to double agent: Formalizing and benchmarking attacks on OpenClaw for personalized local AI agent

Yuhang Wang, Feiming Xu, Zheng Lin, Guangyu He, Yuzhe Huang, Haichang Gao, Zhenxing Niu, Shiguo Lian, Zhaoxiang Liu · 2026 · DOI 10.48550/arxiv.2602.08412 · arXiv 2602.08412

15 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 2 support 1

representative citing papers

MalSkillBench: A Runtime-Verified Benchmark of Malicious Agent Skills

cs.CR · 2026-06-05 · unverdicted · novelty 8.0

MalSkillBench supplies the first sandbox-verified dataset of malicious agent skills and shows that existing detectors achieve high recall on code injection but collapse on prompt injection and agent-control attacks.

When Claws Remember but Do Not Tell: Stealthy Memory Injection in Persistent Personal Agents

cs.CR · 2026-07-06 · conditional · novelty 7.0

A trained attack model generates single emails that silently inject false memories into persistent AI agents, achieving 87.5% end-to-end success on GPT-5.4 and transferring across architectures and memory backends.

SafeClawBench: Separating Semantic, Audit-Evidence, and Sandbox Harm in Tool-Using LLM Agents

cs.CR · 2026-06-16 · accept · novelty 7.0

SafeClawBench supplies 600 staged adversarial tasks and three separate endpoints that show semantic acceptance, audit evidence, and sandbox-observed harm are distinct failure modes in tool-using LLM agents.

Systems-Level Attack Surface of Edge Agent Deployments on IoT

cs.CR · 2026-02-26 · unverdicted · novelty 7.0

Edge-local LLM agent deployments on IoT eliminate routine cloud data exposure but degrade sovereignty during fallbacks and create exploitable failover windows, making architecture a primary security determinant.

UniClawBench: A Universal Benchmark for Proactive Agents on Real-World Tasks

cs.CL · 2026-07-09 · conditional · novelty 6.0

A capability-driven benchmark of 400 bilingual real-world tasks shows current proactive agents fail >50% of the time, with framework architecture impacting performance more than base model choice.

ASEval: Automated Trajectory-Level Security Testing for Autonomous Agents

cs.CR · 2026-05-21 · conditional · novelty 6.0

A new multi-turn security benchmark shows OpenClaw agents across ten LLMs trigger harmful actions in 28–53% of adversarial cases, and fragmented or file-hidden attacks roughly double baseline risk rates.

LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection

cs.CR · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

The LivePI benchmark reports indirect prompt injection success rates of 10.7–29.6% across five models on seven input surfaces and shows a two-layer defense blocking all malicious completions while preserving utility.

When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

Routine user chats can unintentionally poison the long-term state of personalized LLM agents, causing authorization drift, tool escalation, and unchecked autonomy, as measured by a new benchmark and reduced by the StateGuard defense.

Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution

cs.CR · 2026-03-24 · unverdicted · novelty 6.0

Claw AI agents' heartbeat background execution shares memory context with user sessions, allowing ordinary social misinformation to silently pollute long-term memory and shape behavior at rates up to 76% across sessions.

HearthNet: Edge Multi-Agent Orchestration for Smart Homes

cs.DC · 2026-03-16 · unverdicted · novelty 6.0

HearthNet is an edge multi-agent orchestration system that runs role-specialized LLM agents locally to handle natural-language smart-home control, conflict resolution, and failure recovery through MQTT and shared state.

Harnessing Embodied Agents: Runtime Governance for Policy-Constrained Execution

cs.RO · 2026-04-09 · conditional · novelty 5.5 · 2 refs

An external runtime governance layer for embodied agents intercepts unauthorized actions at ~96% and recovers from runtime drift at ~91% under policy constraints in simulation, outperforming pre-execution-only baselines on continuous detection and recovery.

Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability

cs.CL · 2026-05-08 · unverdicted · novelty 4.0

The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment interventions.

Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study

cs.CR · 2026-04-30 · conditional · novelty 4.0

The survey organizes security threats and defenses in autonomous LLM agents into four layers and identifies that risks can propagate across layers from inputs to ecosystem impacts.

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

cs.CR · 2026-06-09 · unverdicted · novelty 3.0

A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.

Security of OpenClaw Agents: Fundamentals, Attacks, and Countermeasures

cs.AI · 2026-05-25 · unverdicted · novelty 2.0

A survey that categorizes threats to OpenClaw agents including skill poisoning and cognitive manipulation and reviews defense mechanisms.

citing papers explorer

Showing 15 of 15 citing papers.

MalSkillBench: A Runtime-Verified Benchmark of Malicious Agent Skills · cs.CR · 2026-06-05 · unverdicted · none · ref 55
When Claws Remember but Do Not Tell: Stealthy Memory Injection in Persistent Personal Agents · cs.CR · 2026-07-06 · conditional · none · ref 29
SafeClawBench: Separating Semantic, Audit-Evidence, and Sandbox Harm in Tool-Using LLM Agents · cs.CR · 2026-06-16 · accept · none · ref 32
Systems-Level Attack Surface of Edge Agent Deployments on IoT · cs.CR · 2026-02-26 · unverdicted · none · ref 29
UniClawBench: A Universal Benchmark for Proactive Agents on Real-World Tasks · cs.CL · 2026-07-09 · conditional · none · ref 41
ASEval: Automated Trajectory-Level Security Testing for Autonomous Agents · cs.CR · 2026-05-21 · conditional · none · ref 17
LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection · cs.CR · 2026-05-18 · unverdicted · none · ref 11 · 2 links
When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents · cs.CR · 2026-05-07 · unverdicted · none · ref 26
Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution · cs.CR · 2026-03-24 · unverdicted · none · ref 14
HearthNet: Edge Multi-Agent Orchestration for Smart Homes · cs.DC · 2026-03-16 · unverdicted · none · ref 19
Harnessing Embodied Agents: Runtime Governance for Policy-Constrained Execution · cs.RO · 2026-04-09 · conditional · none · ref 17 · 2 links
Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability · cs.CL · 2026-05-08 · unverdicted · none · ref 128
Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study · cs.CR · 2026-04-30 · conditional · none · ref 15
Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation · cs.CR · 2026-06-09 · unverdicted · none · ref 192
Security of OpenClaw Agents: Fundamentals, Attacks, and Countermeasures · cs.AI · 2026-05-25 · unverdicted · none · ref 10
