Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.
Don’t let the claw grip your hand: A security analysis and defense framework for OpenClaw
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 9representative citing papers
AgentHazard benchmark shows computer-use agents remain highly vulnerable, with attack success rates reaching 73.63% on models like Qwen3-Coder powering Claude Code.
A3S-Bench evaluates LLM agents against temporal, spatial, and semantic evasions, raising average risk trigger rates from 28.3% to 52.6% across 2,254 trajectories and 20 scenarios.
All six evaluated OpenClaw agent frameworks exhibit substantial security vulnerabilities, with reconnaissance behaviors as the most common weakness and agent systems proving significantly riskier than isolated backbone models.
Claw AI agents' heartbeat background execution shares memory context with user sessions, allowing ordinary social misinformation to silently pollute long-term memory and shape behavior at rates up to 76% across sessions.
Generative AI advertising is reframed as a problem of trustworthy commercial intervention on the generative process, with a taxonomy of influence tiers from product mentions to long-term preference shaping.
A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.
AgentOpt introduces a framework-agnostic package that uses algorithms like UCB-E to find cost-effective model assignments in multi-step LLM agent pipelines, cutting evaluation budgets by 62-76% while maintaining near-optimal accuracy on benchmarks.
citing papers explorer
-
Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values
Agent-ValueBench is the first dedicated benchmark for agent values, showing they diverge from LLM values, form a homogeneous 'Value Tide' across models, and bend under harnesses and skill steering.
-
AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents
AgentHazard benchmark shows computer-use agents remain highly vulnerable, with attack success rates reaching 73.63% on models like Qwen3-Coder powering Claude Code.
-
Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions
A3S-Bench evaluates LLM agents against temporal, spatial, and semantic evasions, raising average risk trigger rates from 28.3% to 52.6% across 2,254 trajectories and 20 scenarios.
-
A Systematic Security Evaluation of OpenClaw and Its Variants
All six evaluated OpenClaw agent frameworks exhibit substantial security vulnerabilities, with reconnaissance behaviors as the most common weakness and agent systems proving significantly riskier than isolated backbone models.
-
Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution
Claw AI agents' heartbeat background execution shares memory context with user sessions, allowing ordinary social misinformation to silently pollute long-term memory and shape behavior at rates up to 76% across sessions.
-
Generative AI Advertising as a Problem of Trustworthy Commercial Intervention
Generative AI advertising is reframed as a problem of trustworthy commercial intervention on the generative process, with a taxonomy of influence tiers from product mentions to long-term preference shaping.
-
Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation
A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.
-
AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent
AgentOpt introduces a framework-agnostic package that uses algorithms like UCB-E to find cost-effective model assignments in multi-step LLM agent pipelines, cutting evaluation budgets by 62-76% while maintaining near-optimal accuracy on benchmarks.
- LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection