OS-Harm: A benchmark for measuring safety of computer use agents

Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, Zico Kolter, Nicolas Flammarion, Maksym Andriushchenko · 2025 · arXiv 2506.14866

Baseline reference. 57% of citing Pith papers use this work as a benchmark or comparison.

20 Pith papers citing it

Baseline 57% of classified citations

citation-role summary

background: 3 · dataset: 3 · baseline: 1

citation-polarity summary

background: 3 · use dataset: 3 · baseline: 1

representative citing papers

NRT-Bench: Benchmarking Multi-Turn Red-Teaming of LLM Operator Agents in Safety-Critical Control Rooms

cs.CR · 2026-06-18 · unverdicted · novelty 7.0

NRT-Bench reports that adaptive multi-turn attacks cause critical safety function loss in 8.7-12.1% of sessions across four frontier LLM operator models, with nearly disjoint vulnerabilities and strongly model-dependent defense effects.

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

cs.CR · 2026-06-11 · unverdicted · novelty 7.0

Introduces a stakeholder-centric benchmark showing current web agents fail all tested prompt injection objectives, with failures falling into stealthy parasitism, misaligned disruption, or compounded failure modes.

When the Manual Lies: A Realistic Benchmark to Evaluate MCP Poisoning Attacks for LLM Agents

cs.CR · 2026-05-22 · unverdicted · novelty 7.0

Introduces the MCP-TDP benchmark, showing near-100% attack success on models like GPT-4o for tool description poisoning, and proposes a reactive self-correction defense.

Do Coding Agents Understand Least-Privilege Authorization?

cs.CR · 2026-05-14 · unverdicted · novelty 7.0

Coding agents struggle to infer least-privilege file permissions, omitting needed accesses while granting unused or sensitive ones; Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.

Don't Click That: Teaching Web Agents to Resist Deceptive Interfaces

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

DUDE framework reduces web agents' susceptibility to deceptive UIs by 53.8% on a new 1,407-scenario benchmark while preserving task performance.

MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

cs.CR · 2026-05-05 · unverdicted · novelty 7.0

MOSAIC-Bench demonstrates that nine production coding agents achieve 53-86% end-to-end attack success rates on staged innocuous tickets across 10 web substrates and 31 CWE classes, far higher than the 0-20.4% rates seen with direct prompts.

OS-SPEAR: A Toolkit for the Safety, Performance, Efficiency, and Robustness Analysis of OS Agents

cs.CL · 2026-04-27 · unverdicted · novelty 7.0

OS-SPEAR is a new evaluation toolkit that tests 22 OS agents and identifies trade-offs between efficiency and safety or robustness.

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

cs.CR · 2026-04-12 · unverdicted · novelty 7.0

Computer-use agents show attack success rates above 90% on benign instructions that produce harm via context or execution, with safety-aligned Claude 4.5 Sonnet at 73% ASR rising to 92.7% in multi-agent deployments.

A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents

cs.AI · 2025-12-23 · unverdicted · novelty 7.0

A new benchmark of 40 scenarios finds state-of-the-art LLMs exhibit outcome-driven constraint violations in 0-62.8% of cases under KPI pressure, with no consistent safety gains across model generations.

OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks

cs.AI · 2026-06-28 · unverdicted · novelty 6.0

OSWorld 2.0 is a benchmark of 108 realistic long-horizon computer-use tasks where current agents achieve only 20.6% binary completion, struggling with state inference and constraint tracking.

LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection

cs.CR · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

LivePI benchmark reports indirect prompt injection success rates of 10.7-29.6% across five models on seven input surfaces and shows a two-layer defense blocking all malicious completions while preserving utility.

PageGuide: Browser extension to assist users in navigating a webpage and locating information

cs.HC · 2026-04-26 · unverdicted · novelty 6.0 · 2 refs

PageGuide is a browser extension that grounds LLM responses in webpage DOM elements via visual overlays for Find, Guide, and Hide modes, reporting performance gains over unaided browsing in a 94-user study.

ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis

cs.AI · 2026-04-02 · unverdicted · novelty 6.0 · 2 refs

ATBench is a new trajectory-level benchmark with 1,000 diverse and realistic scenarios for assessing safety in LLM agents.

ProjGuard: Safety Monitoring for Computer-Use Agents via Low-Dimensional Projections

stat.CO · 2026-05-13 · unverdicted · novelty 5.0

ProjGuard monitors agent trajectories with low-dimensional projections to cut unsafe actions from 16% to 3% and raise task completion from 59% to 65% on OS-Harm.

Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation

cs.CR · 2026-05-07 · unverdicted · novelty 5.0

A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

cs.AI · 2026-04-30 · unverdicted · novelty 5.0

The paper delivers the first comprehensive overview of RL for GUI agents, organizing methods into offline, online, and hybrid strategies while analyzing trends in rewards, efficiency, and deliberation to outline a future roadmap.

Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability

cs.CL · 2026-05-08 · unverdicted · novelty 4.0

The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment interventions.

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

cs.AI · 2025-10-27 · unverdicted · novelty 4.0

A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.

Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation

cs.CR · 2026-06-09 · unverdicted · novelty 3.0

A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.

Human-Guided Harm Recovery for Computer Use Agents

cs.AI · 2026-04-20

citing papers explorer

Showing 20 of 20 citing papers.

NRT-Bench: Benchmarking Multi-Turn Red-Teaming of LLM Operator Agents in Safety-Critical Control Rooms cs.CR · 2026-06-18 · unverdicted · none · ref 11
NRT-Bench reports that adaptive multi-turn attacks cause critical safety function loss in 8.7-12.1% of sessions across four frontier LLM operator models, with nearly disjoint vulnerabilities and strongly model-dependent defense effects.
Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents cs.CR · 2026-06-11 · unverdicted · none · ref 32
Introduces a stakeholder-centric benchmark showing current web agents fail all tested prompt injection objectives, with failures falling into stealthy parasitism, misaligned disruption, or compounded failure modes.
When the Manual Lies: A Realistic Benchmark to Evaluate MCP Poisoning Attacks for LLM Agents cs.CR · 2026-05-22 · unverdicted · none · ref 28
Introduces the MCP-TDP benchmark, showing near-100% attack success on models like GPT-4o for tool description poisoning, and proposes a reactive self-correction defense.
Do Coding Agents Understand Least-Privilege Authorization? cs.CR · 2026-05-14 · unverdicted · none · ref 56
Coding agents struggle to infer least-privilege file permissions, omitting needed accesses while granting unused or sensitive ones; Sufficiency-Tightness Decomposition improves sensitive-task success by up to 15.8% and reduces attacks.
Don't Click That: Teaching Web Agents to Resist Deceptive Interfaces cs.AI · 2026-05-10 · unverdicted · none · ref 2
DUDE framework reduces web agents' susceptibility to deceptive UIs by 53.8% on a new 1,407-scenario benchmark while preserving task performance.
MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents cs.CR · 2026-05-05 · unverdicted · none · ref 4
MOSAIC-Bench demonstrates that nine production coding agents achieve 53-86% end-to-end attack success rates on staged innocuous tickets across 10 web substrates and 31 CWE classes, far higher than the 0-20.4% rates seen with direct prompts.
OS-SPEAR: A Toolkit for the Safety, Performance, Efficiency, and Robustness Analysis of OS Agents cs.CL · 2026-04-27 · unverdicted · none · ref 78
OS-SPEAR is a new evaluation toolkit that tests 22 OS agents and identifies trade-offs between efficiency and safety or robustness.
The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents cs.CR · 2026-04-12 · unverdicted · none · ref 4
Computer-use agents show attack success rates above 90% on benign instructions that produce harm via context or execution, with safety-aligned Claude 4.5 Sonnet at 73% ASR rising to 92.7% in multi-agent deployments.
A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents cs.AI · 2025-12-23 · unverdicted · none · ref 10
A new benchmark of 40 scenarios finds state-of-the-art LLMs exhibit outcome-driven constraint violations in 0-62.8% of cases under KPI pressure, with no consistent safety gains across model generations.
OSWorld2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks cs.AI · 2026-06-28 · unverdicted · none · ref 40
OSWorld 2.0 is a benchmark of 108 realistic long-horizon computer-use tasks where current agents achieve only 20.6% binary completion, struggling with state inference and constraint tracking.
LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection cs.CR · 2026-05-18 · unverdicted · none · ref 6 · 2 links
LivePI benchmark reports indirect prompt injection success rates of 10.7-29.6% across five models on seven input surfaces and shows a two-layer defense blocking all malicious completions while preserving utility.
PageGuide: Browser extension to assist users in navigating a webpage and locating information cs.HC · 2026-04-26 · unverdicted · none · ref 22 · 2 links
PageGuide is a browser extension that grounds LLM responses in webpage DOM elements via visual overlays for Find, Guide, and Hide modes, reporting performance gains over unaided browsing in a 94-user study.
ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis cs.AI · 2026-04-02 · unverdicted · none · ref 11 · 2 links
ATBench is a new trajectory-level benchmark with 1,000 diverse and realistic scenarios for assessing safety in LLM agents.
ProjGuard: Safety Monitoring for Computer-Use Agents via Low-Dimensional Projections stat.CO · 2026-05-13 · unverdicted · none · ref 3
ProjGuard monitors agent trajectories with low-dimensional projections to cut unsafe actions from 16% to 3% and raise task completion from 59% to 65% on OS-Harm.
Constraining Host-Level Abuse in Self-Hosted Computer-Use Agents via TEE-Backed Isolation cs.CR · 2026-05-07 · unverdicted · none · ref 25
A TEE-backed architecture isolates security-critical decisions in self-hosted AI agents to prevent host-level abuse from malicious inputs while maintaining allowed functionality.
GUI Agents with Reinforcement Learning: Toward Digital Inhabitants cs.AI · 2026-04-30 · unverdicted · none · ref 37
The paper delivers the first comprehensive overview of RL for GUI agents, organizing methods into offline, online, and hybrid strategies while analyzing trends in rewards, efficiency, and deliberation to outline a future roadmap.
Securing Computer-Use Agents: A Unified Architecture-Lifecycle Framework for Deployment-Grounded Reliability cs.CL · 2026-05-08 · unverdicted · none · ref 41
The paper develops a unified framework that organizes computer-use agent reliability around perception-decision-execution layers and creation-deployment-operation-maintenance stages to map security and alignment interventions.
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges cs.AI · 2025-10-27 · unverdicted · none · ref 239
A survey that taxonomizes threats to agentic AI, reviews benchmarks and evaluation methods, discusses technical and governance defenses, and identifies open challenges.
Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation cs.CR · 2026-06-09 · unverdicted · none · ref 88
A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.
Human-Guided Harm Recovery for Computer Use Agents cs.AI · 2026-04-20 · unreviewed · ref 16
