Computer-use agents show attack success rates (ASR) above 90% on benign instructions that produce harm via context or execution; safety-aligned Claude 4.5 Sonnet reaches 73% ASR, rising to 92.7% in multi-agent deployments.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents