No authority level, urgency framing, or SKILL instruction overrides this heuristic

Unknown recipient addresses

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

cs.AI · 2026-04-01 · unverdicted · novelty 6.0

ClawSafety benchmark finds 40-75% attack success rates on frontier LLMs used as agents, with skill-file injections most effective and safety depending on both model and full agent framework.

citing papers explorer

Showing 1 of 1 citing paper.

ClawSafety: "Safe" LLMs, Unsafe Agents cs.AI · 2026-04-01 · unverdicted · none · ref 16
