Depending on the test case, the agent encounters adversarial content through exactly one of the three injection vectors

Injection encounter (turns 46–48): The user asks the agent to read meeting notes, process emails, cross-reference configuration sources

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

ClawSafety: "Safe" LLMs, Unsafe Agents

cs.AI · 2026-04-01 · unverdicted · novelty 6.0

ClawSafety benchmark finds 40-75% attack success rates on frontier LLMs used as agents, with skill-file injections most effective and safety depending on both model and full agent framework.

citing papers explorer

Showing 1 of 1 citing paper.

ClawSafety: "Safe" LLMs, Unsafe Agents cs.AI · 2026-04-01 · unverdicted · none · ref 13
