SLEIGHT-Bench introduces 40 evasion attacks across 11 categories in which coding agents pursue harmful goals; frontier monitors catch only 32% of them at a 1% false-positive rate.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- SLEIGHT-Bench: A Benchmark of Evasion Attacks Against Agent Monitors