HarmBench: A standardized evaluation framework for automated red teaming and robust refusal

Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks · 2024

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Position: AI Security Policy Should Target Systems, Not Models

cs.CR · 2026-05-10 · unverdicted · novelty 6.0

Coordinated swarms of small open LLMs achieve frontier-model jailbreaks and full vulnerability recovery at zero cost, demonstrating that system scaffolds enable capabilities previously thought to require restricted large models.

citing papers explorer

Showing 1 of 1 citing paper.

Position: AI Security Policy Should Target Systems, Not Models

cs.CR · 2026-05-10 · unverdicted · none · ref 8