If this work reduces existential risks indirectly or diffusely, what are the main contributing factors that it affects? Answer: Improved monitoring tools, safety culture

Diffuse Effects

1 Pith paper cites this work. Polarity classification is still indexing.

citation-role summary

other 1

citation-polarity summary

unclear 1

representative citing papers

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

cs.LG · 2024-02-06 · unverdicted · novelty 6.0

HarmBench is a new standardized benchmark for red teaming LLMs that supports large-scale comparisons of 18 attack methods and 33 models plus an efficient adversarial training defense.

citing papers explorer

Showing 1 of 1 citing paper.

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal · cs.LG · 2024-02-06 · unverdicted · none · ref 32
