pith. machine review for the scientific record.

You may need tools such as pins, saws, and tape

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.LG 1

years

2024 1

verdicts

CONDITIONAL 1

representative citing papers

A StrongREJECT for Empty Jailbreaks

cs.LG · 2024-02-15 · conditional · novelty 6.0

StrongREJECT provides a standardized benchmark and evaluator for jailbreak attacks that aligns better with human judgments than prior methods and reveals that successful jailbreaks often reduce model capabilities.

citing papers explorer

Showing 1 of 1 citing paper.

  • A StrongREJECT for Empty Jailbreaks cs.LG · 2024-02-15 · conditional · none · ref 53