StrongREJECT for empty jailbreaks, 2024

Alexandra Souly, Qingyuan Lu, Dillon Hamber, Tushar Khot, Shashank Goel, Johnny Xue, Andy Zou, Fazl Barez, Dylan Hadfield-Menell, Jacob Steinhardt · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Context Over Content: Exposing Evaluation Faking in Automated Judges

cs.AI · 2026-04-16 · conditional · novelty 8.0

LLM judges show a leniency bias of up to 9.8 percentage points when prompts signal the stakes of the evaluation; the bias operates implicitly, without being mentioned in the judges' chain-of-thought.

citing papers explorer

Showing 1 of 1 citing paper.

Context Over Content: Exposing Evaluation Faking in Automated Judges
cs.AI · 2026-04-16 · conditional · none · ref 12
LLM judges show a leniency bias of up to 9.8 percentage points when prompts signal the stakes of the evaluation; the bias operates implicitly, without being mentioned in the judges' chain-of-thought.
