Each portion of the given instruction should appear in at most one constraint, and must not be repeated across multiple constraints

Do not output duplicate constraints

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation

cs.CL · 2026-03-05 · unverdicted · novelty 7.0

IF-RewardBench uses preference graphs for listwise evaluation of judge models on instruction-following, exposing deficiencies in current judges and achieving stronger correlation with downstream task performance than existing benchmarks.

citing papers explorer

IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation

cs.CL · 2026-03-05 · unverdicted · none · ref 45