Validating LLM-as-a-Judge Systems under Rating Indeterminacy

Luke Guerdan, Solon Barocas, Ken Holstein, Hanna Wallach, Steven Wu, Alexandra Chouldechova · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Heterogeneous Judge-Aware Ranking with Sensitivity, Disagreement, and Confidence

stat.ME · 2026-05-06 · unverdicted · novelty 6.0

HJA ranking separates consensus ranking, judge sensitivity, and residual disagreement as distinct inferential targets with identifiability conditions and an anchored alternating algorithm, yielding better recovery and uncertainty calibration than pooled baselines on synthetic and real data.

citing papers explorer

Showing 1 of 1 citing paper.

Heterogeneous Judge-Aware Ranking with Sensitivity, Disagreement, and Confidence

stat.ME · 2026-05-06 · unverdicted · none · ref 11
