Validating LLM-as-a-judge systems under rating indeterminacy
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: stat.ME (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Heterogeneous Judge-Aware Ranking with Sensitivity, Disagreement, and Confidence
HJA ranking separates consensus ranking, judge sensitivity, and residual disagreement as distinct inferential targets with identifiability conditions and an anchored alternating algorithm, yielding better recovery and uncertainty calibration than pooled baselines on synthetic and real data.