
Heterogeneous Judge-Aware Ranking with Sensitivity, Disagreement, and Confidence

1 Pith paper cites this work.

abstract

Pairwise comparisons from multiple judges are central to large language model evaluation and preference modeling, yet standard ranking pipelines often pool judgments into a single score vector, treating systematic judge disagreement as noise. We propose Heterogeneous Judge-Aware (HJA) ranking, a structured multi-judge ranking framework that separates consensus ranking, judge-specific sensitivity to consensus, and residual preference disagreement. HJA thereby treats ranking, judge sensitivity, and structured disagreement as separate inferential targets. We establish conditions under which this decomposition is identifiable and develop an anchored alternating algorithm that preserves the identifying geometry. For confidence quantification, we study a fixed-panel repeated-comparison regime in which the judge panel may remain fixed or modest in size while information grows through repeated judgments. This yields uncertainty statements for consensus and judge-specific ranking contrasts, sensitivity parameters, pairwise probabilities, and summaries of residual disagreement. Experiments on synthetic and real multi-judge comparison data show that HJA improves recovery, robustness, uncertainty calibration, and near-tie performance relative to pooled and sensitivity-only baselines. The fitted model also provides diagnostics for judge disagreement and model-affinity patterns, giving a statistically grounded framework for ranking under heterogeneous comparative judgments.
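
The abstract does not state the model's link function or its identifying constraints, so the following is only a minimal Python sketch under stated assumptions, not the authors' HJA estimator. It assumes a Bradley-Terry-style logistic link in which judge k's log-odds of preferring item i over item j is a_k(θ_i − θ_j) + δ_{k,i} − δ_{k,j}, with θ the consensus scores, a_k the judge's sensitivity to consensus, and δ_k that judge's residual disagreement. The anchoring constraints (centered θ and δ_k, each δ_k orthogonal to θ, root-mean-square of a equal to 1 with positive mean), the ridge penalty `lam`, and every function name (`anchor`, `fit_hja_sketch`, `simulate`, and so on) are hypothetical choices for illustration. The alternating loop is plain block-wise gradient ascent, and `anchor` re-imposes the constraints after each sweep without changing any predicted probability.

```python
import math
import random

# Hypothetical sketch. ASSUMED (not from the paper): logistic link with log-odds
# a[k] * (theta[i] - theta[j]) + delta[k][i] - delta[k][j], the constraints in anchor(), ridge on delta.

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    """Logistic function written via tanh, which cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def logit(theta, a, delta, k, i, j):
    """Assumed log-odds that judge k prefers item i over item j."""
    return a[k] * (theta[i] - theta[j]) + delta[k][i] - delta[k][j]

def win_prob(theta, a, delta, k, i, j):
    return sigmoid(logit(theta, a, delta, k, i, j))

def objective(data, theta, a, delta, lam):
    """Mean Bernoulli log-likelihood minus a ridge penalty on the residuals delta."""
    ll = 0.0
    for k, i, j, y in data:
        eta = logit(theta, a, delta, k, i, j)
        ll -= softplus(-eta) if y else softplus(eta)  # adds log p if y == 1, else log(1 - p)
    return ll / len(data) - 0.5 * lam * sum(d * d for row in delta for d in row)

def anchor(theta, a, delta):
    """Impose hypothetical identifying constraints; no win probability changes."""
    n = len(theta)
    mean_t = sum(theta) / n
    theta = [t - mean_t for t in theta]                          # consensus scores sum to zero
    delta = [[d - sum(row) / n for d in row] for row in delta]   # each residual row sums to zero
    a = list(a)
    ss = sum(t * t for t in theta)
    if ss > 0:
        for k, row in enumerate(delta):
            c = sum(d * t for d, t in zip(row, theta)) / ss      # component of delta_k along theta
            a[k] += c                                            # ...is moved into sensitivity a_k
            delta[k] = [d - c * t for d, t in zip(row, theta)]
    s = math.sqrt(sum(x * x for x in a) / len(a))                # scale: RMS of a equals 1
    if sum(a) < 0:
        s = -s                                                   # sign: mean sensitivity positive
    if s != 0:
        a = [x / s for x in a]
        theta = [t * s for t in theta]
    return theta, a, delta

def residuals(data, theta, a, delta):
    """y - p per comparison: derivative of the log-likelihood w.r.t. the log-odds."""
    return [y - win_prob(theta, a, delta, k, i, j) for k, i, j, y in data]

def fit_hja_sketch(data, n_items, n_judges, lam=0.01, lr=2.0, sweeps=150):
    """Alternating (block-wise) gradient ascent on objective(), re-anchoring every sweep."""
    theta = [0.0] * n_items
    a = [1.0] * n_judges
    delta = [[0.0] * n_items for _ in range(n_judges)]
    n_obs = len(data)
    for _ in range(sweeps):
        g = [0.0] * n_items                                      # block 1: consensus scores
        for (k, i, j, _y), r in zip(data, residuals(data, theta, a, delta)):
            g[i] += r * a[k]
            g[j] -= r * a[k]
        theta = [t + lr * gi / n_obs for t, gi in zip(theta, g)]
        g = [0.0] * n_judges                                     # block 2: judge sensitivities
        for (k, i, j, _y), r in zip(data, residuals(data, theta, a, delta)):
            g[k] += r * (theta[i] - theta[j])
        a = [x + lr * gk / n_obs for x, gk in zip(a, g)]
        g = [[0.0] * n_items for _ in range(n_judges)]           # block 3: residual disagreement
        for (k, i, j, _y), r in zip(data, residuals(data, theta, a, delta)):
            g[k][i] += r
            g[k][j] -= r
        delta = [[d + lr * (gi / n_obs - lam * d) for d, gi in zip(drow, grow)]
                 for drow, grow in zip(delta, g)]
        theta, a, delta = anchor(theta, a, delta)
    return theta, a, delta

def simulate(theta, a, delta, reps, seed=0):
    """Repeated comparisons from a fixed judge panel under the assumed model."""
    rng = random.Random(seed)
    data = []
    for k in range(len(a)):
        for i in range(len(theta)):
            for j in range(i + 1, len(theta)):
                p = win_prob(theta, a, delta, k, i, j)
                data.extend((k, i, j, int(rng.random() < p)) for _ in range(reps))
    return data
```

Re-anchoring after every sweep is what keeps the three blocks from trading scale or direction with one another in this sketch: any part of δ_k that lies along θ is folded into a_k, so the fitted residual rows only describe disagreement that a rescaled consensus cannot explain. Growing `reps` while keeping the panel fixed mimics, loosely, the repeated-comparison regime the abstract describes.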


representative citing papers

A Finite-Calibration Regime Map for LLM Judge Panels

cs.CL · 2026-05-31 · unverdicted · novelty 6.0 · ref 19

The paper introduces a finite-calibration regime map and a Finite-Calibration Panel Selection selector, finding that scalar aggregation wins on most real benchmark-budget combinations, while joint tables help when interactions are present.