Instance-Optimal Estimation with Multiple LLM Judges on a Budget

Alexandre Proutière; Junghyun Lee; Sanghwa Kim; Se-Young Yun; Yassir Jedra

arxiv: 2605.23362 · v1 · pith:TC4WMJGLnew · submitted 2026-05-22 · cs.LG · cs.IT · math.IT · math.ST · stat.ML · stat.TH

Pith reviewed 2026-05-25 05:23 UTC · model grok-4.3

keywords: budgeted estimation · multi-judge evaluation · instance optimality · inverse-variance weighting · adaptive allocation · LLM evaluation · heteroskedastic estimation

The pith

An adaptive algorithm using optimistically biased variance estimates matches the oracle inverse-variance weighted estimator rate for multi-judge LLM score estimation under a fixed budget.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formalizes budgeted heteroskedastic multi-judge estimation, where a fixed budget must be allocated across prompt-response pairs and judges that differ in cost and reliability to minimize error in a bounded score vector. It first derives the oracle allocation that minimizes error for the inverse-variance weighted estimator when variances are known. For the realistic case of unknown variances, it introduces EST-IVWE, which constructs optimistically biased variance estimates to produce a stable empirical allocation. The central result is that EST-IVWE attains the oracle error rate up to lower-order budget terms, and this performance is instance-optimal by a matching local minimax lower bound proved via an Assouad-type argument that preserves local variance structure. A sympathetic reader cares because LLM evaluations are expensive and heterogeneous; a provably near-optimal allocation directly improves accuracy for any given spend.

Core claim

The central claim is that EST-IVWE, an adaptive procedure that builds and uses optimistically biased variance estimates, matches the error rate of the oracle inverse-variance weighted estimator up to lower-order terms in the budget. A matching local minimax lower bound, obtained via an Assouad-type in-expectation argument based on local perturbations, establishes that the proposed algorithms are instance-optimal. This bound is sharper than what Fano-type packing arguments can deliver because the latter lose the local variance information that determines the optimal allocation.

What carries the argument

The inverse-variance weighted estimator (IVWE) whose error is minimized by an oracle allocation depending on unknown query-judge variances; EST-IVWE extends this to the unknown-variance case by constructing optimistically biased variance estimates that stabilize empirical allocation without rate loss.
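A minimal sketch of inverse-variance weighting for a single prompt-response pair may help fix ideas. It assumes the judge variances are known and that judge scores are independent and unbiased; the function name `ivwe` and its argument layout are illustrative and are not taken from the paper.

```python
import numpy as np

def ivwe(means, variances, counts):
    """Inverse-variance weighted estimate for one prompt-response pair.

    means[j]     : sample mean of judge j's scores (ignored when counts[j] == 0)
    variances[j] : known per-query variance of judge j on this pair
    counts[j]    : number of queries sent to judge j
    Returns (estimate, variance of the estimate).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    counts = np.asarray(counts, dtype=float)

    # Precision contributed by each judge: n_j / sigma_j^2.
    weights = counts / variances
    total = weights.sum()
    if total <= 0:
        raise ValueError("at least one judge must be queried")

    # Unqueried judges carry zero weight; mask their (possibly NaN) means.
    safe_means = np.where(counts > 0, means, 0.0)
    estimate = float((weights * safe_means).sum() / total)
    return estimate, float(1.0 / total)
```

Under these assumptions the variance of the combined estimate is the reciprocal of the total precision, which is why the allocation of queries across judges directly controls the error.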

If this is right

EST-IVWE attains the oracle IVWE error rate up to lower-order budget terms even when variances are unknown.
A local minimax lower bound shows the achieved rate is instance-optimal for each fixed variance configuration.
The Assouad-type argument based on local perturbations yields an allocation-dependent lower bound that Fano-type arguments cannot recover.
Numerical comparisons on synthetic data and HelpSteer2 confirm lower error than uniform allocation under the same budget.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same optimistic-bias stabilization technique may extend to other budgeted allocation problems where measurement costs and noise levels are heterogeneous and initially unknown.
Local-perturbation lower-bound constructions could be applied to other estimation settings where global packing arguments erase the structure that governs optimal resource use.
The instance-optimality result implies that uniform or non-adaptive allocations are provably suboptimal on instances with strong variance heterogeneity.
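The gap between uniform and variance-aware allocation can be illustrated with a small calculation. The sketch below assumes a single judge with unit cost, known variances, squared-error loss, and real-valued query counts; the numbers are made up for illustration and do not come from the paper's experiments.

```python
import numpy as np

def expected_sq_error(sigma2, n):
    """Sum over instances of sigma_k^2 / n_k (variance of each sample mean)."""
    return float(np.sum(np.asarray(sigma2, dtype=float) / np.asarray(n, dtype=float)))

# Hypothetical instance: two easy pairs and one noisy pair.
sigma2 = np.array([0.01, 0.01, 0.25])
budget = 300.0

# Uniform allocation: the same number of queries for every pair.
uniform = np.full(sigma2.size, budget / sigma2.size)

# Variance-aware allocation for squared error with unit costs: n_k proportional to sigma_k.
sigma = np.sqrt(sigma2)
aware = budget * sigma / sigma.sum()

err_uniform = expected_sq_error(sigma2, uniform)
err_aware = expected_sq_error(sigma2, aware)
```

Here `err_uniform` is about 0.0027 and `err_aware` is about 0.0016. By the Cauchy-Schwarz inequality the two coincide only when all variances are equal, so the advantage grows with heterogeneity.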

Load-bearing premise

The adaptive algorithm can construct and leverage optimistically biased variance estimates to stabilize the empirical allocation without degrading the final estimator's rate.
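The source does not give the explicit form of the optimistic bias, so the following is only a sketch of one plausible construction. It assumes that "optimistic" means inflating the sample standard deviation by a confidence bonus of the Maurer-Pontil type and capping the result at Popoviciu's bound for a bounded variable; the function name, the `delta` parameter, and the direction of the bias are all assumptions, and the paper's actual estimator may differ.

```python
import math
import numpy as np

def inflated_variance(samples, score_range=1.0, delta=0.05):
    """Upper-confidence-style variance estimate for scores in an interval.

    samples     : observed scores for one (pair, judge) cell
    score_range : length of the interval the scores live in (assumed known)
    delta       : confidence parameter for the bonus (illustrative choice)
    """
    x = np.asarray(samples, dtype=float)
    n = x.size
    # Popoviciu: a variable supported on an interval of length R has variance <= R^2 / 4.
    cap = score_range ** 2 / 4.0
    if n < 2:
        return cap  # no information yet, fall back to the worst case
    s = float(x.std(ddof=1))  # sample standard deviation
    # Bonus shrinks like 1 / sqrt(n), so the inflation vanishes with more data.
    bonus = score_range * math.sqrt(2.0 * math.log(1.0 / delta) / (n - 1))
    return min((s + bonus) ** 2, cap)
```

The design intent in this sketch is that a cell with few samples is never mistaken for a low-noise cell, which keeps the plug-in allocation from starving it early on.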

What would settle it

On synthetic instances where the true variances are known, if the squared error of EST-IVWE exceeds the oracle IVWE error by more than lower-order terms in the budget for large enough budgets, the rate-matching claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.23362 by Alexandre Proutière, Junghyun Lee, Sanghwa Kim, Se-Young Yun, Yassir Jedra.

**Figure 2.** Additional experimental results on synthetic datasets. All results are averaged over …

**Figure 3.** Full experimental results for the datasets (Complexity, Correctness, Helpfulness, Verbosity). Each column corresponds to an error metric ( …

**Figure 4.** Impact of cost structure on performances of …

Original abstract

Evaluating large language models increasingly relies on LLM-as-a-judge protocols, but such evaluations remain costly: different judges have different prices and reliabilities, and the difficulty of each prompt-response pair can vary substantially. This raises a basic allocation question: under a fixed budget, how should one distribute evaluation queries across heterogeneous judges and instances to obtain the most accurate score estimates? We formalize this question as *budgeted heteroskedastic multi-judge estimation*. Given $K$ prompt-response pairs, $J$ judges with known costs, and unknown query-judge variances, the goal is to estimate a bounded score vector while minimizing an $\ell_p$-error. Our first contribution is to analyze the inverse-variance weighted estimator (IVWE) and to derive the oracle allocation that minimizes its error rate. Since this allocation depends on the unknown variances, we then address the practical unknown-variance setting by proposing EST-IVWE, an adaptive algorithm that constructs and leverages *optimistically biased* variance estimates to stabilize the empirical allocation. We prove that EST-IVWE matches the oracle IVWE rate up to lower-order terms in the budget. Our second and central theoretical contribution is a matching *local* minimax lower bound, which establishes the instance-optimality of the proposed algorithms. A key technical insight is that Fano-type high-probability arguments are too coarse for this problem: their packing construction loses the local variance structure that governs the optimal allocation. We instead use an Assouad-type in-expectation argument, based on local perturbations, which preserves this structure and yields the sharp allocation-dependent lower bound. Finally, we numerically validate the superiority of our approach over naïve uniform allocation on synthetic and HelpSteer2 datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper formulates budgeted multi-judge LLM estimation and gives an adaptive estimator plus a local Assouad lower bound that preserves variance structure.

The core contribution is a clean formulation of allocating a fixed budget across judges with different costs and per-instance variances to minimize estimation error on a score vector. They derive the oracle inverse-variance allocation for the known-variance case, then introduce EST-IVWE, which builds optimistic bias into the variance estimates so the empirical allocation stays stable when variances are unknown. The claim is that this matches the oracle rate up to lower-order terms. The second piece is a local minimax lower bound via an Assouad-style in-expectation argument on local perturbations; this keeps the variance dependence that Fano-style packings would smear out, which is the right technical move for instance optimality. Experiments on synthetic data and HelpSteer2 show gains over uniform allocation, as one would expect. The weakest link is the optimistic-bias construction itself. The abstract asserts it works without degrading the leading constant, but gives no explicit form or concentration argument, so it is unclear whether the bias stays negligible relative to the 1/sqrt(B) term or quietly adds a log factor on hard instances. If the full proofs close that gap cleanly, the result is sharp; otherwise the adaptivity may only be approximate. This is useful reading for people working on efficient LLM evaluation pipelines or on heteroskedastic allocation problems more generally. The problem is timely, the lower-bound technique is honest, and the paper is coherent on its own terms, so it should go to referees rather than desk rejection.

Referee Report

2 major / 2 minor

Summary. The paper formalizes budgeted heteroskedastic multi-judge estimation for LLM evaluations with heterogeneous judge costs and instance difficulties. It analyzes the inverse-variance weighted estimator (IVWE), derives its oracle allocation minimizing ℓ_p error under a fixed budget, and proposes the adaptive EST-IVWE algorithm that uses optimistically biased variance estimates to handle unknown variances while matching the oracle rate up to lower-order terms. It establishes instance-optimality via a matching local minimax lower bound derived from an Assouad-type in-expectation argument with local perturbations (avoiding coarse Fano packings), and validates the approach empirically against uniform allocation on synthetic data and the HelpSteer2 dataset.

Significance. If the central claims hold, the work delivers a practically relevant, instance-optimal framework for cost-efficient LLM-as-a-judge scoring that adapts to per-instance and per-judge variance heterogeneity. The local minimax lower bound that preserves the variance structure governing optimal allocation, together with the explicit adaptive procedure for unknown variances, constitutes a technical contribution beyond standard inverse-variance weighting. The empirical results on HelpSteer2 further support applicability.

major comments (2)

[EST-IVWE algorithm and its analysis] The claim that EST-IVWE matches the oracle IVWE rate up to lower-order terms (abstract) rests on the optimistic bias construction stabilizing allocation without degrading the leading 1/sqrt(B) constant. The bias must be strong enough to avoid unstable allocations on high-variance instances yet weak enough that the resulting estimator retains the exact oracle leading term; an explicit bias definition and concentration argument showing the bias term is o(1/sqrt(B)) are required to confirm this.
[Local minimax lower bound section] The Assouad-type local-perturbation argument is presented as yielding the sharp allocation-dependent lower bound. The specific local perturbation construction and the in-expectation calculation that retains the per-instance variance structure (rather than averaging it away) should be verified to ensure the lower bound exactly matches the oracle upper bound's leading constant.

minor comments (2)

Clarify the precise meaning of 'lower-order terms in the budget' (e.g., whether o(1/sqrt(B)) or O(log B / sqrt(B)) is intended) and state the dependence on K, J, and p explicitly.
The synthetic data generation process and the precise definition of the ℓ_p error metric used in the experiments should be described in more detail to allow reproduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight opportunities to strengthen the explicitness of our technical arguments, and we address each point below. We are prepared to revise the manuscript to incorporate additional details where needed.

Referee: [EST-IVWE algorithm and its analysis] The claim that EST-IVWE matches the oracle IVWE rate up to lower-order terms (abstract) rests on the optimistic bias construction stabilizing allocation without degrading the leading 1/sqrt(B) constant. The bias must be strong enough to avoid unstable allocations on high-variance instances yet weak enough that the resulting estimator retains the exact oracle leading term; an explicit bias definition and concentration argument showing the bias term is o(1/sqrt(B)) are required to confirm this.

Authors: We agree that the optimistic bias construction merits a more explicit treatment to confirm the leading constant is preserved. In the revision we will add a dedicated subsection (or lemma) in Section 4 that (i) states the precise bias term (a multiple of the estimated standard deviation scaled by a slowly growing function of the number of samples per instance), (ii) proves that the resulting allocation deviates from the oracle allocation by an o(1/sqrt(B)) term in total variation with high probability, and (iii) shows via a direct calculation that this deviation contributes only lower-order terms to the final ℓ_p error. This will make the matching claim fully rigorous. revision: yes
Referee: [Local minimax lower bound section] The Assouad-type local-perturbation argument is presented as yielding the sharp allocation-dependent lower bound. The specific local perturbation construction and the in-expectation calculation that retains the per-instance variance structure (rather than averaging it away) should be verified to ensure the lower bound exactly matches the oracle upper bound's leading constant.

Authors: The local perturbation is constructed by adding an independent Rademacher perturbation of size Θ(1/σ_{k j}) to each instance-judge mean, with the scale chosen small enough to remain inside the bounded score interval. The in-expectation lower bound is obtained by linearity of expectation over the independent sign flips; because each coordinate's contribution appears separately in the total risk and the variance of the estimator for that coordinate is exactly the reciprocal of the total weight allocated to it, the per-instance variance structure is retained and the resulting lower bound matches the leading 1/sqrt(B) term of the oracle upper bound. We will insert a short clarifying paragraph after the main proof in Section 5 that spells out this coordinate-wise calculation. revision: partial
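For orientation, the coordinate-wise structure that the response appeals to is visible in the standard textbook form of Assouad's lemma, stated below for the Hamming loss. The notation ($m$, $\Omega$, $P_\omega$, $\rho_H$) is illustrative and is not the paper's; the paper's local, allocation-dependent version is not reproduced here.

```latex
% Assouad's lemma in a standard textbook form (Hamming-loss version).
% Notation (m, \Omega, P_\omega, \rho_H) is illustrative, not the paper's.
Let $\Omega = \{0,1\}^m$, let $\{P_\omega : \omega \in \Omega\}$ be a family of
probability measures, and let $\rho_H$ denote the Hamming distance. Then
\[
  \inf_{\hat\omega} \max_{\omega \in \Omega}
  \mathbb{E}_\omega\, \rho_H(\hat\omega, \omega)
  \;\ge\;
  \frac{m}{2} \min_{\omega,\omega' :\, \rho_H(\omega,\omega') = 1}
  \bigl( 1 - \mathrm{TV}(P_\omega, P_{\omega'}) \bigr).
\]
% Each of the m coordinates contributes its own two-point testing term.
```

Because the bound is a sum of per-coordinate two-point problems, the noise level of each coordinate can enter separately, which is the feature a global packing argument tends to lose.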

Circularity Check

0 steps flagged

No significant circularity; derivations rely on independent technical arguments

The paper analyzes the inverse-variance weighted estimator to derive an oracle allocation, then proposes EST-IVWE using optimistically biased variance estimates to match the oracle rate up to lower-order terms, with a matching local minimax lower bound obtained via a new Assouad-type in-expectation argument based on local perturbations. No load-bearing step reduces by construction to its inputs, fitted parameters renamed as predictions, or self-citation chains; the central technical insight (preserving local variance structure in the lower bound) is presented as novel and independent of the algorithm definition. This matches the expectation that most papers are non-circular when the proof techniques are self-contained and externally falsifiable.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard domain assumptions about bounded scores and heteroskedastic unknown variances; the main technical work is in the adaptive procedure and lower-bound construction. No free parameters or invented entities are introduced beyond the problem setup itself.

axioms (2)

domain assumption: The target score vector is bounded. Explicitly stated as part of the estimation goal in the problem formulation.
domain assumption: Judges have known per-query costs but unknown query-judge variances. Core modeling choice that defines the heteroskedastic multi-judge setting.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation: unclear

Paper passage: "We prove that EST-IVWE matches the oracle IVWE rate up to lower-order terms... matching local minimax lower bound... Assouad-type in-expectation argument"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear

Paper passage: "optimal allocation... $A^*_p(\sigma, c) = \Big(\sum_k \big(c_{j^*(k)}\,\sigma^2_{k,j^*(k)}\big)^{p/(p+2)}\Big)^{(p+2)/p}$"
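The allocation constant in the quoted passage can be evaluated numerically. The sketch below rests on two assumptions that the page does not state: that $j^*(k)$ is the judge minimizing cost times variance for pair $k$, and that the error proxy is $\big(\sum_k (\sigma^2_{k,j^*(k)}/n_k)^{p/2}\big)^{1/p}$ with real-valued query counts $n_k$. Under those assumptions a Lagrangian calculation gives the allocation used in the code and a proxy equal to $\sqrt{A^*_p/B}$ for budget $B$. The function name and return layout are hypothetical.

```python
import numpy as np

def oracle_allocation(sigma2, cost, budget, p=2.0):
    """Evaluate A*_p and a matching real-valued allocation.

    sigma2 : array of shape (K, J), variance of judge j on pair k
    cost   : array of shape (J,), per-query cost of each judge
    budget : total budget B
    Assumes j*(k) = argmin_j cost[j] * sigma2[k, j] (not confirmed by the source).
    Returns (A_star, j_star, n) where n[k] is the query count for pair k.
    """
    sigma2 = np.asarray(sigma2, dtype=float)
    cost = np.asarray(cost, dtype=float)
    num_pairs = sigma2.shape[0]

    # Cheapest-per-unit-precision judge for each pair.
    j_star = np.argmin(cost[None, :] * sigma2, axis=1)
    c = cost[j_star]
    v = sigma2[np.arange(num_pairs), j_star]

    r = p / (p + 2.0)
    s = float(((c * v) ** r).sum())      # inner sum of the formula
    a_star = s ** (1.0 / r)              # outer power (p + 2) / p

    # Stationary point of the assumed proxy under the budget constraint.
    n = budget * (v ** (p / 2.0) / c) ** (2.0 / (p + 2.0)) / s
    return a_star, j_star, n
```

For example, with `sigma2 = [[1, 16], [4, 9]]`, `cost = [4, 1]`, `budget = 10` and `p = 2`, the selected judges are 0 and 1, the constant is 25, and the allocation is 1 and 6 queries, which spends exactly the budget.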

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
