Density-Ratio Losses for Post-Hoc Learning to Defer
Pith reviewed 2026-05-20 02:38 UTC · model grok-4.3
The pith
Post-hoc learning to defer reduces to estimating the density ratio between a model's and an expert's ideal distributions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We study post-hoc Learning to Defer (L2D) through the lens of ideal distributions: divergence-regularized reweightings of the data distribution under which a model attains low loss. We define deferral via the density ratio between a model's and an expert's ideals. Using the reduction from density-ratio estimation to class-probability estimation, we derive the DR CPE losses for post-hoc L2D scorers. Deferral decisions are then made by thresholding the scorer, allowing deferral rates to be adjusted without retraining. For KL-based ideal distributions, our deferral rule either recovers Chow's rule under the original distribution or connects to an expert-tilted Bayes posterior (which incorporates the expert's performance), depending on whether the ideal distributions are joint or marginal.
What carries the argument
The density ratio between a model's ideal distribution and an expert's ideal distribution, obtained via divergence-regularized reweightings and reduced to class-probability estimation losses for training a post-hoc deferral scorer.
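The reduction from density-ratio estimation to class-probability estimation can be illustrated on a toy problem. The sketch below is not the paper's DR-CPE loss: it assumes two one-dimensional Gaussians as hypothetical stand-ins for the numerator and denominator distributions, fits a plain logistic model to tell their samples apart, and reads the ratio off the predicted odds. All names (`fit_logistic`, `density_ratio`) are illustrative.

```python
import math
import random

def fit_logistic(xs, ys, lr=0.5, steps=300):
    """Fit p(y=1|x) = sigmoid(a*x + b) by full-batch gradient descent on log loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a * x + b)))
            grad_a += (p - y) * x
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

rng = random.Random(0)
# Toy stand-ins (assumption): numerator ~ N(0, 1), denominator ~ N(1, 1).
num = [rng.gauss(0.0, 1.0) for _ in range(1000)]
den = [rng.gauss(1.0, 1.0) for _ in range(1000)]
xs = num + den
ys = [1.0] * len(num) + [0.0] * len(den)  # label 1 marks numerator samples
a, b = fit_logistic(xs, ys)

def density_ratio(x):
    """Estimated numerator/denominator ratio.

    With equal sample sizes the ratio equals the odds p / (1 - p),
    which for a logistic model is exp(a*x + b).
    """
    return math.exp(a * x + b)

print(a, b, density_ratio(0.0))
```

For these two Gaussians the true ratio is exp(0.5 - x), so the fitted slope should be close to -1 and the intercept close to 0.5. With unequal sample sizes the odds would need rescaling by the ratio of sample counts.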
Load-bearing premise
The reduction from density-ratio estimation to class-probability estimation holds for the chosen ideal distributions and the resulting scorer can be thresholded to produce valid deferral decisions without additional calibration or assumptions on the joint distribution of model and expert errors.
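The claim that deferral rates can be adjusted without retraining amounts to choosing a new threshold on a fixed scorer. A minimal sketch, assuming the scorer outputs a model-over-expert ratio so that low scores favor deferral; the helper names and the score values are hypothetical.

```python
def threshold_for_rate(scores, deferral_rate):
    """Return tau such that about `deferral_rate` of the scores satisfy score <= tau."""
    if not 0.0 <= deferral_rate <= 1.0:
        raise ValueError("deferral_rate must lie in [0, 1]")
    ordered = sorted(scores)
    k = round(deferral_rate * len(ordered))  # number of instances to defer
    if k == 0:
        return float("-inf")  # defer nothing
    return ordered[k - 1]

def defer(score, tau):
    """Defer to the expert when the ratio score is at most tau."""
    return score <= tau

# Made-up scorer outputs on a held-out set.
scores = [0.2, 1.7, 0.9, 3.1, 0.4, 1.1, 2.5, 0.6, 1.3, 0.8]
for rate in (0.2, 0.5):
    tau = threshold_for_rate(scores, rate)
    deferred = sum(defer(s, tau) for s in scores)
    print(rate, tau, deferred / len(scores))
```

Ties in the scores can push the realized rate above the target, which is one reason the premise about calibration matters in practice.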
What would settle it
An experiment in which the DR CPE scorer's thresholded outputs fail to recover Chow's rule under the original distribution or fail to match the expert-tilted Bayes posterior for KL-based ideal distributions would falsify the central derivation.
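Such a check needs a reference implementation of Chow's rule to compare against. The sketch below uses the textbook form of the rule (reject when the conditional Bayes error exceeds the rejection cost) and a simple agreement rate; the posteriors and the scorer decisions are made-up values, and `agreement` is a hypothetical helper, not something taken from the paper.

```python
def chow_rule(posterior, cost):
    """Chow's rule: reject when 1 - max_y P(y|x) exceeds the rejection cost."""
    return 1.0 - max(posterior) > cost

def agreement(decisions_a, decisions_b):
    """Fraction of instances on which two lists of boolean decisions coincide."""
    if len(decisions_a) != len(decisions_b):
        raise ValueError("decision lists must have the same length")
    same = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return same / len(decisions_a)

# Made-up class posteriors for four instances.
posteriors = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
chow = [chow_rule(p, cost=0.2) for p in posteriors]

# Hypothetical thresholded-scorer decisions for the same instances.
scorer = [False, True, False, True]
print(chow, agreement(chow, scorer))
```

An agreement rate well below 1 at matched deferral rates would be the kind of evidence described above.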
Original abstract
We study post-hoc Learning to Defer (L2D) through the lens of ideal distributions: divergence-regularized reweightings of the data distribution under which a model attains low loss. We define deferral via the density ratio between a model's and an expert's ideals. Using the reduction from density-ratio estimation to class-probability estimation, we derive the DR CPE losses for post-hoc L2D scorers. Deferral decisions are then made by thresholding the scorer, allowing deferral rates to be adjusted without retraining. For KL-based ideal distributions, our deferral rule either recovers Chow's rule under the original distribution or connects to an expert-tilted Bayes posterior (which incorporates the expert's performance), depending on whether the ideal distributions are joint or marginal. Experimentally, our approach is competitive with common baselines and more robust across dataset settings. More broadly, our results cast post-hoc L2D as density-ratio learning between ideal distributions, bridging Chow-style rules and expert comparison, and elucidating connections to related learning settings including anomaly detection.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes framing post-hoc learning to defer (L2D) as density-ratio estimation between 'ideal' distributions, defined as divergence-regularized reweightings of the original data measure under which a model or expert attains low loss. It derives DR-CPE losses by reducing density-ratio estimation to class-probability estimation, obtains deferral scorers that can be thresholded to control deferral rate without retraining, and shows that the KL-based case recovers Chow's rule (under joint ideals) or connects to an expert-tilted Bayes posterior (under marginal ideals). Experiments are reported as competitive with baselines and more robust across settings.
Significance. If the central reduction and thresholding argument hold without additional joint-error assumptions, the work supplies a clean density-ratio perspective that unifies Chow-style rules with post-hoc L2D and links to anomaly detection. The post-hoc, thresholdable nature is practically attractive. The abstract reports no quantitative results, error bars, or dataset details, so the strength of the empirical claim can only be judged from the full experiments.
major comments (2)
- [§3] §3 (or the derivation following Eq. (3)): the claim that thresholding the DR-CPE scorer produces valid deferral sets appears to rest on the assumption that the class probability is monotonic in the conditional advantage of the expert over the model. The skeptic note correctly flags that marginal ideal distributions alone do not encode instance-level error correlation; the manuscript must explicitly state whether the joint versus marginal choice of ideals is sufficient to guarantee monotonicity, or whether an extra assumption on the joint distribution of model/expert errors is required.
- [§4] The recovery of Chow's rule for KL-based joint ideals is stated in the abstract and presumably derived in §4. The derivation must be checked for circularity: if the reweighting factors that define the ideals are realized by a loss on the original data, the resulting ratio estimator must not implicitly presuppose the very deferral decision it is meant to produce. A concrete walk-through of the steps from the ideal densities to the thresholded rule would clarify this.
minor comments (2)
- [Abstract] The abstract asserts that experiments are 'competitive and more robust' yet supplies no quantitative results, error bars, or dataset names. These details belong in the abstract or a prominent table in §5.
- [Abstract] Notation for the ideal distributions (joint vs. marginal) should be introduced once and used consistently; the current abstract switches between them without a clear forward pointer to the section that defines both.
Simulated Author's Rebuttal
We thank the referee for the constructive comments and the recommendation of major revision. We address each major comment point by point below and will revise the manuscript to incorporate the requested clarifications.
-
Referee: [§3] §3 (or the derivation following Eq. (3)): the claim that thresholding the DR-CPE scorer produces valid deferral sets appears to rest on the assumption that the class probability is monotonic in the conditional advantage of the expert over the model. The skeptic note correctly flags that marginal ideal distributions alone do not encode instance-level error correlation; the manuscript must explicitly state whether the joint versus marginal choice of ideals is sufficient to guarantee monotonicity, or whether an extra assumption on the joint distribution of model/expert errors is required.
Authors: We agree that explicit clarification is needed on this point. When joint ideal distributions are used, the density ratio is taken with respect to the joint measure over instances, which directly incorporates the per-instance losses of both the model and the expert. Consequently the class probability obtained from the DR-CPE reduction is monotonic in the conditional advantage of the expert, so that thresholding yields valid deferral sets without further assumptions. When marginal ideal distributions are used, instance-level error correlations are not encoded and monotonicity would indeed require an additional assumption on the joint distribution of model/expert errors. We will revise §3 to state this distinction clearly and to specify the conditions under which the thresholding argument holds.
Revision: yes
-
Referee: [§4] The recovery of Chow's rule for KL-based joint ideals is stated in the abstract and presumably derived in §4. The derivation must be checked for circularity: if the reweighting factors that define the ideals are realized by a loss on the original data, the resulting ratio estimator must not implicitly presuppose the very deferral decision it is meant to produce. A concrete walk-through of the steps from the ideal densities to the thresholded rule would clarify this.
Authors: We thank the referee for raising the possibility of circularity. The ideal distributions are purely theoretical objects: divergence-regularized reweightings of the original measure under which the model or expert attains low loss. The density-ratio estimator is obtained by applying the DR-CPE reduction directly to samples drawn from the original data distribution; the estimator is trained using only the observed per-instance losses of the model and expert and does not involve any deferral decisions or thresholds. The subsequent thresholding step is applied after estimation and does not feed back into the training of the scorer. We will add a concise, numbered walk-through of the steps from the definition of the ideal densities through the DR-CPE reduction to the final thresholded rule in the revised §4.
Revision: yes
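The promised walk-through can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' procedure: it uses the KL form of the ideal weights quoted on this page, w = exp(-loss/γ)/Z, plugs in observed per-instance losses as a stand-in for the expected loss, and computes the ratio directly rather than learning a scorer. The function names and the loss values are hypothetical.

```python
import math

def kl_weights(losses, gamma):
    """Self-normalized weights w_i = exp(-loss_i / gamma) / Z, with Z the empirical mean."""
    raw = [math.exp(-loss / gamma) for loss in losses]
    z = sum(raw) / len(raw)
    return [r / z for r in raw]

def deferral_decisions(model_losses, expert_losses, gamma, gamma_e, tau):
    """Step 1: ideal weights. Step 2: ratio rho = w_model / w_expert. Step 3: defer iff rho <= tau."""
    w_m = kl_weights(model_losses, gamma)
    w_e = kl_weights(expert_losses, gamma_e)
    rho = [wm / we for wm, we in zip(w_m, w_e)]
    return rho, [r <= tau for r in rho]

# Made-up 0-1 losses on five instances (1 = wrong, 0 = correct).
model_losses = [0, 1, 0, 1, 0]
expert_losses = [0, 0, 1, 1, 0]
rho, deferred = deferral_decisions(
    model_losses, expert_losses, gamma=1.0, gamma_e=1.0, tau=0.5
)
print(rho, deferred)
```

In this toy run only the instance where the model errs and the expert is correct falls below the threshold. No deferral decision enters the computation of the weights or the ratio, which is the point the rebuttal makes about circularity; a learned scorer would replace the direct ratio with a prediction from the input.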
Circularity Check
No significant circularity; derivation uses external definitions and standard reductions
The paper starts from the external notion of ideal distributions (divergence-regularized reweightings of the data measure) and defines deferral explicitly as the density ratio between model and expert ideals. It then invokes the known, independently established reduction from density-ratio estimation to class-probability estimation to obtain the DR-CPE losses. Thresholding the resulting scorer is presented as a direct consequence of this construction. For the KL case the paper shows recovery of Chow's rule under the original distribution, which is an external benchmark rather than a self-derived quantity. No equation or step equates a fitted parameter to a prediction by construction, and no load-bearing premise rests solely on self-citation. The central claim therefore remains independent of its own outputs.
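For the KL case, the shape of the resulting rule can be worked out from the standard Gibbs variational argument. The derivation below is a sketch under assumptions: $L_m(x)$ and $L^e(x)$ are shorthand (not the paper's notation) for the expected losses of the model and the expert at $x$, $P_x$ is the data marginal, and $\tau > 0$.

```latex
\begin{aligned}
Q^\star &\in \arg\min_{Q}\; \mathbb{E}_{x \sim Q}[L(x)] + \gamma\,\mathrm{KL}(Q \,\|\, P_x)
\;\Longrightarrow\;
\frac{dQ^\star}{dP_x}(x) = \frac{1}{Z}\exp\!\big(-L(x)/\gamma\big),
\quad Z = \mathbb{E}_{x \sim P_x}\big[\exp(-L(x)/\gamma)\big], \\
\rho(x) &= \frac{dQ_m}{dQ^e}(x)
= \frac{Z^e}{Z_m}\exp\!\Big(\frac{L^e(x)}{\gamma^e} - \frac{L_m(x)}{\gamma}\Big), \\
\rho(x) \le \tau
&\iff \frac{L_m(x)}{\gamma} - \frac{L^e(x)}{\gamma^e} \ge \log\frac{Z^e}{Z_m\,\tau}.
\end{aligned}
```

Read this way, thresholding the ratio compares temperature-scaled losses of the model and the expert against a constant, and the normalizers only shift that constant.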
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Ideal distributions exist and can be used to define a meaningful density ratio for deferral decisions.
- standard math: The standard reduction from density-ratio estimation to class-probability estimation applies directly to the chosen ideal distributions.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: echoes)
echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
ideal distribution $Q_m \in \arg\min_Q \mathbb{E}[\ell] + \gamma D(Q \,\|\, P_x)$; KL yields $w(x;\gamma) = Z^{-1}\exp(-\mathbb{E}_\eta[\ell]/\gamma)$
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean, J_uniquely_calibrated_via_higher_derivative (tag: unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
deferral $r_i(x) = [\![\rho_i(x;\gamma,\gamma^e) \le \tau]\!]$ with $\rho_i = dQ_i/dQ^e_i$
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.