Exploring the Potential of Bilevel Optimization for Calibrating Neural Networks

Arjun Pakrashi; Francesco Rinaldi; Gabriele Sanguin; Marco Viola

arxiv: 2503.13113 · v1 · pith:P4JSJFEGnew · submitted 2025-03-17 · 💻 cs.LG · math.OC

Pith reviewed 2026-05-22 23:45 UTC · model grok-4.3

classification 💻 cs.LG math.OC

keywords bilevel optimization · neural network calibration · confidence estimation · isotonic regression · uncertainty quantification · machine learning · self-calibration

The pith

Bilevel optimization trains neural networks with reduced calibration error while preserving accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes using bilevel optimization to jointly train a neural network and calibrate its output confidence scores in a single process. The inner optimization level fits the network parameters while the outer level minimizes a calibration objective, tested on toy problems like Blobs and Spirals plus a simulated Blood Alcohol Concentration task. Results are compared against isotonic regression, a standard post-hoc calibration method. A sympathetic reader would care because modern networks often produce overconfident predictions that make uncertainty hard to trust in decision systems. The central experimental finding is that the bilevel approach lowers calibration error without harming predictive accuracy.

Core claim

A self-calibrating bilevel neural-network training approach improves a model's predicted confidence scores. The framework solves a hierarchical problem in which the inner level performs standard network training and the outer level adjusts for calibration. On Blobs, Spirals, and Blood Alcohol Concentration datasets the method produces lower calibration error than isotonic regression while accuracy stays the same.

What carries the argument

Bilevel optimization with neural-network training as the inner problem and a calibration objective as the outer problem.
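As a reading aid, the generic shape of such a bilevel program can be written as below. This is a sketch under stated assumptions, not the paper's formulation: the page gives no equations, so the symbols $\lambda$ (outer variables), $\theta$ (network parameters), $\mathcal{L}_{\mathrm{cal}}$, $\mathcal{L}_{\mathrm{train}}$, and the train/validation split are all introduced here for illustration.

```latex
% Generic bilevel program (illustrative notation, not taken from the paper).
% Outer level: choose \lambda to minimise a calibration objective evaluated
% at the network parameters returned by the inner level.
\begin{aligned}
\min_{\lambda} \quad & \mathcal{L}_{\mathrm{cal}}\bigl(\theta^{*}(\lambda);\, D_{\mathrm{val}}\bigr) \\
\text{s.t.} \quad & \theta^{*}(\lambda) \in \arg\min_{\theta}\;
  \mathcal{L}_{\mathrm{train}}\bigl(\theta, \lambda;\, D_{\mathrm{train}}\bigr)
\end{aligned}
```

The outer objective depends on $\lambda$ only through the inner solution $\theta^{*}(\lambda)$, which is why solving the problem requires some way of differentiating through, or approximating, the inner training run.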

If this is right

The bilevel method reduces calibration error relative to isotonic regression on the reported toy and simulated datasets.
Predictive accuracy remains unchanged under the bilevel training regime.
Predicted confidence scores become more reliable for downstream decision-making.
The approach offers an integrated alternative to separate post-training calibration steps.
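Calibration error of the kind referred to above is commonly measured as expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. The following is a minimal sketch of that standard binned estimator. The function name, the ten equal-width bins, and the right-closed bin convention are assumptions made here; the page does not state which estimator or binning the paper uses.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin fraction) * |accuracy - mean confidence|.

    confidences: predicted confidence of the chosen class, values in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    n_bins:      number of equal-width bins (an assumption, not from the paper).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Right-closed bins (lo, hi]; values at or below the first interior edge go to bin 0.
    bin_index = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # in_bin.mean() is the fraction of samples in the bin
    return float(ece)
```

A model that reports 0.9 confidence on every sample but is right only half the time would score 0.4 under this estimator, while one whose confidence matches its accuracy in every bin would score 0.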

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same bilevel structure could be tested on image or language models where calibration failures are common.
If the outer calibration loss is replaced by other uncertainty metrics the framework might address related problems such as selective classification.
Convergence behavior of the bilevel solver on deeper networks remains unexamined in the reported experiments.

Load-bearing premise

Bilevel optimization can be solved stably and efficiently when the inner problem is neural-network training and the outer problem is calibration.

What would settle it

Applying the bilevel procedure to a larger real-world dataset and finding either no drop in calibration error or solver instability would show the approach does not generalize as claimed.

Figures

Figures reproduced from arXiv: 2503.13113 by Arjun Pakrashi, Francesco Rinaldi, Gabriele Sanguin, Marco Viola.

**Figure 1.** Confidence region estimation on the Blobs 1.7 dataset for different approaches. Each plot represents the spatial distribution of confidence levels across the dataset. The background color represents the confidence value that the model assigns to a point at that location. A more detailed examination using quantitative metrics is essential to rigorously evaluate the effectiveness …

**Figure 2.** Confidence histograms (top) and reliability diagrams (bottom) for the Spiral 3.5 test set. Orange sections represent the overconfidence gap, while red sections represent underconfidence.

**Figure 3.** Left: evolution of the training weights found by the BO4SC method for the Blobs 1.7 dataset (1 epoch unit = 10 training epochs). Right: final weight distribution. … with those samples that are ultimately misclassified. One can clearly see that the weights often move in groups, creating bundles of lines that follow the same trend. They might represent groups of samples close to each other that have the s…

Original abstract

Handling uncertainty is critical for ensuring reliable decision-making in intelligent systems. Modern neural networks are known to be poorly calibrated, resulting in predicted confidence scores that are difficult to use. This article explores improving confidence estimation and calibration through the application of bilevel optimization, a framework designed to solve hierarchical problems with interdependent optimization levels. A self-calibrating bilevel neural-network training approach is introduced to improve a model's predicted confidence scores. The effectiveness of the proposed framework is analyzed using toy datasets, such as Blobs and Spirals, as well as more practical simulated datasets, such as Blood Alcohol Concentration (BAC). It is compared with a well-known and widely used calibration strategy, isotonic regression. The reported experimental results reveal that the proposed bilevel optimization approach reduces the calibration error while preserving accuracy.
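For readers unfamiliar with the baseline, isotonic regression calibrates post hoc by fitting a non-decreasing map from raw confidence scores to observed outcomes on held-out data. The sketch below implements the classic pool-adjacent-violators procedure in plain NumPy. The helper names `fit_isotonic` and `apply_isotonic` are hypothetical, and nothing here is claimed to match the implementation or settings used in the paper.

```python
import numpy as np


def fit_isotonic(scores, labels):
    """Pool Adjacent Violators: non-decreasing least-squares fit of labels on scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Collapse tied scores so the fitted map is a function of the score.
    xs, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse.ravel(), weights=labels, minlength=len(xs))
    blocks = []  # each block: [label sum, sample count, number of unique scores]
    for s, c in zip(sums, counts):
        blocks.append([float(s), float(c), 1])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s2, c2, k2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += c2
            blocks[-1][2] += k2
    fitted = np.concatenate([np.full(k, s / c) for s, c, k in blocks])
    return xs, fitted


def apply_isotonic(xs, fitted, new_scores):
    """Map new scores through the fitted step function (linear interpolation, clamped at the ends)."""
    return np.interp(np.asarray(new_scores, dtype=float), xs, fitted)
```

In practice the map would be fitted on a validation split and applied to test-time confidences; because it is a post-processing step, it leaves the trained network itself unchanged, which is the contrast with the in-training approach described in the abstract.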

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Bilevel setup for in-training calibration is a plausible direction but the work is too preliminary to trust the reported gains.

The paper's main move is to fold calibration into neural net training via bilevel optimization, with the outer loop targeting a calibration loss while the inner loop does the usual training. On the toy Blobs, Spirals, and simulated BAC datasets they show lower expected calibration error than isotonic regression without losing accuracy. That is the concrete result they put forward. The approach is not entirely new in spirit, since bilevel optimization and calibration are both established, but treating calibration as the outer objective during end-to-end training is a specific formulation that does not appear in the usual post-hoc literature. The experiments are direct and the comparison is clear on those small cases. That is the part that works.

The soft spots are more substantial. The abstract supplies no equations, no description of the inner/outer objectives, no hypergradient method, and no convergence or stability checks. When the inner problem is non-convex neural net training, bilevel solvers are known to be delicate; without diagnostics it is reasonable to worry that any observed calibration improvement is an artifact of altered training dynamics rather than a reliable property of the hierarchical formulation. The datasets are all simulated and low-dimensional, so there is no evidence the method scales or generalizes.

This is the sort of idea that might interest people already working on calibration or bilevel methods, but a reader who needs reproducible procedures or results on real data will not get much from it. I would not bring it to a reading group unless the goal is to brainstorm extensions. I would not cite it. It does not look ready for peer review; the central claim cannot be evaluated from what is shown.

Referee Report

2 major / 0 minor

Summary. The paper introduces a bilevel optimization framework for training neural networks that self-calibrates predicted confidence scores. It evaluates the method on toy datasets (Blobs, Spirals) and a simulated BAC dataset, claiming that the approach reduces calibration error relative to isotonic regression while preserving accuracy.

Significance. If the central claim holds after addressing formulation and stability details, the work could demonstrate a way to embed calibration directly into the training objective via hierarchical optimization rather than post-hoc methods. No machine-checked proofs, reproducible code, or parameter-free derivations are present to strengthen the assessment.

major comments (2)

[Abstract and Method] The manuscript provides no explicit bilevel formulation (inner NN training objective and outer calibration loss) or hypergradient method, which is load-bearing for attributing any ECE reduction to the hierarchical structure rather than to implicit regularization or solver behavior.
[Experiments] No convergence analysis, stability diagnostics, or comparison of implicit vs. unrolled differentiation appears for the non-convex inner loop; this directly undermines the claim that reported calibration gains on Blobs/Spirals/BAC are reliably due to the proposed approach.
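To make the implicit-versus-unrolled distinction in the second comment concrete, here is a toy comparison on a problem where both can be checked exactly. It is a sketch under loud assumptions: the inner problem is convex ridge regression rather than neural-network training, the outer variable is a single regularization strength `lam`, and the outer loss is a validation squared error rather than a calibration loss. None of these choices come from the paper; they are picked only so that the two hypergradients have a known common value.

```python
import numpy as np


def inner_solution(X, y, lam):
    """Exact minimiser of 0.5*||Xw - y||^2 + 0.5*lam*||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def outer_loss(w, Xv, yv):
    """Validation squared error used as the (stand-in) outer objective."""
    r = Xv @ w - yv
    return 0.5 * float(r @ r)


def implicit_hypergradient(X, y, Xv, yv, lam):
    """Implicit differentiation: d(outer)/d(lam) = -g_outer^T H^{-1} w*."""
    d = X.shape[1]
    H = X.T @ X + lam * np.eye(d)          # Hessian of the inner objective
    w = np.linalg.solve(H, X.T @ y)        # inner solution w*(lam)
    g_outer = Xv.T @ (Xv @ w - yv)         # gradient of the outer loss w.r.t. w
    # Differentiating H(lam) w* = X^T y in lam gives dw*/dlam = -H^{-1} w*.
    return float(-g_outer @ np.linalg.solve(H, w))


def unrolled_hypergradient(X, y, Xv, yv, lam, steps=500):
    """Forward-mode unrolling: carry dw/dlam through each gradient-descent step."""
    d = X.shape[1]
    H = X.T @ X + lam * np.eye(d)
    b = X.T @ y
    lr = 1.0 / np.linalg.eigvalsh(H).max()  # step size that guarantees contraction
    w = np.zeros(d)
    dw = np.zeros(d)                        # running derivative of w with respect to lam
    for _ in range(steps):
        # Differentiate the update w <- w - lr*(H w - b), using dH/dlam = I.
        dw = dw - lr * (H @ dw + w)
        w = w - lr * (H @ w - b)
    g_outer = Xv.T @ (Xv @ w - yv)
    return float(g_outer @ dw)
```

On this convex toy the two estimates agree once the inner loop has converged. With a non-convex inner problem and a truncated inner loop, that agreement is no longer guaranteed, which is the gap the referee's comment points at.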

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback. We address the major comments point-by-point below and will incorporate revisions to improve clarity and rigor.

Referee: [Abstract and Method] The manuscript provides no explicit bilevel formulation (inner NN training objective and outer calibration loss) or hypergradient method, which is load-bearing for attributing any ECE reduction to the hierarchical structure rather than to implicit regularization or solver behavior.

Authors: We agree that an explicit bilevel formulation is necessary to substantiate the claims. The current manuscript describes the high-level idea but does not detail the inner objective (e.g., cross-entropy loss on network parameters) and outer objective (e.g., calibration loss such as ECE) or the specific hypergradient computation. In the revised version we will add a dedicated methods section with the full mathematical bilevel program and the hypergradient approximation employed. This will clarify attribution of the observed ECE reductions. Revision: yes.

Referee: [Experiments] No convergence analysis, stability diagnostics, or comparison of implicit vs. unrolled differentiation appears for the non-convex inner loop; this directly undermines the claim that reported calibration gains on Blobs/Spirals/BAC are reliably due to the proposed approach.

Authors: We acknowledge that the non-convex inner optimization requires additional diagnostics. The revision will include convergence plots for the inner loop, variance across random seeds, and a side-by-side comparison of implicit differentiation versus unrolling to confirm that calibration gains are not artifacts of the solver. The existing results already show lower ECE than isotonic regression on the reported datasets while accuracy is preserved; the added analyses will strengthen the reliability argument. Revision: yes.

Circularity Check

0 steps flagged

No circularity; empirical comparison on toy datasets with no fitted predictions or self-referential derivations

The paper introduces a bilevel optimization framework for neural network calibration and reports experimental results on Blobs, Spirals, and BAC datasets, comparing against isotonic regression. The abstract and provided text contain no equations, no parameter-fitting steps that are later renamed as predictions, and no derivation chain. The central claim is an empirical outcome (reduced ECE while preserving accuracy), which is independent of any self-definition or self-citation load-bearing premise. No load-bearing mathematical steps exist to inspect for circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no identifiable free parameters, axioms, or invented entities; the bilevel framing is treated as a standard optimization technique.
