Confidence Calibration for Convolutional Neural Networks Using Structured Dropout

Zhilu Zhang, Adrian V. Dalca, Mert R. Sabuncu


Classification: cs.LG, cs.CV, stat.ML

Keywords: dropout, confidence, calibration, structured, Bayesian, convolutional, diversity, learning

Abstract

In classification applications, we often want probabilistic predictions to reflect confidence or uncertainty. Dropout, a commonly used training technique, has recently been linked to Bayesian inference, yielding an efficient way to quantify uncertainty in neural network models. However, as previously demonstrated, confidence estimates computed with a naive implementation of dropout can be poorly calibrated, particularly when using convolutional networks. In this paper, through the lens of ensemble learning, we associate calibration error with the correlation between the models sampled with dropout. Motivated by this, we explore the use of structured dropout to promote model diversity and improve confidence calibration. We use the SVHN, CIFAR-10 and CIFAR-100 datasets to empirically compare model diversity and confidence errors obtained using various dropout techniques. We also show the merit of structured dropout in a Bayesian active learning application.
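The abstract relies on the link between dropout and approximate Bayesian inference: dropout stays active at test time, several masks are sampled, and their predictions are averaged (Monte Carlo dropout). The resulting confidence is then checked for calibration. The minimal sketch below assumes channel-wise dropout, one structured variant that zeroes entire feature maps instead of individual activations, applied to a hypothetical toy model (global average pooling followed by a linear layer). It also assumes expected calibration error (ECE) with 15 equal-width bins as the calibration metric. The function names, the toy model, and these settings are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np


def channel_dropout(features, p, rng):
    """Structured (channel-wise) dropout on an array of shape (channels, H, W).

    One Bernoulli draw per channel decides whether the whole feature map is kept.
    Kept channels are rescaled by 1 / (1 - p) (inverted dropout), so p must be in [0, 1).
    """
    keep = rng.random(features.shape[0]) >= p      # True = keep this channel
    return features * keep[:, None, None] / (1.0 - p)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mc_dropout_predict(features, weights, p, n_samples, rng):
    """Average softmax outputs over n_samples dropout masks (MC dropout).

    Hypothetical toy model: global average pooling per channel, then a linear
    classifier `weights` of shape (channels, classes).
    """
    probs = []
    for _ in range(n_samples):
        dropped = channel_dropout(features, p, rng)
        pooled = dropped.mean(axis=(1, 2))         # shape: (channels,)
        probs.append(softmax(pooled @ weights))
    return np.mean(probs, axis=0)                  # predictive distribution over classes


def expected_calibration_error(probs, labels, n_bins=15):
    """Bin samples by confidence (max probability) and average |accuracy - confidence|,
    weighting each bin by the fraction of samples that fall into it."""
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece


# Usage on random toy data: 8 feature maps of size 4x4, 3 classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))
weights = rng.normal(size=(8, 3))
probs = mc_dropout_predict(features, weights, p=0.5, n_samples=50, rng=rng)
print(probs, probs.sum())
```

All activations in a channel share a single mask entry, so sampled sub-networks differ by whole features rather than by scattered units. Under the ensemble view described in the abstract, this is the mechanism intended to lower the correlation between sampled models.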



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Algorithm and Hardware Co-Design for Efficient Complex-Valued Uncertainty Estimation
cs.AR 2026-04 unverdicted novelty 7.0

Proposes dropout-based BayesCVNNs with automated configuration search and FPGA accelerators that deliver 4.5x–13x speedups over GPUs while enabling uncertainty estimation for complex-valued neural networks.

VOLTA: The Surprising Ineffectiveness of Auxiliary Losses for Calibrated Deep Learning
cs.LG 2026-04 unverdicted novelty 5.0

VOLTA, consisting of a deep encoder with learnable prototypes plus cross-entropy and post-hoc temperature scaling, matches or exceeds ten UQ baselines in accuracy, achieves lower expected calibration error, and perfor...