Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
Original abstract:
Deep neural networks (NNs) are powerful black box predictors that have recently achieved impressive performance on a wide spectrum of tasks. Quantifying predictive uncertainty in NNs is a challenging and yet unsolved problem. Bayesian NNs, which learn a distribution over weights, are currently the state-of-the-art for estimating predictive uncertainty; however, these require significant modifications to the training procedure and are computationally expensive compared to standard (non-Bayesian) NNs. We propose an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates. Through a series of experiments on classification and regression benchmarks, we demonstrate that our method produces well-calibrated uncertainty estimates which are as good or better than approximate Bayesian NNs. To assess robustness to dataset shift, we evaluate the predictive uncertainty on test examples from known and unknown distributions, and show that our method is able to express higher uncertainty on out-of-distribution examples. We demonstrate the scalability of our method by evaluating predictive uncertainty estimates on ImageNet.
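The core recipe the abstract describes — train several networks independently, then average their predictive distributions and read uncertainty off the averaged prediction — can be sketched in a few lines. This is a minimal illustration of the aggregation step only (the full method in the paper also uses proper scoring rules and adversarial training); the member probabilities below are hypothetical model outputs, not results from the paper.

```python
import numpy as np

def ensemble_predict(member_probs):
    """Average per-member class probabilities (deep-ensemble aggregation)."""
    return np.mean(member_probs, axis=0)

def predictive_entropy(probs):
    """Entropy of the averaged predictive distribution, used as an uncertainty score."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(p), axis=-1)

# Hypothetical softmax outputs of M=3 independently trained binary classifiers.
# On the first input the members agree; on the second they disagree,
# as they tend to do on out-of-distribution examples.
agree = np.array([[0.90, 0.10], [0.85, 0.15], [0.92, 0.08]])
disagree = np.array([[0.90, 0.10], [0.20, 0.80], [0.50, 0.50]])

h_agree = predictive_entropy(ensemble_predict(agree))
h_disagree = predictive_entropy(ensemble_predict(disagree))
# Disagreement between members raises the entropy of the averaged prediction,
# so the ensemble reports higher uncertainty on the second input.
```

Because each member is trained independently (different random initialization and data shuffling), the M training runs parallelize trivially, which is what the abstract means by "readily parallelizable".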
Forward citations
Cited by 13 Pith papers
- Inducing Artificial Uncertainty in Language Models
  Inducing artificial uncertainty on trivial tasks allows training probes that achieve higher calibration on hard data than standard approaches while retaining performance on easy data.
- DualTCN: A Physics-Constrained Temporal Convolutional Network for Time-Domain Marine CSEM Inversion
  DualTCN is the first deep-learning model for time-domain marine CSEM inversion that regresses four earth parameters, achieves high accuracy on simulated data, and runs up to 21,000 times faster than classical optimizers.
- SegWithU: Uncertainty as Perturbation Energy for Single-Forward-Pass Risk-Aware Medical Image Segmentation
  SegWithU treats uncertainty as perturbation energy via rank-1 probes in a post-hoc head for efficient single-pass risk-aware medical image segmentation, outperforming other single-forward-pass methods on ACDC, BraTS20...
- Multi-Quantile Regression for Extreme Precipitation Downscaling
  Q-SRDRN multi-quantile network with pinball loss and per-quantile heads detects extreme precipitation events up to 18 times more effectively than deterministic baselines while preserving augmentation benefits for the median.
- COMPASS: A Unified Decision-Intelligence System for Navigating Performance Trade-off in HPC
  COMPASS formalizes HPC configuration questions as ML tasks on traces, quantifies recommendation trustworthiness, and delivers 65.93% lower average job turnaround time plus 80.93% lower node usage versus prior methods ...
- AI-assisted modeling and Bayesian inference of unpolarized quark transverse momentum distributions from Drell-Yan data
  An AI-assisted Bayesian framework extracts TMD PDFs from global Drell-Yan data using surrogate models for scalable MCMC sampling.
- Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification
  Ensemble-based method of moments on softmax outputs produces stable Dirichlet predictive distributions that improve uncertainty-guided tasks like selective classification over evidential deep learning.
- Cell-induced densification and tether formation in fibrous extracellular matrices with biomimetic physics-informed neural networks
  Bio-PINNs with a near-to-far curriculum and deformation-uncertainty proxy recover cell-induced densified phases and tether morphologies more reliably than standard adaptive PINN baselines in single-cell and multicellu...
- Ensemble-Based Uncertainty Estimation for Code Correctness Estimation
  Ensemble Semantic Entropy improves correlation with code correctness over single-model methods and powers a cascading scaling system that cuts FLOPs by 64.9% while preserving performance on LiveCodeBench.
- Language Models (Mostly) Know What They Know
  Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.
- Testing the Assumptions of Active Learning for Translation Tasks with Few Samples
  Informativeness and diversity of samples selected by active learning show no correlation with test performance on translation tasks using few samples; ordering and pre-training effects dominate instead.
- Confident in a Confidence Score: Investigating the Sensitivity of Confidence Scores to Supervised Fine-Tuning
  Supervised fine-tuning degrades the correlation between confidence scores and output quality in language models, driven by factors like training distribution similarity rather than true quality.
- Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
  A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.