Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

Yarin Gal, Zoubin Ghahramani

classification 📊 stat.ML cs.LG

keywords uncertainty, deep, dropout, learning, model, Bayesian, tools, classification

abstract

Deep learning tools have gained tremendous attention in applied machine learning. However, such tools for regression and classification do not capture model uncertainty. In comparison, Bayesian models offer a mathematically grounded framework to reason about model uncertainty, but usually come with a prohibitive computational cost. In this paper we develop a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes. A direct result of this theory gives us tools to model uncertainty with dropout NNs -- extracting information from existing models that has been thrown away so far. This mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy. We perform an extensive study of the properties of dropout's uncertainty. Various network architectures and non-linearities are assessed on tasks of regression and classification, using MNIST as an example. We show a considerable improvement in predictive log-likelihood and RMSE compared to existing state-of-the-art methods, and finish by using dropout's uncertainty in deep reinforcement learning.
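The practical idea in the abstract, keeping dropout switched on at test time and averaging several stochastic forward passes (often called MC dropout), can be sketched in a few lines. This is a minimal NumPy sketch, not the authors' code: the function names `mc_dropout_predict` and `mc_log_likelihood`, the one-hidden-layer ReLU regression network, dropping only hidden units, inverted-dropout scaling, and a fixed, known model precision `tau` are all illustrative assumptions. As we understand the paper's estimators, the predictive variance is the sample variance of the T passes plus the observation noise 1/tau, and the predictive log-likelihood is a log-sum-exp average of Gaussian densities centred on each pass.

```python
import numpy as np

def mc_dropout_predict(x, W1, b1, W2, b2, p=0.5, T=100, tau=1.0, rng=None):
    """MC dropout for a hypothetical one-hidden-layer ReLU regression net.

    Dropout stays active at test time: T stochastic forward passes are run
    and summarised. p is the drop probability; tau is the model precision,
    assumed known here. Returns (mean, variance, samples).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_2d(x)                        # (N, D_in)
    samples = []
    for _ in range(T):
        h = np.maximum(0.0, x @ W1 + b1)        # hidden activations (N, H)
        keep = rng.random(h.shape) > p          # Bernoulli mask: drop each unit w.p. p
        h = h * keep / (1.0 - p)                # inverted-dropout scaling (assumption)
        samples.append(h @ W2 + b2)             # one stochastic prediction (N, 1)
    samples = np.stack(samples)                 # (T, N, 1)
    mean = samples.mean(axis=0)                 # predictive mean
    var = samples.var(axis=0) + 1.0 / tau       # sample variance + observation noise
    return mean, var, samples

def mc_log_likelihood(y, samples, tau):
    """Per-point log p(y|x) estimate from MC dropout samples (scalar outputs).

    log p ~= logsumexp_t(-tau/2 * (y - yhat_t)^2) - log T - 0.5 log(2 pi) + 0.5 log tau
    """
    T = samples.shape[0]
    a = -0.5 * tau * (y[None] - samples) ** 2   # (T, N, 1) Gaussian exponents
    m = a.max(axis=0)                           # stabilise the log-sum-exp
    lse = m + np.log(np.exp(a - m).sum(axis=0))
    return lse - np.log(T) - 0.5 * np.log(2 * np.pi) + 0.5 * np.log(tau)

# Usage with random, untrained placeholder weights (illustration only).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(1, 50)), np.zeros(50)
W2, b2 = rng.normal(size=(50, 1)) / np.sqrt(50), np.zeros(1)
x_test = np.linspace(-2, 2, 5).reshape(-1, 1)
mean, var, samples = mc_dropout_predict(x_test, W1, b1, W2, b2,
                                        p=0.5, T=200, tau=10.0, rng=rng)
```

With trained weights in place of the random placeholders, a wider spread in `samples` at an input signals higher model uncertainty there; larger T gives less noisy estimates at the cost of T forward passes per prediction.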

Forward citations

Cited by 13 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Inducing Artificial Uncertainty in Language Models
cs.CL 2026-05 unverdicted novelty 7.0

Inducing artificial uncertainty on trivial tasks allows training probes that achieve higher calibration on hard data than standard approaches while retaining performance on easy data.
Mixed-Precision Information Bottlenecks for On-Device Trait-State Disentanglement in Bipolar Agitation Detection
cs.LG 2026-05 unverdicted novelty 7.0

MP-IB uses an 8x information asymmetry via FP16 trait heads and INT4 state heads to disentangle speaker identity from agitation in voice biomarkers, outperforming larger models on edge devices with low latency and sup...
Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning
astro-ph.GA 2026-04 unverdicted novelty 7.0

A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.
Causal Diffusion Models for Counterfactual Outcome Distributions in Longitudinal Data
stat.ML 2026-04 unverdicted novelty 7.0

Causal Diffusion Model is the first diffusion-based method to produce full probabilistic counterfactual outcome distributions for sequential interventions in longitudinal data, showing 15-30% better distributional acc...
Concrete Problems in AI Safety
cs.AI 2016-06 accept novelty 7.0

The paper categorizes five concrete AI safety problems arising from flawed objectives, costly evaluation, and learning dynamics.
A Physics-Aware Variational Graph Autoencoder for Joint Modal Identification with Uncertainty Quantification
cs.CE 2026-04 unverdicted novelty 6.0

A physics-informed graph variational autoencoder jointly predicts modal frequencies, damping, and shapes from PSD data of trusses with uncertainty quantification and orthogonality constraints.
U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster
cs.LG 2026-04 conditional novelty 6.0

A standard U-Net with MAE pre-training followed by short CRPS fine-tuning via Monte Carlo Dropout matches or exceeds GenCast and IFS ENS probabilistic skill at 1.5° resolution while cutting training compute and infere...
Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification
stat.ML 2026-04 unverdicted novelty 6.0

Ensemble-based method of moments on softmax outputs produces stable Dirichlet predictive distributions that improve uncertainty-guided tasks like selective classification over evidential deep learning.
Testing the Assumptions of Active Learning for Translation Tasks with Few Samples
cs.CL 2026-04 unverdicted novelty 5.0

Informativeness and diversity of samples selected by active learning show no correlation with test performance on translation tasks using few samples; ordering and pre-training effects dominate instead.
MedFormer-UR: Uncertainty-Routed Transformer for Medical Image Classification
eess.IV 2026-04 unverdicted novelty 5.0

MedFormer-UR integrates evidential uncertainty from Dirichlet distributions and class-specific prototypes into a transformer to improve calibration and selective prediction on medical images across four modalities.
ASTRAFier: A Novel and Scalable Transformer-based Stellar Variability Classifier
astro-ph.IM 2026-04 unverdicted novelty 5.0

ASTRAFier is a Transformer-BiLSTM-CNN model that classifies stellar variability from light curves, reporting 94.26% accuracy on Kepler data and 88.22% on TESS; it is then applied to 2.8 million TESS curves to release a catalog.
Portfolio Optimization Proxies under Label Scarcity and Regime Shifts via Bayesian and Deterministic Students under Semi-Supervised Sandwich Training
cs.LG 2026-04 unverdicted novelty 4.0

A semi-supervised teacher-student framework enables neural networks to proxy CVaR portfolio optimization using synthetic data augmentation for scarce labels and regime shifts.
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
math.OC 2026-04 unverdicted novelty 2.0

A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.