Recognition: unknown
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
read the original abstract
Deep learning tools have gained tremendous attention in applied machine learning. However such tools for regression and classification do not capture model uncertainty. In comparison, Bayesian models offer a mathematically grounded framework to reason about model uncertainty, but usually come with a prohibitive computational cost. In this paper we develop a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes. A direct result of this theory gives us tools to model uncertainty with dropout NNs -- extracting information from existing models that has been thrown away so far. This mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy. We perform an extensive study of the properties of dropout's uncertainty. Various network architectures and non-linearities are assessed on tasks of regression and classification, using MNIST as an example. We show a considerable improvement in predictive log-likelihood and RMSE compared to existing state-of-the-art methods, and finish by using dropout's uncertainty in deep reinforcement learning.
This paper has not been read by Pith yet.
Forward citations
Cited by 13 Pith papers
-
Inducing Artificial Uncertainty in Language Models
Inducing artificial uncertainty on trivial tasks allows training probes that achieve higher calibration on hard data than standard approaches while retaining performance on easy data.
-
Mixed-Precision Information Bottlenecks for On-Device Trait-State Disentanglement in Bipolar Agitation Detection
MP-IB uses an 8x information asymmetry via FP16 trait heads and INT4 state heads to disentangle speaker identity from agitation in voice biomarkers, outperforming larger models on edge devices with low latency and sup...
-
Homogeneous Stellar Parameters from Heterogeneous Spectra with Deep Learning
A single end-to-end Transformer model unifies stellar labels from heterogeneous spectroscopic surveys into a self-consistent scale without post-hoc recalibration.
-
Causal Diffusion Models for Counterfactual Outcome Distributions in Longitudinal Data
Causal Diffusion Model is the first diffusion-based method to produce full probabilistic counterfactual outcome distributions for sequential interventions in longitudinal data, showing 15-30% better distributional acc...
-
Concrete Problems in AI Safety
The paper categorizes five concrete AI safety problems arising from flawed objectives, costly evaluation, and learning dynamics.
-
A Physics-Aware Variational Graph Autoencoder for Joint Modal Identification with Uncertainty Quantification
A physics-informed graph variational autoencoder jointly predicts modal frequencies, damping, and shapes from PSD data of trusses with uncertainty quantification and orthogonality constraints.
-
U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster
A standard U-Net with MAE pre-training followed by short CRPS fine-tuning via Monte Carlo Dropout matches or exceeds GenCast and IFS ENS probabilistic skill at 1.5° resolution while cutting training compute and infere...
-
Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification
Ensemble-based method of moments on softmax outputs produces stable Dirichlet predictive distributions that improve uncertainty-guided tasks like selective classification over evidential deep learning.
-
Testing the Assumptions of Active Learning for Translation Tasks with Few Samples
Informativeness and diversity of samples selected by active learning show no correlation with test performance on translation tasks using few samples; ordering and pre-training effects dominate instead.
-
MedFormer-UR: Uncertainty-Routed Transformer for Medical Image Classification
MedFormer-UR integrates evidential uncertainty from Dirichlet distributions and class-specific prototypes into a transformer to improve calibration and selective prediction on medical images across four modalities.
-
ASTRAFier: A Novel and Scalable Transformer-based Stellar Variability Classifier
ASTRAFier is a Transformer-BiLSTM-CNN model that classifies stellar variability from light curves, reporting 94.26% accuracy on Kepler data and 88.22% on TESS, then applied to 2.8 million TESS curves to release a catalog.
-
Portfolio Optimization Proxies under Label Scarcity and Regime Shifts via Bayesian and Deterministic Students under Semi-Supervised Sandwich Training
A semi-supervised teacher-student framework enables neural networks to proxy CVaR portfolio optimization using synthetic data augmentation for scarce labels and regime shifts.
-
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.