On the Importance of Strong Baselines in Bayesian Deep Learning

Jishnu Mukhoti; Pontus Stenetorp; Yarin Gal

arxiv 1811.09385 v2 pith:KAY4OX6S submitted 2018-11-23 cs.LG stat.ML

classification cs.LG stat.ML

keywords experimental learning baseline bayesian deep experiment baselines been

abstract

Like all sub-fields of machine learning, Bayesian Deep Learning is driven by empirical validation of its theoretical proposals. Given the many aspects of an experiment, it is always possible that minor or even major experimental flaws can slip by both authors and reviewers. One of the most popular experiments used to evaluate approximate inference techniques is the regression experiment on UCI datasets. However, in this experiment, models which have been trained to convergence have often been compared with baselines trained only for a fixed number of iterations. We find that a well-established baseline, Monte Carlo dropout, shows significant improvements when evaluated under the same experimental settings. In fact, the baseline outperforms or performs competitively with methods that claimed to be superior to the very same baseline method when they were introduced. Hence, by exposing this flaw in experimental procedure, we highlight the importance of using identical experimental setups to evaluate, compare, and benchmark methods in Bayesian Deep Learning.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

'In-Between' Uncertainty in Bayesian Neural Networks
stat.ML 2019-06 unverdicted novelty 5.0

MFVI in BNNs underestimates uncertainty between data regions, leading to overconfident OOD predictions, while the linearised Laplace approximation performs better.