A PAC-Bayesian Tutorial with A Dropout Bound

David McAllester

arxiv: 1307.2118 · v1 · pith:M4JUX4MBnew · submitted 2013-07-08 · 💻 cs.LG

classification 💻 cs.LG

keywords bound · pac-bayesian · generalization · loss · training · analysis · bounds · dropout

This tutorial gives a concise overview of existing PAC-Bayesian theory, focusing on three generalization bounds. The first is an Occam bound, which handles rules with finite-precision parameters and states that generalization loss is near training loss when the number of bits needed to write the rule is small compared to the sample size. The second is a PAC-Bayesian bound, which provides a generalization guarantee for posterior distributions rather than for individual rules. The PAC-Bayesian bound naturally handles infinite-precision rule parameters and $L_2$ regularization, provides a bound for dropout training, and defines a natural notion of a single distinguished PAC-Bayesian posterior distribution. The third is a training-variance bound, a kind of bias-variance analysis with bias replaced by expected training loss. The training-variance bound dominates the other bounds but is more difficult to interpret. It seems to suggest variance reduction methods such as bagging and may ultimately provide a more meaningful analysis of dropout.
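
To make the Occam statement in the abstract concrete, the sketch below evaluates a standard textbook form of an Occam bound, obtained from Hoeffding's inequality and a union bound over rules weighted by their code length. It is not necessarily the exact form stated in the tutorial. It assumes losses in [0, 1], a prefix-free code that writes the rule in `num_bits` bits, and an i.i.d. sample of size `sample_size`; the function name `occam_bound` and its parameters are illustrative only.

```python
import math


def occam_bound(train_loss, num_bits, sample_size, delta=0.05):
    """Upper bound on generalization loss for a rule coded in num_bits bits.

    Textbook Hoeffding-plus-union-bound form (an assumption, not the
    tutorial's exact theorem): with probability at least 1 - delta over
    the sample, simultaneously for all rules h,
        L(h) <= L_hat(h) + sqrt((|h| ln 2 + ln(1/delta)) / (2 N)),
    where losses lie in [0, 1] and |h| is the code length of h in bits.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if num_bits < 0:
        raise ValueError("num_bits must be non-negative")
    # Complexity term: description length in nats plus the confidence term.
    complexity = num_bits * math.log(2) + math.log(1.0 / delta)
    return train_loss + math.sqrt(complexity / (2.0 * sample_size))


# The gap shrinks as the sample grows relative to the description length.
small_sample = occam_bound(train_loss=0.10, num_bits=1000, sample_size=10_000)
large_sample = occam_bound(train_loss=0.10, num_bits=1000, sample_size=1_000_000)
print(f"N=10^4: {small_sample:.3f}   N=10^6: {large_sample:.3f}")
```

Under these assumptions the gap between the bound and the training loss scales like the square root of bits over sample size, which matches the qualitative claim that the bound is tight when the rule's description is short relative to the sample.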

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

On Unified and Sharpened CMI Bounds for Generalization Errors
cs.IT 2026-05 unverdicted novelty 7.0

A unified CMI generalization bound based on leave-m-out cross-validation that envelopes existing results, bridges MI/CMI gaps, and sharpens under bounded loss with empirical gains.
Generalization Bounds for Quantum Learning via Rényi Divergences
quant-ph 2025-05 unverdicted novelty 7.0

Derives generalization bounds for quantum learning via quantum and classical Rényi divergences, with a new modified sandwich quantum Rényi divergence shown to outperform the Petz version analytically and numerically.
Margin-Adaptive Confidence Ranking for Reliable LLM Judgement
cs.LG 2026-05 unverdicted novelty 5.0

Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.
Is K-fold cross validation the best model selection method for Machine Learning?
stat.ML 2024-01 unverdicted novelty 5.0

K-fold CUBV combines cross-validation with PAC-Bayesian upper bounds on actual risk to provide a more robust criterion for validating ML accuracy and reducing false positives than standard CV.