A PAC-Bayesian Tutorial with A Dropout Bound
This tutorial gives a concise overview of existing PAC-Bayesian theory, focusing on three generalization bounds. The first is an Occam bound, which handles rules with finite-precision parameters and states that generalization loss is near training loss when the number of bits needed to write the rule is small compared to the sample size. The second is a PAC-Bayesian bound, which provides a generalization guarantee for posterior distributions rather than for individual rules. The PAC-Bayesian bound naturally handles infinite-precision rule parameters and $L_2$ regularization, provides a bound for dropout training, and defines a natural notion of a single distinguished PAC-Bayesian posterior distribution. The third bound is a training-variance bound, a kind of bias-variance analysis in which bias is replaced by expected training loss. The training-variance bound dominates the other bounds but is more difficult to interpret. It seems to suggest variance reduction methods such as bagging and may ultimately provide a more meaningful analysis of dropout.
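The abstract describes the first two bounds only in words. As a rough illustration, the sketch below evaluates commonly cited textbook forms of an Occam bound and a PAC-Bayesian bound for losses in [0, 1]; it assumes i.i.d. samples, and the exact constants in the tutorial may differ. The function names are hypothetical. The Gaussian KL helper shows, under the assumption of an isotropic Gaussian posterior and prior with equal variance, how a squared-norm ($L_2$) penalty arises in the PAC-Bayesian bound. The training-variance bound is not sketched here.

```python
import math
from typing import Sequence


def occam_bound(train_loss: float, bits: int, n: int, delta: float) -> float:
    """Occam-style bound for a rule written with `bits` bits (prefix-free code).

    Assumption: losses in [0, 1], n i.i.d. samples. The prior 2**(-bits) sums to
    at most 1 over a prefix-free code (Kraft inequality), so Hoeffding plus a
    union bound gives, with probability >= 1 - delta, for all rules at once:
        L(h) <= train_loss + sqrt((bits * ln 2 + ln(1/delta)) / (2n))
    """
    slack = math.sqrt((bits * math.log(2) + math.log(1 / delta)) / (2 * n))
    return min(1.0, train_loss + slack)  # loss is bounded by 1 anyway


def kl_isotropic_gaussians(w: Sequence[float], sigma: float) -> float:
    """KL(N(w, sigma^2 I) || N(0, sigma^2 I)) = ||w||^2 / (2 sigma^2).

    With this (assumed) posterior/prior pair, the KL term is an L2 penalty.
    """
    return sum(x * x for x in w) / (2 * sigma ** 2)


def pac_bayes_bound(train_loss: float, kl: float, n: int, delta: float) -> float:
    """One common PAC-Bayes form (Maurer-style refinement, n >= 8), losses in [0, 1]:
        L(Q) <= train_loss(Q) + sqrt((KL(Q||P) + ln(2 sqrt(n) / delta)) / (2n))
    Here train_loss is the training loss averaged over the posterior Q.
    """
    slack = math.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))
    return min(1.0, train_loss + slack)


if __name__ == "__main__":
    # A 100-bit rule with 10% training error on 10,000 samples.
    print(occam_bound(0.1, bits=100, n=10_000, delta=0.05))  # about 0.160
    # Gaussian posterior centered at w = (3, 4) with unit variance: KL = 12.5.
    kl = kl_isotropic_gaussians([3.0, 4.0], sigma=1.0)
    print(pac_bayes_bound(0.1, kl, n=10_000, delta=0.05))  # about 0.132
```

Both bounds share the same shape: training loss plus a complexity term divided by the sample size under a square root. In the Occam bound the complexity is a bit count; in the PAC-Bayesian bound it is a KL divergence, which stays finite for real-valued parameters.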
Forward citations
Cited by 4 Pith papers
- On Unified and Sharpened CMI Bounds for Generalization Errors
  A unified CMI generalization bound based on leave-m-out cross-validation that encompasses existing results, bridges MI/CMI gaps, and sharpens under bounded loss with empirical gains.
- Generalization Bounds for Quantum Learning via Rényi Divergences
  Derives generalization bounds for quantum learning via quantum and classical Rényi divergences, with a new modified sandwich quantum Rényi divergence shown to outperform the Petz version analytically and numerically.
- Margin-Adaptive Confidence Ranking for Reliable LLM Judgement
  Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.
- Is K-fold cross validation the best model selection method for Machine Learning?
  K-fold CUBV combines cross-validation with PAC-Bayesian upper bounds on actual risk to provide a more robust criterion than standard CV for validating ML accuracy and reducing false positives.