Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors

Daniel M. Roy; Gintare Karolina Dziugaite

arxiv: 1712.09376 · v3 · pith:YSQIGV43new · submitted 2017-12-26 · 📊 stat.ML · cs.LG

Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors

Gintare Karolina Dziugaite , Daniel M. Roy This is my paper

classification 📊 stat.ML cs.LG

keywords entropy-sgdboundpac-bayesboundsclassifiergeneralizationobtainprior

0 comments

read the original abstract

We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier. Entropy-SGD works by optimizing the bound's prior, violating the hypothesis of the PAC-Bayes theorem that the prior is chosen independently of the data. Indeed, available implementations of Entropy-SGD rapidly obtain zero training error on random labels and the same holds of the Gibbs posterior. In order to obtain a valid generalization bound, we rely on a result showing that data-dependent priors obtained by stochastic gradient Langevin dynamics (SGLD) yield valid PAC-Bayes bounds provided the target distribution of SGLD is {\epsilon}-differentially private. We observe that test error on MNIST and CIFAR10 falls within the (empirically nonvacuous) risk bounds computed under the assumption that SGLD reaches stationarity. In particular, Entropy-SGLD can be configured to yield relatively tight generalization bounds and still fit real labels, although these same settings do not obtain state-of-the-art performance.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets
cs.LG 2019-06 unverdicted novelty 6.0

Derives algorithm-dependent generalization bounds for neural nets using multilevel entropic regularization and proposes a Metropolis-simulated multi-scale Gibbs training procedure tested on a two-layer net for MNIST.