Stronger generalization bounds for deep nets via a compression approach

Behnam Neyshabur; Rong Ge; Sanjeev Arora; Yi Zhang

arxiv: 1802.05296 · v4 · pith:ELP4V5O2new · submitted 2018-02-14 · 💻 cs.LG

classification 💻 cs.LG

keywords: bounds, generalization, nets, deep, compression, better, properties, some

Deep nets generalize well despite having more parameters than the number of training samples. Recent works try to explain this using PAC-Bayes and margin-based analyses, but they do not yet yield sample complexity bounds better than naive parameter counting. The current paper shows generalization bounds that are orders of magnitude better in practice. These rely on new succinct reparametrizations of the trained net, a compression that is explicit and efficient. These yield generalization bounds via a simple compression-based framework introduced here. Our results also provide some theoretical justification for the widespread empirical success in compressing deep nets. The correctness analysis of our compression relies on some newly identified "noise stability" properties of trained deep nets, which are also experimentally verified. The study of these properties and the resulting generalization bounds are also extended to convolutional nets, which had eluded earlier attempts at proving generalization.
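The abstract does not define "noise stability" precisely, so the following is only a rough sketch of how such a property might be probed for a single layer: inject Gaussian noise into a layer's input and compare the relative change in the output with the relative size of the noise. The function names (`relu_layer`, `noise_sensitivity`), the ratio being measured, and the toy weights are all illustrative assumptions, not the paper's definitions or experiments.

```python
import math
import random


def relu_layer(weights, x):
    """Apply one fully connected ReLU layer: max(0, W x), with W given as a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def norm(v):
    """Euclidean norm of a vector stored as a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def noise_sensitivity(weights, x, sigma=0.01, trials=100, seed=0):
    """Hypothetical probe (an assumption, not the paper's definition).

    Returns the average over `trials` Gaussian perturbations of
    (relative output change) / (relative input change).
    Values below 1 mean the layer attenuates the injected noise
    relative to the signal; values above 1 mean it amplifies it.
    Assumes x and the clean output are both nonzero.
    """
    rng = random.Random(seed)
    clean = relu_layer(weights, x)
    ratios = []
    for _ in range(trials):
        eta = [rng.gauss(0.0, sigma) for _ in x]
        noisy = relu_layer(weights, [a + e for a, e in zip(x, eta)])
        rel_out = norm([a - b for a, b in zip(noisy, clean)]) / norm(clean)
        rel_in = norm(eta) / norm(x)
        ratios.append(rel_out / rel_in)
    return sum(ratios) / len(ratios)


# Toy comparison: an identity layer passes noise through unchanged,
# while a layer whose rows all align with the signal direction
# discards the noise components orthogonal to the signal.
signal = [1.0, 1.0, 1.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print("identity layer:", noise_sensitivity(identity, signal))
print("aligned layer: ", noise_sensitivity(aligned, signal))
```

In this toy setting the aligned layer's ratio cannot exceed 1 (by the Cauchy-Schwarz inequality), which loosely mirrors the intuition that a layer attenuating injected noise can tolerate the small errors a compression step introduces. Whether a given trained net behaves this way is an empirical question that the paper reports verifying experimentally.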

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis
cs.LG 2026-05 unverdicted novelty 7.0

QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.