Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

2017 · cs.LG · arXiv 1703.11008

19 Pith papers cite this work.

abstract

One of the defining properties of deep learning is that models are chosen to have many more parameters than available training data. In light of this capacity for overfitting, it is remarkable that simple algorithms like SGD reliably return solutions with low test error. One roadblock to explaining these phenomena in terms of implicit regularization, structural properties of the solution, and/or easiness of the data is that many learning bounds are quantitatively vacuous when applied to networks learned by SGD in this "deep learning" regime. Logically, in order to explain generalization, we need nonvacuous bounds. We return to an idea by Langford and Caruana (2001), who used PAC-Bayes bounds to compute nonvacuous numerical bounds on generalization error for stochastic two-layer two-hidden-unit neural networks via a sensitivity analysis. By optimizing the PAC-Bayes bound directly, we are able to extend their approach and obtain nonvacuous generalization bounds for deep stochastic neural network classifiers with millions of parameters trained on only tens of thousands of examples. We connect our findings to recent and old work on flat minima and MDL-based explanations of generalization.
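The abstract's central object is a PAC-Bayes bound on the error of a stochastic classifier Q, that is, a distribution over network weights. The sketch below shows how such a bound can be evaluated numerically. It makes three assumptions that this page does not state. The bound takes the Langford–Seeger form kl(ê_S(Q) ‖ e(Q)) ≤ (KL(Q‖P) + ln(m/δ)) / (m − 1), where m is the number of training examples and δ is the confidence parameter. Q and P are diagonal Gaussians over the weights. The empirical error ê_S(Q) is already known. All function names are hypothetical, and the numbers in the usage block are purely illustrative, not results from the paper.

```python
import math
from typing import Sequence


def gaussian_kl_diag(mu_q: Sequence[float], var_q: Sequence[float],
                     mu_p: Sequence[float], var_p: Sequence[float]) -> float:
    """KL(Q || P) for diagonal Gaussians Q = N(mu_q, diag(var_q)), P = N(mu_p, diag(var_p)).

    Assumption for illustration: posterior and prior over the weights are diagonal Gaussians.
    """
    if not (len(mu_q) == len(var_q) == len(mu_p) == len(var_p)):
        raise ValueError("all parameter vectors must have the same length")
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += vq / vp + (mq - mp) ** 2 / vp - 1.0 + math.log(vp / vq)
    return 0.5 * total


def kl_bernoulli(q: float, p: float) -> float:
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), with p clamped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    out = 0.0
    if q > 0.0:
        out += q * math.log(q / p)
    if q < 1.0:
        out += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return out


def kl_inverse_upper(q: float, c: float, iters: int = 100) -> float:
    """Largest p in [q, 1] with kl(q || p) <= c, found by bisection.

    Bisection is valid because kl(q || p) is increasing in p on [q, 1].
    """
    lo, hi = q, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(q, mid) <= c:
            lo = mid  # mid still satisfies the constraint
        else:
            hi = mid
    return lo


def pac_bayes_kl_bound(emp_err: float, kl_qp: float, m: int, delta: float) -> float:
    """Bound on the true error of Q from kl(emp || true) <= (KL(Q||P) + ln(m/delta)) / (m - 1)."""
    c = (kl_qp + math.log(m / delta)) / (m - 1)
    return kl_inverse_upper(emp_err, c)


def pac_bayes_pinsker_bound(emp_err: float, kl_qp: float, m: int, delta: float) -> float:
    """Looser closed-form relaxation, using Pinsker's inequality kl(q || p) >= 2 (p - q)^2."""
    return emp_err + math.sqrt((kl_qp + math.log(m / delta)) / (2 * (m - 1)))


if __name__ == "__main__":
    # Illustrative numbers only (not results from the paper).
    m, delta = 10_000, 0.05
    emp_err, kl_qp = 0.03, 5_000.0
    print("kl-inverse bound:", pac_bayes_kl_bound(emp_err, kl_qp, m, delta))
    print("Pinsker bound:   ", pac_bayes_pinsker_bound(emp_err, kl_qp, m, delta))
```

The kl-inverse bound is never looser than the Pinsker form, because kl(q ‖ p) ≥ 2(p − q)². The Pinsker form, however, is a closed-form expression in KL(Q‖P). That makes it one natural candidate for an objective that "optimizes the PAC-Bayes bound directly" with SGD, while the kl-inverse is better suited to reporting the final number. A bound on the error rate is vacuous when it is no smaller than 1, the trivial value.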

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Are Flat Minima an Illusion?

cs.LG · 2026-03-24 · unverdicted · novelty 8.0

Flat minima are illusory; generalization is driven by weakness, a reparameterization-invariant measure of compatible completions that predicts performance better than sharpness on MNIST and Fashion-MNIST.

Sample Complexity of Scientific Discovery: PAC Learnability of Compositional Function Trees

cs.LG · 2026-06-28 · unverdicted · novelty 7.0

Proves that the Rademacher complexity of depth-d compositional trees over a finite operator vocabulary is controlled by (K b L)^{d} / sqrt(n) under Lipschitz conditions on the operators.

Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning

cs.CV · 2026-06-08 · unverdicted · novelty 7.0

FisherAdapTune uses temporal drift in Fisher geometry, measured by scale-invariant Jensen-Shannon distance, to progressively freeze stabilized parameter groups during fine-tuning, reporting gains on segmentation and zero-shot transfer.

Pointwise Generalization in Deep Neural Networks

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Proposes a pointwise Riemannian Dimension, computed from feature eigenvalues, to derive tighter, representation-aware generalization bounds for deep networks in the nonlinear regime.

On the Generalization Error of Differentially Private Algorithms via Typicality

cs.IT · 2026-01-13 · unverdicted · novelty 7.0

Sharper information-theoretic generalization bounds for differentially private algorithms obtained via typicality arguments that improve prior mutual-information results and add new maximal-leakage bounds.

Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

LE-SAM inverts SAM by fixing the loss budget instead of the parameter-space radius, yielding better generalization across benchmarks.

ConquerNet: Convolution-Smoothed Quantile ReLU Neural Networks with Minimax Guarantees

stat.ML · 2026-05-07 · unverdicted · novelty 7.0

ConquerNet smooths quantile ReLU networks with convolution for easier training and establishes minimax-optimal nonasymptotic risk bounds over Besov function classes.

Topology-Aware PAC-Bayesian Generalization Analysis for Graph Neural Networks

cs.LG · 2026-04-12 · unverdicted · novelty 7.0

A new PAC-Bayesian framework for GCNs derives a family of generalization bounds that embed graph topology via structured sensitivity matrices from spatial and spectral perspectives, recovering prior bounds as special cases while claiming tighter results.

Second-Order Path Kernel Interpolation Formulas in Machine Learning

cs.LG · 2026-06-05 · unverdicted · novelty 6.0

Derives second-order path-kernel interpolation formulas for gradient descent, SGD, and momentum training, adding curvature terms and a concentration estimate around the expected prediction.

CRAFT: Cost-aware Refinement And Front-aware Tuning of Prompts

cs.CL · 2026-06-03 · unverdicted · novelty 6.0

CRAFT is a Pareto-front prompt optimizer that allocates scarce LLM validation calls to candidates near the current front using accuracy- and cost-oriented generators plus NSGA-II retention.

Sharpness-Aware Minimization for Efficiently Improving Generalization

cs.LG · 2020-10-03 · conditional · novelty 6.0

SAM solves a min-max problem to locate flat low-loss regions, improving generalization on CIFAR, ImageNet and label-noise tasks.

Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets

cs.LG · 2019-06-26 · unverdicted · novelty 6.0

Derives algorithm-dependent generalization bounds for neural nets using multilevel entropic regularization and proposes a Metropolis-simulated multi-scale Gibbs training procedure tested on a two-layer net for MNIST.

On the Generalization Bounds of Symbolic Regression with Genetic Programming

cs.LG · 2026-04-19 · unverdicted · novelty 6.0

Derives a generalization bound for GP-based symbolic regression that decomposes the gap into structure-selection complexity and constant-fitting complexity under tree constraints.

Towards Initialization-dependent and Non-vacuous Generalization Bounds for Overparameterized Shallow Neural Networks

cs.LG · 2026-04-01 · unverdicted · novelty 6.0

Path-norm initialization-dependent bounds with a new peeling technique give non-vacuous generalization guarantees for overparameterized shallow networks with Lipschitz activations.

A Rigorous, Tractable Measure of Model Complexity

stat.ML · 2026-05-20 · unverdicted · novelty 5.0

A gradient-similarity complexity measure that generalizes polynomial degree, kernel length scale, neighbor count, tree splits, and forest size while offering insights into double descent.

Margin-Adaptive Confidence Ranking for Reliable LLM Judgement

cs.LG · 2026-05-14 · unverdicted · novelty 4.0

Develops a margin-adaptive learned confidence estimator for LLMs, with generalization guarantees, that improves agreement with human judgments over heuristic baselines.

On improving deep learning generalization with adaptive sparse connectivity

cs.NE · 2019-06-27 · unverdicted · novelty 4.0

Sparse MLPs trained via SET plus neuron pruning achieve competitive performance on 15 datasets while pruning ~50% of hidden neurons and keeping parameter count linear in neuron count.

Generalization error bounds for two-layer neural networks with Lipschitz loss function

stat.ML · 2026-04-07 · unverdicted · novelty 4.0

Generalization error bounds of order O(n^{-1/2}) (dimension-free) are derived for two-layer neural networks with Lipschitz losses under independent test data, and O(n^{-1/(d_in + d_out)}) without independence, using Wasserstein distances and SGD moment bounds.

Statistical Properties of Training & Generalization

stat.ML · 2026-06-18 · unverdicted · novelty 1.0

Review of neural scaling laws and their relation to constraints and inductive biases when applying machine learning to physics problems.
