A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

Behnam Neyshabur; Nathan Srebro; Srinadh Bhojanapalli

arxiv: 1707.09564 · v2 · pith:GX757J7Xnew · submitted 2017-07-29 · 💻 cs.LG

classification 💻 cs.LG

keywords bound, generalization, networks, neural, norm, analysis, approach, bounds

Abstract

We present a generalization bound for feedforward neural networks in terms of the product of the spectral norm of the layers and the Frobenius norm of the weights. The generalization bound is derived using a PAC-Bayes analysis.
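As a rough illustration of the kind of quantity the abstract describes, the sketch below computes a norm-based complexity measure from a list of layer weight matrices. It assumes the measure takes the form "product of squared spectral norms, times the sum over layers of squared Frobenius norm divided by squared spectral norm"; that specific combination is an assumption made here for illustration and is not spelled out on this page. Margin, depth, width, and constant factors are omitted, and the names `spectral_norm`, `frobenius_norm`, and `spectral_complexity` are hypothetical helpers, not code from the paper.

```python
import math
import random


def frobenius_norm(W):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in W for x in row))


def spectral_norm(W, iters=200, seed=0):
    """Largest singular value of W, estimated by power iteration on W^T W.

    A seeded random start vector is used so the start is not accidentally
    orthogonal to the top singular direction.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(W), len(W[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(iters):
        u = [sum(w * x for w, x in zip(row, v)) for row in W]  # u = W v
        v_new = [sum(W[i][j] * u[i] for i in range(n_rows))
                 for j in range(n_cols)]                        # v_new = W^T u
        norm = math.sqrt(sum(x * x for x in v_new))
        if norm == 0.0:
            return 0.0  # W maps the iterate to zero (e.g. an all-zero matrix)
        v = [x / norm for x in v_new]
    u = [sum(w * x for w, x in zip(row, v)) for row in W]
    return math.sqrt(sum(x * x for x in u))  # ||W v|| for unit-norm v


def spectral_complexity(weights):
    """Assumed measure: prod_i ||W_i||_2^2 * sum_i ||W_i||_F^2 / ||W_i||_2^2.

    Assumes every layer has a nonzero spectral norm.
    """
    spec_sq = [spectral_norm(W) ** 2 for W in weights]
    frob_sq = [frobenius_norm(W) ** 2 for W in weights]
    return math.prod(spec_sq) * sum(f / s for f, s in zip(frob_sq, spec_sq))


if __name__ == "__main__":
    # Two toy 2x2 layers: diag(3, 1) and 2 * identity.
    layers = [[[3.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
    print(spectral_complexity(layers))
```

For the toy layers, the squared spectral norms are 9 and 4 and the squared Frobenius norms are 10 and 8, so the measure is 36 × (10/9 + 8/4) = 112, up to floating-point error. Power iteration is used only to keep the sketch dependency-free; a numerical library's SVD routine would be the usual choice in practice.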

Forward citations

Cited by 12 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Aligning Network Equivariance with Data Symmetry: A Theoretical Framework and Adaptive Approach for Image Restoration
cs.CV 2026-05 unverdicted novelty 7.0

A new dataset-level non-strict symmetry measure allows deriving bounded equivariance for restoration models and motivates an adaptive network that aligns with per-sample symmetry to reduce expected risk.
Rethinking Model Selection in VLM Through the Lens of Gromov-Wasserstein Distance
cs.CV 2026-05 unverdicted novelty 7.0

Gromov-Wasserstein distance between modalities provides a stronger, inference-only predictor of final VLM performance than conventional encoder metrics, backed by theory linking it to cross-modal learnability and veri...
Path Regularization: A Near-Complete and Optimal Nonasymptotic Generalization Theory for Multilayer Neural Networks and Double Descent Phenomenon
cs.LG 2025-03 unverdicted novelty 7.0

A nonasymptotic generalization error upper bound for path-regularized multilayer neural networks with Lipschitz losses that exhibits double descent and is near-minimax optimal for ReLU regression.
Tighter Learning Guarantees on Digital Computers via Concentration of Measure on Finite Spaces
cs.LG 2024-02 unverdicted novelty 7.0

Derives adaptive generalization bounds {c_m / N^{1/(2∨m)}} for digital ML models via new concentration of measure results on finite metric spaces, with c_m = O(sqrt(m)).
A Qualitative Test-Risk Mechanism for Scaling Behavior in Normalized Residual Networks
cs.LG 2026-05 unverdicted novelty 6.0

Depth expansion in normalized residual networks yields provable test-risk improvement through representational, optimization, and generalization gains under first-order descent and norm-control conditions.
Certified and accurate computation of function space norms of deep neural networks
math.NA 2026-03 unverdicted novelty 6.0

A certified adaptive quadrature framework computes guaranteed L^p, W^{1,p}, and W^{2,p} norms of deep neural networks by propagating interval enclosures on axis-aligned boxes.
Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets
cs.LG 2019-06 unverdicted novelty 6.0

Derives algorithm-dependent generalization bounds for neural nets using multilevel entropic regularization and proposes a Metropolis-simulated multi-scale Gibbs training procedure tested on a two-layer net for MNIST.
Margin-Adaptive Confidence Ranking for Reliable LLM Judgement
cs.LG 2026-05 unverdicted novelty 5.0

Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.
Rethinking the Personalized Relaxed Initialization in the Federated Learning: Consistency and Generalization
cs.LG 2026-04 unverdicted novelty 4.0

FedInit uses reverse personalized initialization in FL to reduce client drift effects, showing via excess risk that inconsistency impacts generalization error more than optimization error.
Generalization error bounds for two-layer neural networks with Lipschitz loss function
stat.ML 2026-04 unverdicted novelty 4.0

Generalization error bounds of order O(n^{-1/2}) (dimension-free) are derived for two-layer neural networks with Lipschitz losses under independent test data, and O(n^{-1/(d_in + d_out)}) without independence, using W...
Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
cs.LG 2019-07 unverdicted novelty 4.0

Provides Hessian-based theoretical characterizations of SGD dynamics and a scale-invariant generalization bound for deep nets, backed by experiments on synthetic data, MNIST, and CIFAR-10.
On improving deep learning generalization with adaptive sparse connectivity
cs.NE 2019-06 unverdicted novelty 4.0

Sparse MLPs trained via SET plus neuron pruning achieve competitive performance on 15 datasets while pruning ~50% of hidden neurons and keeping parameter count linear in neuron count.