A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks. arXiv preprint arXiv:1707.09564

Behnam Neyshabur, Srinadh Bhojanapalli, Nathan Srebro · 2017 · cs.LG · arXiv 1707.09564

12 Pith papers cite this work. Polarity classification is still indexing.

abstract

We present a generalization bound for feedforward neural networks in terms of the product of the spectral norm of the layers and the Frobenius norm of the weights. The generalization bound is derived using a PAC-Bayes analysis.
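As a rough illustration of the quantity the abstract describes, the sketch below computes a norm-dependent factor of the form $\prod_i \|W_i\|_2^2 \cdot \sum_i \|W_i\|_F^2 / \|W_i\|_2^2$ for a list of layer weight matrices. This form is an assumption based on how the bound is commonly stated; the abstract itself only says the bound involves the product of the layers' spectral norms and the Frobenius norm of the weights. The function name `spectral_complexity` and the toy weights are hypothetical, and the margin, sample-size, depth, and width factors of the full bound are omitted.

```python
import numpy as np


def spectral_complexity(weights):
    """Assumed norm-dependent factor (not the full bound):
    prod_i ||W_i||_2^2  *  sum_i ||W_i||_F^2 / ||W_i||_2^2
    """
    # Squared spectral norm (largest singular value) of each layer.
    spec_sq = [np.linalg.norm(W, 2) ** 2 for W in weights]
    # Squared Frobenius norm of each layer.
    frob_sq = [np.linalg.norm(W, "fro") ** 2 for W in weights]

    product = float(np.prod(spec_sq))
    ratio_sum = float(sum(f / s for f, s in zip(frob_sq, spec_sq)))
    return product * ratio_sum


# Toy two-layer example with hand-checkable norms (hypothetical weights).
weights = [np.eye(3), 2.0 * np.eye(3)]
value = spectral_complexity(weights)
print(value)
```

For the toy weights, the squared spectral norms are 1 and 4 and each layer's Frobenius-to-spectral ratio is 3, so the factor is 4 × 6 = 24. Rescaling one layer changes the product term but leaves that layer's ratio unchanged.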

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Aligning Network Equivariance with Data Symmetry: A Theoretical Framework and Adaptive Approach for Image Restoration

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

A new dataset-level non-strict symmetry measure allows deriving bounded equivariance for restoration models and motivates an adaptive network that aligns with per-sample symmetry to reduce expected risk.

Path Regularization: A Near-Complete and Optimal Nonasymptotic Generalization Theory for Multilayer Neural Networks and Double Descent Phenomenon

cs.LG · 2025-03-03 · unverdicted · novelty 7.0

A nonasymptotic generalization error upper bound for path-regularized multilayer neural networks with Lipschitz losses that exhibits double descent and is near-minimax optimal for ReLU regression.

Tighter Learning Guarantees on Digital Computers via Concentration of Measure on Finite Spaces

cs.LG · 2024-02-08 · unverdicted · novelty 7.0

Derives adaptive generalization bounds {c_m / N^{1/(2∨m)}} for digital ML models via new concentration of measure results on finite metric spaces, with c_m = O(sqrt(m)).

Rethinking Model Selection in VLM Through the Lens of Gromov-Wasserstein Distance

cs.CV · 2026-05-02 · unverdicted · novelty 7.0

Gromov-Wasserstein distance between modalities provides a stronger, inference-only predictor of final VLM performance than conventional encoder metrics, backed by theory linking it to cross-modal learnability and verified across 60+ training runs.

Certified and accurate computation of function space norms of deep neural networks

math.NA · 2026-03-06 · unverdicted · novelty 6.0

A certified adaptive quadrature framework computes guaranteed L^p, W^{1,p}, and W^{2,p} norms of deep neural networks by propagating interval enclosures on axis-aligned boxes.

Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets

cs.LG · 2019-06-26 · unverdicted · novelty 6.0

Derives algorithm-dependent generalization bounds for neural nets using multilevel entropic regularization and proposes a Metropolis-simulated multi-scale Gibbs training procedure tested on a two-layer net for MNIST.

A Qualitative Test-Risk Mechanism for Scaling Behavior in Normalized Residual Networks

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Depth expansion in normalized residual networks yields provable test-risk improvement through representational, optimization, and generalization gains under first-order descent and norm-control conditions.

Margin-Adaptive Confidence Ranking for Reliable LLM Judgement

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

Introduces a margin-adaptive confidence ranking method that learns an estimator from simulated diversity and derives margin-dependent generalization bounds for use in fixed-sequence testing of LLM-human agreement.

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization

cs.LG · 2019-07-24 · unverdicted · novelty 4.0

Provides Hessian-based theoretical characterizations of SGD dynamics and a scale-invariant generalization bound for deep nets, backed by experiments on synthetic data, MNIST, and CIFAR-10.

On improving deep learning generalization with adaptive sparse connectivity

cs.NE · 2019-06-27 · unverdicted · novelty 4.0

Sparse MLPs trained via SET plus neuron pruning achieve competitive performance on 15 datasets while pruning ~50% of hidden neurons and keeping parameter count linear in neuron count.

Rethinking the Personalized Relaxed Initialization in the Federated Learning: Consistency and Generalization

cs.LG · 2026-04-14 · unverdicted · novelty 4.0

FedInit uses reverse personalized initialization in FL to reduce client drift effects, showing via excess risk that inconsistency impacts generalization error more than optimization error.

Generalization error bounds for two-layer neural networks with Lipschitz loss function

stat.ML · 2026-04-07 · unverdicted · novelty 4.0

Generalization error bounds of order O(n^{-1/2}) (dimension-free) are derived for two-layer neural networks with Lipschitz losses under independent test data, and O(n^{-1/(d_in + d_out)}) without independence, using Wasserstein distances and SGD moment bounds.

citing papers explorer

Showing 12 of 12 citing papers.

Aligning Network Equivariance with Data Symmetry: A Theoretical Framework and Adaptive Approach for Image Restoration
cs.CV · 2026-05-13 · unverdicted · none · ref 40 · internal anchor

Path Regularization: A Near-Complete and Optimal Nonasymptotic Generalization Theory for Multilayer Neural Networks and Double Descent Phenomenon
cs.LG · 2025-03-03 · unverdicted · none · ref 6 · internal anchor

Tighter Learning Guarantees on Digital Computers via Concentration of Measure on Finite Spaces
cs.LG · 2024-02-08 · unverdicted · none · ref 6 · internal anchor

Rethinking Model Selection in VLM Through the Lens of Gromov-Wasserstein Distance
cs.CV · 2026-05-02 · unverdicted · none · ref 34

Certified and accurate computation of function space norms of deep neural networks
math.NA · 2026-03-06 · unverdicted · none · ref 43 · internal anchor

Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets
cs.LG · 2019-06-26 · unverdicted · none · ref 21 · internal anchor

A Qualitative Test-Risk Mechanism for Scaling Behavior in Normalized Residual Networks
cs.LG · 2026-05-08 · unverdicted · none · ref 9

Margin-Adaptive Confidence Ranking for Reliable LLM Judgement
cs.LG · 2026-05-14 · unverdicted · none · ref 15 · internal anchor

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
cs.LG · 2019-07-24 · unverdicted · none · ref 51 · internal anchor

On improving deep learning generalization with adaptive sparse connectivity
cs.NE · 2019-06-27 · unverdicted · none · ref 11 · internal anchor

Rethinking the Personalized Relaxed Initialization in the Federated Learning: Consistency and Generalization
cs.LG · 2026-04-14 · unverdicted · none · ref 10

Generalization error bounds for two-layer neural networks with Lipschitz loss function
stat.ML · 2026-04-07 · unverdicted · none · ref 16
