Exploring Generalization in Deep Learning
4 Pith papers cite this work. Polarity classification is still indexing.
abstract
With a goal of understanding what drives generalization in deep networks, we consider several recently suggested explanations, including norm-based control, sharpness and robustness. We study how these measures can ensure generalization, highlighting the importance of scale normalization, and making a connection between sharpness and PAC-Bayes theory. We then investigate how well the measures explain different observed phenomena.
citation-role summary
citation-polarity summary
fields: cs.LG (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
roles: background (1)
polarities: background (1)
representative citing papers
PAC-Bayes applied to low-sharpness flat minima yields non-vacuous generalization bounds for boolean functions whose Fourier spectra are sparse and low-degree, with parameters estimable by property testing.
Adaptive elastic net SAEs (AEN-SAEs) mitigate feature starvation in SAEs by combining ℓ2 structural stability with adaptive ℓ1 reweighting, producing a Lipschitz-continuous sparse coding map that recovers global feature support under mild assumptions.
Develops a margin-adaptive learned confidence estimator for LLMs with generalization guarantees to improve agreement rates with human judgments over heuristic baselines.
citing papers explorer
- Sample Complexity of Scientific Discovery: PAC Learnability of Compositional Function Trees
  Proves that Rademacher complexity of depth-d compositional trees over finite operator vocabulary is controlled by (K b L)^{d} / sqrt(n) under Lipschitz conditions on operators.
- A Sharper Picture of Generalization in Transformers
  PAC-Bayes applied to low-sharpness flat minima yields non-vacuous generalization bounds for boolean functions whose Fourier spectra are sparse and low-degree, with parameters estimable by property testing.
- Feature Starvation as Geometric Instability in Sparse Autoencoders
  Adaptive elastic net SAEs (AEN-SAEs) mitigate feature starvation in SAEs by combining ℓ2 structural stability with adaptive ℓ1 reweighting, producing a Lipschitz-continuous sparse coding map that recovers global feature support under mild assumptions.
- Margin-Adaptive Confidence Ranking for Reliable LLM Judgement
  Develops a margin-adaptive learned confidence estimator for LLMs with generalization guarantees to improve agreement rates with human judgments over heuristic baselines.