
In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning

Behnam Neyshabur, Ryota Tomioka, Nathan Srebro · 2014 · cs.LG · arXiv 1412.6614

15 Pith papers cite this work.

abstract

We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multilayer feed-forward networks. We argue, partially through analogy to matrix factorization, that this is an inductive bias that can help shed light on deep learning.
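The abstract's central claim is that something other than network size controls capacity. The simplest known case of such implicit regularization can be shown in a few lines. This is our own illustration, not an experiment from the paper, and it deliberately uses a linear model rather than the paper's matrix factorization or multilayer network setting. In an underdetermined least-squares problem, gradient descent started from zero converges to the minimum-Euclidean-norm interpolant, even though the objective contains no norm penalty. The helper names (`matvec`, `norm`, `gd_least_squares`) and the toy data are hypothetical.

```python
import math

# Illustrative sketch (assumption): implicit regularization of gradient
# descent on an underdetermined linear least-squares problem. This is a
# simpler stand-in for the paper's matrix-factorization analogy.

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def gd_least_squares(A, y, lr=0.1, steps=200):
    """Plain gradient descent on 0.5 * ||A w - y||^2, starting from w = 0.

    No explicit regularizer is used. Because w starts at zero and every
    gradient A^T r lies in the row space of A, the iterate never leaves
    that row space.
    """
    n = len(A[0])
    w = [0.0] * n
    for _ in range(steps):
        r = [p - t for p, t in zip(matvec(A, w), y)]  # residual A w - y
        g = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]  # A^T r
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w

# One equation, three unknowns: infinitely many exact solutions exist.
A = [[1.0, 2.0, 2.0]]
y = [9.0]

w = gd_least_squares(A, y)
w_alt = [9.0, 0.0, 0.0]  # another exact solution, with a larger norm

print([round(x, 6) for x in w])  # [1.0, 2.0, 2.0]
print(round(norm(w), 6))         # 3.0
print(norm(w_alt))               # 9.0
```

Both `w` and `w_alt` fit the data exactly, so the training objective cannot distinguish them. Gradient descent nevertheless selects the norm-3 solution, which is the closed-form minimum-norm interpolant a·y / (a·a) for the single row a = [1, 2, 2]. The preference comes from the optimizer and its initialization, not from the number of parameters, which is the kind of inductive bias the abstract points to.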


citation-role summary

background: 2 · other: 1

citation-polarity summary

background: 2 · unclear: 1

representative citing papers

Understanding deep learning requires rethinking generalization

cs.LG · 2016-11-10 · accept · novelty 8.0

State-of-the-art convolutional networks easily memorize random labels and unstructured noise images, indicating that generalization in deep learning cannot be explained by traditional capacity or regularization arguments.

Conservation Laws from Data Symmetry in Neural Networks

cs.LG · 2026-06-09 · unverdicted · novelty 7.0

Data symmetries generically do not induce conserved quantities in NN training for analytic non-polynomial losses, but can for MSE with tensorizable networks.

New Equivalences Between Interpolation and SVMs: Kernels and Structured Features

stat.ML · 2023-05-03 · unverdicted · novelty 7.0

New conditions for support vector proliferation (SVP) in RKHS for bounded orthonormal systems and sub-Gaussian features, yielding generalization bounds for kernel SVMs beyond prior restrictive assumptions.

Estimating Implicit Regularization in Deep Learning

stat.ML · 2026-05-06 · unverdicted · novelty 7.0

Gradient matching empirically recovers implicit regularization effects such as l2 penalties from early stopping and dropout in neural networks.

Quantifying and Optimizing Simplicity via Polynomial Representations

cs.AI · 2026-05-28 · unverdicted · novelty 6.0

Polynomial representations yield an effective-degree simplicity metric that predicts generalization across tasks and serves as a differentiable regularizer improving performance in classification and RL.

Memorisation, convergence and generalisation in generative models

stat.ML · 2026-05-20 · unverdicted · novelty 6.0

Linear generative models memorize at small data loads but converge continuously once samples scale linearly with dimension; this convergence is insensitive to sharp recovery of principal latent factors.

Deep sequence models tend to memorize geometrically; it is unclear why

cs.LG · 2025-10-30 · unverdicted · novelty 6.0

Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

cs.LG · 2024-01-02 · unverdicted · novelty 6.0

SPIN lets weak LLMs become strong by generating their own training data from previous model versions and training them to prefer human-annotated responses over their own outputs, outperforming DPO on benchmarks even when DPO is supplemented with extra GPT-4 preference data.

Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs

stat.ML · 2022-06-02 · unverdicted · novelty 6.0

For orthogonal inputs, gradient flow on shallow ReLU networks with MSE loss from small initialization converges to zero loss and exhibits a minimum-variation-norm bias, initial alignment, and saddle-to-saddle dynamics.

Parameter Importance is Not Static: Evolving Parameter Isolation for Supervised Fine-Tuning

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

Evolving Parameter Isolation (EPI) periodically updates parameter isolation masks using online gradient signals during supervised fine-tuning to protect emerging task-critical parameters and reduce interference and forgetting.

Probing the Impact of Scale on Data-Efficient, Generalist Transformer World Models for Atari

cs.LG · 2026-05-09 · unverdicted · novelty 5.0

Transformer world models on Atari exhibit game-specific scaling regimes, but joint training on 26 environments produces consistent monotonic gains that improve downstream control policies to a median normalized score of 0.770.

(How) Learning Rates Regulate Catastrophic Overtraining

cs.LG · 2026-04-15 · unverdicted · novelty 5.0

Learning rate decay during SFT increases pretrained model sharpness, which exacerbates catastrophic forgetting and causes overtraining in LLMs.

Learning Sparse Compositional Functions with Norm-Constrained Neural Networks

stat.ML · 2026-05-25 · unverdicted · novelty 4.0

Derives approximation rates and excess risk bounds for Frobenius norm-constrained DNNs learning sparse compositional functions on DAGs, applicable to multi-index models and binary trees while avoiding the curse of dimensionality.

On improving deep learning generalization with adaptive sparse connectivity

cs.NE · 2019-06-27 · unverdicted · novelty 4.0

Sparse MLPs trained with Sparse Evolutionary Training (SET) plus neuron pruning achieve competitive performance on 15 datasets while pruning about 50% of hidden neurons and keeping the parameter count linear in the number of neurons.

Nexus: Same Pretraining Loss, Better Downstream Generalization via Common Minima

cs.LG · 2026-04-10

citing papers explorer

Deep sequence models tend to memorize geometrically; it is unclear why
cs.LG · 2025-10-30 · unverdicted · none · ref 128
