Geometry of optimization and implicit regularization in deep learning

Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro · 2017 · cs.LG · arXiv 1705.03071

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

We argue that the optimization plays a crucial role in generalization of deep learning models through implicit regularization. We do this by demonstrating that generalization ability is not controlled by network size but rather by some other implicit control. We then demonstrate how changing the empirical optimization procedure can improve generalization, even if actual optimization quality is not affected. We do so by studying the geometry of the parameter space of deep networks, and devising an optimization algorithm attuned to this geometry.

representative citing papers

Mirror Descent Beyond Euclidean Stability: An Exponential Separation in Initialization Sensitivity

cs.LG · 2026-06-09 · conditional · novelty 7.0

Non-quadratic Mirror Descent exhibits exponential initialization sensitivity in convex settings, shown via 3D constructions and KL-regularized simplex examples, with Bregman anchoring proposed for stabilization.

Convergence of Continual Learning in Homogeneous Deep Networks

cs.LG · 2026-06-29 · unverdicted · novelty 6.0

Continual classification in homogeneous models is sequential projections onto margin sets, with local linear convergence under regularity properties for random and cyclic tasks, extended to regression.

Trust, but Verify: Peeling Low-Bit Transformer Networks for Training Monitoring

cs.LG · 2026-05-04 · unverdicted · novelty 5.0

A layer-wise peeling framework creates reference bounds to diagnose under-optimized layers in trained decoder-only transformers, including low-bit and quantized versions.
