Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond

Leon Bottou; Levent Sagun; Yann LeCun

arXiv: 1611.07476 · v2 · pith:2XXJU5B4new · submitted 2016-11-22 · cs.LG

classification: cs.LG

keywords: bulk, edges, eigenvalues, Hessian, zero, around, away, before

We look at the eigenvalues of the Hessian of a loss function before and after training. The eigenvalue distribution is seen to be composed of two parts, the bulk which is concentrated around zero, and the edges which are scattered away from zero. We present empirical evidence for the bulk indicating how over-parametrized the system is, and for the edges that depend on the input data.
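
The bulk-and-edges picture described above can be reproduced on a toy problem. The sketch below is not the paper's experiment: it assumes a linear least-squares model with more parameters (d = 8) than samples (n = 3), whose Hessian is X^T X / n, and it uses a small hand-written Jacobi eigenvalue routine so that only the Python standard library is needed. The helper names hessian_least_squares and jacobi_eigenvalues, the sizes, and the 1e-8 threshold are all illustrative assumptions.

```python
import math
import random


def hessian_least_squares(X):
    """Hessian of L(w) = (1 / 2n) * sum_i (x_i . w - y_i)^2, i.e. X^T X / n.

    For this toy model the Hessian does not depend on w or y.
    """
    n, d = len(X), len(X[0])
    return [
        [sum(X[k][i] * X[k][j] for k in range(n)) / n for j in range(d)]
        for i in range(d)
    ]


def jacobi_eigenvalues(A, sweeps=100, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    A = [row[:] for row in A]  # work on a copy
    d = len(A)
    for _ in range(sweeps):
        # Sum of squared off-diagonal entries; stop once it is negligible.
        off = sum(A[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
        if off < tol:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes A[p][q]; |theta| <= pi / 4.
                diff = A[q][q] - A[p][p]
                if diff == 0.0:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2.0 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                # Apply the rotation to columns p and q ...
                for k in range(d):
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                # ... and then to rows p and q (a similarity transform).
                for k in range(d):
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
    return sorted(A[i][i] for i in range(d))


# Assumed toy sizes: 3 samples, 8 parameters (over-parametrized).
n, d = 3, 8
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

H = hessian_least_squares(X)
eigs = jacobi_eigenvalues(H)

# Split the spectrum with an illustrative numerical threshold.
bulk = [lam for lam in eigs if abs(lam) < 1e-8]
edges = [lam for lam in eigs if abs(lam) >= 1e-8]

print("eigenvalues near zero (bulk):", len(bulk))
print("eigenvalues away from zero (edges):", len(edges))
```

In this toy setting the Hessian has rank at most n, so at least d - n eigenvalues sit at zero whatever the data are, while the remaining eigenvalues change with the samples drawn. Deep networks are not linear models, so this only mirrors the qualitative split the abstract reports.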

Forward citations

Cited by 12 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Implicit Bias of Depth: From Neural Collapse to Softmax Codes
cs.LG · 2026-05 · unverdicted · novelty 7.0
Depth induces an implicit low-rank bias in deep unconstrained feature models trained with unregularized multiclass cross-entropy, promoting softmax codes over neural collapse via more efficient norm propagation.

AMUSE: Anytime Muon with Stable Gradient Evaluation
cs.LG · 2026-05 · unverdicted · novelty 7.0
AMUSE is a new optimizer that integrates Muon orthogonalization with Schedule-Free averaging via adaptive interpolation, giving schedule-free anytime training that improves Pareto frontiers on vision and LLM tasks.

Backdoor Channels Hidden in Latent Space: Cryptographic Undetectability in Modern Neural Networks
cs.CR · 2026-05 · unverdicted · novelty 7.0
Backdoors can be realized as statistically natural latent directions in modern neural networks, achieving high attack success with negligible clean accuracy loss and resisting existing defenses.

Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape
cs.LG · 2019-07 · conditional · novelty 7.0
Permutation symmetries generate permutation saddles and equal-loss valleys linking equivalent global minima, yielding a lower bound on symmetry-induced critical points.

Mechanistic Anomaly Detection via Functional Attribution
cs.LG · 2026-04 · unverdicted · novelty 6.0
Functional attribution with influence functions detects anomalous mechanisms in neural networks, achieving state-of-the-art backdoor detection (average DER 0.93) on vision benchmarks and improvements on LLMs.

GIST: Targeted Data Selection for Instruction Tuning via Coupled Optimization Geometry
cs.LG · 2026-02 · unverdicted · novelty 6.0
GIST recovers a task-specific low-dimensional subspace from validation gradients using SVD and scores training examples by their alignment within this coupled subspace for LoRA-based instruction tuning.

Low Rank Based Subspace Inference for the Laplace Approximation of Bayesian Neural Networks
cs.LG · 2025-02 · unverdicted · novelty 6.0
Derives an optimal low-rank subspace for the Laplace approximation in Bayesian neural networks, provides a scalable, better-performing version, and proposes a new comparison metric.

AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments
cs.LG · 2026-05 · unverdicted · novelty 5.0
AdaMeZO adapts Adam moment estimates to zeroth-order LLM fine-tuning without extra memory storage, outperforming MeZO with up to 70% fewer forward passes.

Wolkowicz-Styan Upper Bound on the Hessian Eigenspectrum for Cross-Entropy Loss in Nonlinear Smooth Neural Networks
cs.LG · 2026-04 · unverdicted · novelty 5.0
A closed-form upper bound on the maximum Hessian eigenvalue of cross-entropy loss is derived for smooth nonlinear neural networks.

RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization
cs.LG · 2026-03 · conditional · novelty 5.0
RMNP preconditions matrix updates via row-wise L2 normalization instead of Newton-Schulz iteration, reducing complexity to O(mn) while matching Muon's non-convex convergence rate and empirical performance.

On the Convergence Analysis of Muon
stat.ML · 2025-05 · unverdicted · novelty 5.0
A convergence analysis shows that Muon outperforms gradient descent by exploiting low-rank structure in neural network Hessians.

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
cs.LG · 2019-07 · unverdicted · novelty 4.0
Provides Hessian-based theoretical characterizations of SGD dynamics and a scale-invariant generalization bound for deep nets, backed by experiments on synthetic data, MNIST, and CIFAR-10.