Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond
Abstract
We look at the eigenvalues of the Hessian of a loss function before and after training. The eigenvalue distribution is seen to be composed of two parts: the bulk, which is concentrated around zero, and the edges, which are scattered away from zero. We present empirical evidence that the bulk indicates how over-parametrized the system is, while the edges depend on the input data.
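The bulk-and-edges picture is easy to reproduce on a toy problem. Below is a minimal PyTorch sketch, not the paper's experimental setup: the network size, the random data, and the 1e-3 bulk threshold are arbitrary illustrative choices. It computes the full Hessian of a small MLP's loss over a flat parameter vector and inspects the spectrum.

```python
# Minimal sketch (illustrative setup, not the paper's): compute the full
# Hessian of a tiny MLP's loss on random data, then look for a near-zero
# bulk of eigenvalues plus a few outlying edges.
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)

X = torch.randn(64, 5)
y = torch.randn(64, 1)
hidden = 8
n_params = 5 * hidden + hidden + hidden * 1 + 1  # W1, b1, W2, b2

def loss_fn(theta):
    # Unpack the flat parameter vector into the network's weights.
    i = 0
    W1 = theta[i:i + 5 * hidden].view(5, hidden); i += 5 * hidden
    b1 = theta[i:i + hidden]; i += hidden
    W2 = theta[i:i + hidden].view(hidden, 1); i += hidden
    b2 = theta[i:i + 1]
    pred = torch.tanh(X @ W1 + b1) @ W2 + b2
    return ((pred - y) ** 2).mean()

theta = 0.1 * torch.randn(n_params)
H = hessian(loss_fn, theta)          # (n_params, n_params) Hessian
eigs = torch.linalg.eigvalsh(H)      # symmetric, so real eigenvalues, ascending

bulk = (eigs.abs() < 1e-3).float().mean()
print(f"{n_params} parameters; fraction with |lambda| < 1e-3 (bulk): {bulk:.2f}")
print("largest eigenvalues (edges):", eigs[-5:])
```

Running the same computation before and after a few gradient-descent steps mirrors the paper's before/after-training comparison.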
Forward citations
Cited by 5 Pith papers
- Backdoor Channels Hidden in Latent Space: Cryptographic Undetectability in Modern Neural Networks
  Backdoors can be realized as statistically natural latent directions in modern neural networks, achieving high attack success with negligible clean-accuracy loss and resisting existing defenses.
- Mechanistic Anomaly Detection via Functional Attribution
  Functional attribution with influence functions detects anomalous mechanisms in neural networks, achieving state-of-the-art backdoor detection (average DER 0.93) on vision benchmarks and improvements on LLMs.
- AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments
  AdaMeZO adapts Adam moment estimates to zeroth-order LLM fine-tuning without extra memory storage, outperforming MeZO with up to 70% fewer forward passes (a zeroth-order gradient sketch follows this list).
- Wolkowicz-Styan Upper Bound on the Hessian Eigenspectrum for Cross-Entropy Loss in Nonlinear Smooth Neural Networks
  A closed-form upper bound on the maximum Hessian eigenvalue of cross-entropy loss is derived for smooth nonlinear neural networks (the underlying trace bound is restated after this list).
- RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization
  RMNP preconditions matrix updates via row-wise L2 normalization instead of Newton-Schulz iteration, reducing complexity to O(mn) while matching Muon's non-convex convergence rate and empirical performance (a row-normalization sketch follows this list).
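On AdaMeZO: its exact algorithm is not given on this page, so the sketch below shows only the MeZO-style zeroth-order step it builds on, including the seed-reset trick that avoids storing the perturbation; the Adam-style moment reconstruction is the cited paper's contribution and is not reproduced. The helper name `mezo_step` and the no-argument `loss_fn` closure are illustrative assumptions.

```python
# Sketch of a MeZO-style zeroth-order SGD step (the base that AdaMeZO
# extends with Adam-style moments). The perturbation z is never stored:
# resetting the RNG seed regenerates it on demand. `mezo_step` is a
# hypothetical helper name, not an API from either paper.
import torch

def mezo_step(params, loss_fn, eps=1e-3, lr=1e-6):
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        # Regenerate the same z ~ N(0, I) from the fixed seed each time.
        gen = torch.Generator().manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen)
            p.add_(scale * eps * z)

    with torch.no_grad():
        perturb(+1); loss_plus = loss_fn()    # loss at theta + eps*z
        perturb(-2); loss_minus = loss_fn()   # loss at theta - eps*z
        perturb(+1)                           # restore theta
        grad_scale = (loss_plus - loss_minus) / (2 * eps)
        gen = torch.Generator().manual_seed(seed)
        for p in params:
            z = torch.randn(p.shape, generator=gen)
            p.add_(-lr * grad_scale * z)      # plain ZO-SGD update
    return loss_plus
```

Two forward passes per step and no stored gradients or perturbations keep memory at inference level, which is what makes this family attractive for LLM fine-tuning.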
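On the Wolkowicz-Styan bound: the specialized neural-network form is not stated on this page. For reference, the classical trace bounds of Wolkowicz and Styan (1980), which the cited paper presumably specializes to the cross-entropy Hessian, read as follows for a symmetric matrix A in R^{n x n}:

```latex
% Classical Wolkowicz--Styan trace bounds for symmetric A \in R^{n \times n}.
\[
  m = \frac{\operatorname{tr}(A)}{n}, \qquad
  s^2 = \frac{\operatorname{tr}(A^2)}{n} - m^2,
\]
\[
  m + \frac{s}{\sqrt{n-1}} \;\le\; \lambda_{\max}(A) \;\le\; m + s\sqrt{n-1}.
\]
```

Both traces are cheap relative to a full eigendecomposition, which is what makes a closed-form bound on the top Hessian eigenvalue practical.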
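On RMNP: only the one-line summary above is available here, so the sketch below shows just the core substitution it describes, row-wise L2 normalization of the momentum matrix in place of Muon's Newton-Schulz orthogonalization. The function name `rmnp_update` is hypothetical, and any additional correction terms or hyperparameters of the actual method are omitted.

```python
# Sketch of the core idea in the summary above (hypothetical helper name;
# the full RMNP algorithm is not given on this page): precondition the
# momentum matrix M by normalizing each row to unit L2 norm, an O(mn)
# substitute for Muon's Newton-Schulz iteration.
import torch

def rmnp_update(M: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # M: (m, n) momentum matrix for one weight matrix.
    row_norms = M.norm(dim=1, keepdim=True)   # O(mn) total work
    return M / (row_norms + eps)              # each row scaled to unit length
```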