On the Superlinear Relationship between SGD Noise Covariance and Loss Landscape Curvature

· 2026 · cs.LG · arXiv 2602.05600

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Stochastic Gradient Descent (SGD) introduces anisotropic noise that is correlated with the local curvature of the loss landscape, thereby biasing optimization toward flat minima. Prior work often assumes an equivalence between the Fisher Information Matrix and the Hessian for negative log-likelihood losses, leading to the claim that the SGD noise covariance $\mathbf{C}$ is proportional to the Hessian $\mathbf{H}$. We show that this assumption holds only under restrictive conditions that are typically violated in deep neural networks. Using the recently discovered Activity--Weight Duality, we find a more general relationship agnostic to the specific loss formulation, showing that $\mathbf{C} \propto \mathbb{E}_p[\mathbf{h}_p^2]$, where $\mathbf{h}_p$ denotes the per-sample Hessian with $\mathbf{H} = \mathbb{E}_p[\mathbf{h}_p]$. As a consequence, $\mathbf{C}$ and $\mathbf{H}$ commute approximately rather than coincide exactly. We further find that, within the analyzed fully connected layers, their diagonal elements follow per-layer empirical power laws $C_{ii} \propto H_{ii}^{\gamma}$, with layer-dependent fitted exponents bounded by $1 \leq \gamma \leq 2$. Experiments across datasets, architectures, and loss functions support the resulting layerwise bounds, providing a unified characterization of the noise-curvature relationship in deep learning.
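The diagonal power law $C_{ii} \propto H_{ii}^{\gamma}$ described in the abstract can be illustrated with a small numerical sketch. This is not the paper's experimental setup: it assumes a toy linear model with squared loss, uses the covariance of per-sample gradients as a stand-in for the SGD noise covariance $\mathbf{C}$, and fits $\gamma$ by least squares in log-log space. The function names `noise_cov_and_hessian` and `fit_diagonal_exponent` are hypothetical, introduced only for this illustration.

```python
import numpy as np


def noise_cov_and_hessian(X, y, w):
    """Toy linear model with per-sample loss 0.5 * (x_p @ w - y_p)**2.

    Assumption: the per-sample gradient covariance is used as a proxy
    for the SGD noise covariance C (up to a batch-size factor).
    """
    n = X.shape[0]
    residuals = X @ w - y                       # r_p, shape (n,)
    per_sample_grads = residuals[:, None] * X   # g_p = r_p * x_p, shape (n, d)
    centered = per_sample_grads - per_sample_grads.mean(axis=0)
    C = centered.T @ centered / n               # covariance of per-sample gradients
    H = X.T @ X / n                             # H = E_p[h_p], with h_p = x_p x_p^T
    return C, H


def fit_diagonal_exponent(C, H):
    """Least-squares slope of log C_ii against log H_ii."""
    c_diag, h_diag = np.diag(C), np.diag(H)
    gamma, log_prefactor = np.polyfit(np.log(h_diag), np.log(c_diag), 1)
    return gamma, np.exp(log_prefactor)


# Synthetic data with feature scales spanning two decades, so that the
# diagonal Hessian entries cover a wide range for the log-log fit.
rng = np.random.default_rng(0)
n, d = 2000, 8
scales = np.logspace(-1, 1, d)
X = rng.normal(size=(n, d)) * scales
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
w = w_true + 0.05 * rng.normal(size=d)          # a point near the minimum

C, H = noise_cov_and_hessian(X, y, w)
gamma, prefactor = fit_diagonal_exponent(C, H)
print(f"fitted exponent gamma = {gamma:.3f}")
```

The fitted value on this toy problem says nothing about the layerwise exponents reported for deep networks; the sketch only shows how such an exponent could be estimated from the diagonals of $\mathbf{C}$ and $\mathbf{H}$.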

representative citing papers

Worker Disagreement Reveals Sharp Directions in Local SGD

cs.LG · 2026-05-26 · unverdicted · novelty 6.0

Worker-average gaps in Local SGD serve as a Hessian-free estimator of the dominant sharp subspace by capturing gradient alignment with high-curvature directions.

citing papers explorer

Worker Disagreement Reveals Sharp Directions in Local SGD
cs.LG · 2026-05-26 · unverdicted · none · ref 18 · internal anchor
Worker-average gaps in Local SGD serve as a Hessian-free estimator of the dominant sharp subspace by capturing gradient alignment with high-curvature directions.
