pith.

arxiv: 1901.08244 · v1 · pith:4TJ4X5XL · submitted 2019-01-24 · 💻 cs.LG · stat.ML

Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians

classification 💻 cs.LG stat.ML
keywords: outliers, structure, hessian, spectrum, covariance, gradients, means, used
read the original abstract

We consider deep classifying neural networks. We expose a structure in the derivative of the logits with respect to the parameters of the model, which is used to explain the existence of outliers in the spectrum of the Hessian. Previous work decomposed the Hessian into two components, attributing the outliers to one of them, the so-called Covariance of gradients. We show this term is not a Covariance but a second moment matrix, i.e., it is influenced by the means of the gradients. These means possess an additive two-way structure that is the source of the outliers in the spectrum. This structure can be used to approximate the principal subspace of the Hessian using certain "averaging" operations, avoiding the need for high-dimensional eigenanalysis. We corroborate this claim across different datasets, architectures and sample sizes.
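The distinction between a covariance and a second moment can be made concrete with the exact identity (1/n) Σᵢ gᵢgᵢᵀ = Σ + μμᵀ, where Σ is the covariance centered at the sample mean μ. Below is a minimal NumPy sketch on a hypothetical toy model, not the paper's method or its Hessian measurements. It assumes the per-sample gradients gᵢ are simply a class mean plus isotropic Gaussian noise, a one-way (class-only) simplification of the richer two-way structure the abstract describes. It then compares the top eigenvectors of the full d×d second moment with a basis obtained only by averaging gradients within each class. All function names, dimensions, and noise levels are illustrative assumptions.

```python
import numpy as np


def second_moment(g: np.ndarray) -> np.ndarray:
    """Uncentered second-moment matrix (1/n) * sum_i g_i g_i^T of the rows of g."""
    return g.T @ g / g.shape[0]


def covariance(g: np.ndarray) -> np.ndarray:
    """Covariance centered at the global mean, with the biased 1/n normalization."""
    centered = g - g.mean(axis=0)
    return centered.T @ centered / g.shape[0]


def count_outliers(mat: np.ndarray, factor: float = 10.0) -> int:
    """Number of eigenvalues larger than `factor` times the median eigenvalue."""
    eigvals = np.linalg.eigvalsh(mat)
    return int(np.sum(eigvals > factor * np.median(eigvals)))


def class_mean_basis(g: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Orthonormal basis of the span of per-class mean gradients.

    Uses only averaging plus a QR of a (d x C) matrix: no d x d eigendecomposition.
    """
    means = np.stack([g[labels == c].mean(axis=0) for c in range(num_classes)], axis=1)
    basis, _ = np.linalg.qr(means)
    return basis


def subspace_overlap(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared cosine of the principal angles between two orthonormal bases (1.0 = same subspace)."""
    cosines = np.linalg.svd(a.T @ b, compute_uv=False)
    return float(np.mean(cosines ** 2))


# Hypothetical toy "per-sample gradients": class mean + isotropic Gaussian noise.
rng = np.random.default_rng(0)
d, num_classes, per_class, sigma = 200, 5, 400, 0.1
class_means = 3.0 * rng.standard_normal((num_classes, d)) / np.sqrt(d)
labels = np.repeat(np.arange(num_classes), per_class)
grads = class_means[labels] + sigma * rng.standard_normal((labels.size, d))

m2 = second_moment(grads)
cov = covariance(grads)
global_mean = grads.mean(axis=0)

# Exact identity: second moment = covariance + outer product of the global mean.
identity_holds = np.allclose(m2, cov + np.outer(global_mean, global_mean))

# Top-C eigenvectors of the full d x d second moment vs. the cheap class-mean basis.
_, eigvecs = np.linalg.eigh(m2)
top = eigvecs[:, -num_classes:]
overlap = subspace_overlap(top, class_mean_basis(grads, labels, num_classes))

print(identity_holds, count_outliers(m2), count_outliers(cov), round(overlap, 3))
```

In this toy setting the second moment has one outlier per class, while centering at the global mean removes one of them (the covariance keeps only C − 1). This illustrates how the means, not only the spread, shape the outliers. The class-mean basis is built from C averages and a small QR, yet it nearly coincides with the top eigenspace. That mirrors, in a much simplified form, the "averaging" shortcut the abstract describes.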

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. The Implicit Bias of Depth: From Neural Collapse to Softmax Codes

    cs.LG 2026-05 unverdicted novelty 7.0

    Depth induces an implicit low-rank bias in deep unconstrained feature models trained with unregularized multiclass cross-entropy, promoting softmax codes over neural collapse via more efficient norm propagation.

  2. Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization

    cs.LG 2019-07 unverdicted novelty 4.0

    Provides Hessian-based theoretical characterizations of SGD dynamics and a scale-invariant generalization bound for deep nets, backed by experiments on synthetic data, MNIST, and CIFAR-10.