pith. machine review for the scientific record.

arxiv: 1906.02353 · v1 · submitted 2019-06-05 · 💻 cs.LG · stat.ML

Recognition: unknown

Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks

Authors on Pith: no claims yet
classification: 💻 cs.LG · stat.ML
keywords: methods · gauss-newton · gradient · subsampled · natural · networks · neural · training
Original abstract

We present practical Levenberg-Marquardt variants of Gauss-Newton and natural gradient methods for solving non-convex optimization problems that arise in training deep neural networks involving enormous numbers of variables and huge data sets. Our methods use subsampled Gauss-Newton or Fisher information matrices and either subsampled gradient estimates (fully stochastic) or full gradients (semi-stochastic); in the latter case we prove convergence to a stationary point. By using the Sherman-Morrison-Woodbury formula with automatic differentiation (backpropagation), we show how our methods can be implemented to perform efficiently. Finally, numerical results are presented to demonstrate the effectiveness of our proposed methods.
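The Sherman-Morrison-Woodbury trick the abstract alludes to can be sketched as follows: when the curvature matrix is a subsampled Gauss-Newton matrix J.T @ J built from an m x n Jacobian with m << n, the Levenberg-Marquardt system (λI + JᵀJ)x = g can be solved through an m x m system instead of an n x n one. This is an illustrative sketch, not the paper's implementation; the function name `smw_gn_step` and the explicit dense Jacobian are assumptions for the example (the paper forms Jacobian-vector products via backpropagation rather than materializing J).

```python
import numpy as np

def smw_gn_step(J, g, lam):
    """Solve (lam*I_n + J.T @ J) x = g via Sherman-Morrison-Woodbury.

    J   : (m, n) subsampled Jacobian, m << n
    g   : (n,) gradient estimate
    lam : Levenberg-Marquardt damping, lam > 0

    Woodbury identity:
      (lam*I_n + J^T J)^{-1} = (I_n - J^T (lam*I_m + J J^T)^{-1} J) / lam
    so only an m x m linear system is factored, never an n x n one.
    """
    m = J.shape[0]
    small = lam * np.eye(m) + J @ J.T          # m x m system
    return (g - J.T @ np.linalg.solve(small, J @ g)) / lam
```

For a subsample of size m and n parameters, the cost per step is O(m^2 n) plus an O(m^3) factorization, which is the efficiency gain the abstract claims over forming the full n x n curvature matrix.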

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Fast Gauss-Newton for Multiclass Cross-Entropy

    cs.LG 2026-05 unverdicted novelty 7.0

    FGN is a positive semidefinite under-approximation of the multiclass GGN obtained by exact decomposition into true-vs-rest and within-competitor terms, exact for binary classification and implemented via matrix-free c...

  2. PermaFrost-Attack: Stealth Pretraining Seeding (SPS) for Planting Logic Landmines During LLM Training

    cs.LG 2026-04 unverdicted novelty 7.0

    Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.