Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks

Yi Ren, Donald Goldfarb · 2019 · cs.LG · arXiv 1906.02353

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

We present practical Levenberg-Marquardt variants of Gauss-Newton and natural gradient methods for solving non-convex optimization problems that arise in training deep neural networks involving enormous numbers of variables and huge data sets. Our methods use subsampled Gauss-Newton or Fisher information matrices and either subsampled gradient estimates (fully stochastic) or full gradients (semi-stochastic), which, in the latter case, we prove convergent to a stationary point. By using the Sherman-Morrison-Woodbury formula with automatic differentiation (backpropagation) we show how our methods can be implemented to perform efficiently. Finally, numerical results are presented to demonstrate the effectiveness of our proposed methods.
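The Sherman-Morrison-Woodbury step mentioned in the abstract can be illustrated with a small NumPy sketch. It assumes a subsampled Jacobian `J` of shape (m, n) with m much smaller than n, a gradient `g`, and a damping parameter `lam`. These names, and the function `lm_step_woodbury`, are hypothetical and not taken from the paper. Forming `J` as a dense matrix is also a simplification made here for illustration, since the abstract says the methods rely on backpropagation. The sketch only shows why the damped system can be solved through an m-by-m matrix instead of an n-by-n one:

```python
import numpy as np

def lm_step_woodbury(J, g, lam):
    """Return d solving (J^T J / m + lam * I) d = -g without forming an n x n matrix.

    J   : (m, n) subsampled Jacobian (hypothetical dense stand-in), m << n
    g   : (n,) gradient estimate
    lam : Levenberg-Marquardt damping parameter, must be > 0
    """
    m = J.shape[0]
    # Small m x m system: (J J^T + m * lam * I) y = J g
    K = J @ J.T + m * lam * np.eye(m)
    y = np.linalg.solve(K, J @ g)
    # Woodbury identity: (lam*I + J^T J / m)^{-1} g = (g - J^T y) / lam
    return -(g - J.T @ y) / lam

# Illustrative usage with random data (not results from the paper).
rng = np.random.default_rng(0)
J = rng.standard_normal((8, 200))   # m = 8 samples, n = 200 parameters
g = rng.standard_normal(200)
d = lm_step_woodbury(J, g, lam=0.1)
```

In this sketch the dominant cost is forming `J @ J.T` and solving an 8-by-8 system, whereas a direct solve would factor a 200-by-200 matrix.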

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

One More Time: Revisiting Neural Quantum States from a Reinforcement Learning Perspective

cs.LG · 2026-07-02 · unverdicted · novelty 7.0 · 2 refs

PWO is a trust-region optimizer for autoregressive NQS that improves stability over Adam and stochastic reconfiguration methods while scaling to 1.5B-parameter models on spin systems.

Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.

Fast Gauss-Newton for Multiclass Cross-Entropy

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

FGN is a positive semidefinite under-approximation of the multiclass GGN obtained by exact decomposition into true-vs-rest and within-competitor terms, exact for binary classification and implemented via matrix-free conjugate gradient on a whitened row-space system.

PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training

cs.LG · 2026-04-23 · unverdicted · novelty 7.0

Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.

On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime

cs.LG · 2026-01-06 · unverdicted · novelty 5.0

Preconditioned gradient descent mitigates spectral bias and reduces grokking delays by enabling uniform parameter space exploration in the NTK regime, confirming grokking as a transition to the rich regime.

citing papers explorer

One More Time: Revisiting Neural Quantum States from a Reinforcement Learning Perspective · cs.LG · 2026-07-02 · unverdicted · none · ref 85 · 2 links · internal anchor
Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation · cs.LG · 2026-05-18 · unverdicted · none · ref 19 · internal anchor
Fast Gauss-Newton for Multiclass Cross-Entropy · cs.LG · 2026-05-07 · unverdicted · none · ref 30
PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training · cs.LG · 2026-04-23 · unverdicted · none · ref 59
On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime · cs.LG · 2026-01-06 · unverdicted · none · ref 17 · internal anchor
