A Sketch-and-Project Analysis of Subsampled Natural Gradient Algorithms

Gil Goldshlager; Jiang Hu; Lin Lin

arXiv: 2508.21022 · v3 · pith:EWGTRBZAnew · submitted 2025-08-28 · 💻 cs.LG · math.OC · stat.ML

keywords: sketch-and-project, gradient, proxy, convergence, descent, natural, settings, small-sample

Subsampled natural gradient descent (SNG) has been used to enable high-precision scientific machine learning, but standard analyses based on stochastic preconditioning fail to provide insight into realistic small-sample settings. We overcome this limitation by instead analyzing SNG as a sketch-and-project method. Motivated by this lens, we discard the usual theoretical proxy which decouples gradients and preconditioners using two independent mini-batches, and we replace it with a new proxy based on squared volume sampling. Under this new proxy we show that the expectation of the SNG direction becomes equal to a preconditioned gradient descent step even in the presence of coupling, leading to (i) global convergence guarantees when using a single mini-batch of any size, and (ii) an explicit characterization of the convergence rate in terms of quantities related to the sketch-and-project structure. These findings in turn yield new insights into small-sample settings, for example by suggesting that the advantage of SNG over SGD is that it can more effectively exploit spectral decay in the model Jacobian. We also extend these ideas to explain a popular structured momentum scheme for SNG, known as SPRING, by showing that it arises naturally from accelerated sketch-and-project methods.
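
The sketch-and-project view described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a linear least-squares model (so the model Jacobian is a fixed matrix `A`), an unregularized minimal-norm step, and uniform mini-batch sampling rather than the squared volume sampling proxy. The names `sng_step` and `run_demo` are hypothetical. Under these assumptions, a unit-step SNG update on a single mini-batch coincides with a block Kaczmarz projection, which is a standard instance of sketch-and-project.

```python
import numpy as np


def sng_step(A, b, theta, batch, step_size=1.0):
    """One SNG-style step on 0.5 * ||A @ theta - b||^2 using a single mini-batch.

    Assumption: the step is the minimal-norm solution of J_S d = r_S, where J_S
    and r_S are the mini-batch Jacobian and residual. The same mini-batch is
    used for both, so gradient and preconditioner are coupled.
    """
    J_S = A[batch]                  # Jacobian rows selected by the mini-batch
    r_S = J_S @ theta - b[batch]    # mini-batch residual
    # lstsq returns the minimum-norm solution when J_S has fewer rows than columns.
    d, *_ = np.linalg.lstsq(J_S, r_S, rcond=None)
    return theta - step_size * d


def run_demo(n=200, p=50, batch_size=10, iters=500, seed=0):
    """Run uniform-sampling SNG on a consistent random system; return final error."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, p))
    theta_star = rng.standard_normal(p)
    b = A @ theta_star              # consistent system, so theta_star is a solution
    theta = np.zeros(p)
    for _ in range(iters):
        batch = rng.choice(n, size=batch_size, replace=False)
        theta = sng_step(A, b, theta, batch)
    return np.linalg.norm(theta - theta_star)


if __name__ == "__main__":
    print("final error:", run_demo())
```

With `step_size=1.0`, each update projects the current iterate onto the solution set of the sampled equations, so the sampled residual vanishes after the step and the distance to `theta_star` never increases. This holds for any batch size in the consistent case, which matches the flavor of the single-mini-batch guarantees the abstract mentions, though the paper's actual setting and sampling scheme may differ.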

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Curvature-Aware Optimization for High-Accuracy Physics-Informed Neural Networks
cs.LG 2026-04 unverdicted novelty 4.0

Curvature-aware optimizers such as natural gradient and self-scaling BFGS/Broyden accelerate PINN convergence and accuracy on PDEs including Helmholtz, Stokes, Burgers, and Euler equations plus stiff ODEs, with new mo...