
arxiv: 1808.10340 · v1 · pith:TBAKTDFP · submitted 2018-08-30 · 💻 cs.LG · math.DG · math.OC · stat.ML

A Coordinate-Free Construction of Scalable Natural Gradient

classification 💻 cs.LG · math.DG · math.OC · stat.ML
keywords invariance · coordinate-free · gradient · k-fac · natural · networks · properties · algorithm
abstract

Most neural networks are trained using first-order optimization methods, which are sensitive to the parameterization of the model. Natural gradient descent is invariant to smooth reparameterizations because it is defined in a coordinate-free way, but tractable approximations are typically defined in terms of coordinate systems, and hence may lose the invariance properties. We analyze the invariance properties of the Kronecker-Factored Approximate Curvature (K-FAC) algorithm by constructing the algorithm in a coordinate-free way. We explicitly construct a Riemannian metric under which the natural gradient matches the K-FAC update; invariance to affine transformations of the activations follows immediately. We extend our framework to analyze the invariance properties of K-FAC applied to convolutional networks and recurrent neural networks, as well as metrics other than the usual Fisher metric.
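The affine-invariance claim can be checked numerically. The sketch below is an illustration, not the paper's coordinate-free construction: it assumes a single fully connected layer acting on homogeneous activations [a, 1], the standard K-FAC block update G^{-1} ∇W A^{-1} with exact inverses and no damping (damping would perturb the exact equality), and random stand-ins for the backpropagated output gradients. The helper names `kfac_direction` and `affine_invariance_demo` are hypothetical.

```python
import numpy as np


def kfac_direction(grad_W, acts, out_grads):
    """K-FAC preconditioned gradient G^{-1} grad_W A^{-1} for one dense layer.

    grad_W:    (d_out, d_in + 1) loss gradient w.r.t. W
    acts:      (n, d_in + 1) homogeneous input activations [a, 1]
    out_grads: (n, d_out) gradients w.r.t. the layer outputs s = W [a, 1]
    """
    n = acts.shape[0]
    A = acts.T @ acts / n            # second moments of the activations
    G = out_grads.T @ out_grads / n  # second moments of the output gradients
    # Exact inverses, no damping: left-multiply by G^{-1}, right-multiply by A^{-1}.
    return np.linalg.solve(G, grad_W) @ np.linalg.inv(A)


def affine_invariance_demo(seed=0, n=200, d_in=4, d_out=3, lr=0.1):
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(n, d_in))
    acts = np.hstack([a, np.ones((n, 1))])  # homogeneous coordinates [a, 1]
    W = rng.normal(size=(d_out, d_in + 1))
    # Assumed stand-in for backpropagated output gradients; they depend only on
    # the network function, so they are identical under both parameterizations.
    g = rng.normal(size=(n, d_out))

    # Affine reparameterization of the activations: a' = U a + c,
    # i.e. [a', 1] = T [a, 1] with T = [[U, c], [0, 1]].
    T = np.eye(d_in + 1)
    T[:d_in, :d_in] = rng.normal(size=(d_in, d_in)) + 3.0 * np.eye(d_in)
    T[:d_in, d_in] = rng.normal(size=d_in)
    acts2 = acts @ T.T                # rows are T [a_i, 1]
    W2 = W @ np.linalg.inv(T)         # W2 T [a, 1] = W [a, 1]: same function

    grad_W = g.T @ acts / n
    grad_W2 = g.T @ acts2 / n

    # One step in each parameterization, mapped back to the original one via T.
    kfac = W - lr * kfac_direction(grad_W, acts, g)
    kfac2 = (W2 - lr * kfac_direction(grad_W2, acts2, g)) @ T
    sgd = W - lr * grad_W
    sgd2 = (W2 - lr * grad_W2) @ T
    return np.allclose(kfac, kfac2), np.allclose(sgd, sgd2)


if __name__ == "__main__":
    kfac_same, sgd_same = affine_invariance_demo()
    print("K-FAC step invariant:", kfac_same)
    print("SGD step invariant:  ", sgd_same)
```

Mapping each updated weight matrix back through T makes the comparison like-for-like. Algebraically, the reparameterized K-FAC direction equals the original one times T^{-1}, so the two K-FAC steps agree; the plain gradient step instead picks up a factor T^T T, which is the parameterization sensitivity of first-order methods described above.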



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss

    cs.LG · 2026-06 · unverdicted · novelty 6.0

    Double preconditioning (DoPr) improves downstream task performance in test-time feedback settings without consistent gains in validation loss.