Optimizing neural networks with kronecker-factored approx- imate curvature

James Martens, Roger Grosse · 2015

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

browse 5 citing papers

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Understanding Sample Efficiency in Predictive Coding

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Predictive coding learns more sample-efficiently than backpropagation because its updates align better with output prediction errors in deep linear networks, with exact conditions for optimal alignment derived.

Fast Gauss-Newton for Multiclass Cross-Entropy

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

FGN is a positive semidefinite under-approximation of the multiclass GGN obtained by exact decomposition into true-vs-rest and within-competitor terms, exact for binary classification and implemented via matrix-free conjugate gradient on a whitened row-space system.

Error whitening: Why Gauss-Newton outperforms Newton

cs.LG · 2026-05-11 · conditional · novelty 6.0

Gauss-Newton descent whitens errors by projecting Newton directions or gradients onto the tangent space, replacing JJ^T with the identity and removing parameterization distortions that affect Newton descent.

AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

AdaPreLoRA pairs the Adafactor diagonal Kronecker preconditioner on the full weight matrix with a closed-form factor-space solve that selects the update minimizing an H_t-weighted imbalance, yielding competitive results on GPT-2, Mistral-7B, Qwen2-7B and diffusion personalization tasks.

MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration

cs.LG · 2026-03-30 · unverdicted · novelty 6.0

MuonEq introduces pre-orthogonalization equilibration schemes that improve Muon optimizer performance during large language model pretraining.

citing papers explorer

Showing 5 of 5 citing papers.

Understanding Sample Efficiency in Predictive Coding cs.LG · 2026-05-12 · unverdicted · none · ref 11
Predictive coding learns more sample-efficiently than backpropagation because its updates align better with output prediction errors in deep linear networks, with exact conditions for optimal alignment derived.
Fast Gauss-Newton for Multiclass Cross-Entropy cs.LG · 2026-05-07 · unverdicted · none · ref 24
FGN is a positive semidefinite under-approximation of the multiclass GGN obtained by exact decomposition into true-vs-rest and within-competitor terms, exact for binary classification and implemented via matrix-free conjugate gradient on a whitened row-space system.
Error whitening: Why Gauss-Newton outperforms Newton cs.LG · 2026-05-11 · conditional · none · ref 39
Gauss-Newton descent whitens errors by projecting Newton directions or gradients onto the tangent space, replacing JJ^T with the identity and removing parameterization distortions that affect Newton descent.
AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation cs.LG · 2026-05-09 · unverdicted · none · ref 19
AdaPreLoRA pairs the Adafactor diagonal Kronecker preconditioner on the full weight matrix with a closed-form factor-space solve that selects the update minimizing an H_t-weighted imbalance, yielding competitive results on GPT-2, Mistral-7B, Qwen2-7B and diffusion personalization tasks.
MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration cs.LG · 2026-03-30 · unverdicted · none · ref 49
MuonEq introduces pre-orthogonalization equilibration schemes that improve Muon optimizer performance during large language model pretraining.

Optimizing neural networks with kronecker-factored approx- imate curvature

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer