Optimizing neural networks with Kronecker-factored approximate curvature

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

fields

cs.LG 5

years

2026 5

polarities

background 3

representative citing papers

Understanding Sample Efficiency in Predictive Coding

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Predictive coding learns more sample-efficiently than backpropagation because its updates align better with output prediction errors in deep linear networks, with exact conditions for optimal alignment derived.

Fast Gauss-Newton for Multiclass Cross-Entropy

cs.LG · 2026-05-07 · unverdicted · novelty 7.0

FGN is a positive semidefinite under-approximation of the multiclass GGN obtained by exact decomposition into true-vs-rest and within-competitor terms, exact for binary classification and implemented via matrix-free conjugate gradient on a whitened row-space system.

Error whitening: Why Gauss-Newton outperforms Newton

cs.LG · 2026-05-11 · conditional · novelty 6.0

Gauss-Newton descent whitens errors by projecting Newton directions or gradients onto the tangent space, replacing JJ^T with the identity and removing parameterization distortions that affect Newton descent.

AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

AdaPreLoRA pairs the Adafactor diagonal Kronecker preconditioner on the full weight matrix with a closed-form factor-space solve that selects the update minimizing an H_t-weighted imbalance, yielding competitive results on GPT-2, Mistral-7B, Qwen2-7B and diffusion personalization tasks.

citing papers explorer

Showing 5 of 5 citing papers.

  • Understanding Sample Efficiency in Predictive Coding cs.LG · 2026-05-12 · unverdicted · none · ref 11

  • Fast Gauss-Newton for Multiclass Cross-Entropy cs.LG · 2026-05-07 · unverdicted · none · ref 24

  • Error whitening: Why Gauss-Newton outperforms Newton cs.LG · 2026-05-11 · conditional · none · ref 39

  • AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation cs.LG · 2026-05-09 · unverdicted · none · ref 19

  • MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration cs.LG · 2026-03-30 · unverdicted · none · ref 49

    MuonEq introduces pre-orthogonalization equilibration schemes that improve Muon optimizer performance during large language model pretraining.