pith. machine review for the scientific record.

arxiv: 1712.07296 · v1 · submitted 2017-12-20 · 💻 cs.LG · cs.AI · stat.ML

Recognition: unknown

Block-diagonal Hessian-free Optimization for Training Neural Networks

Authors on Pith: no claims yet
classification: 💻 cs.LG · cs.AI · stat.ML
keywords: deep · hessian-free · method · neural · approximation · better · block · block-diagonal
Original abstract

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence. But they are rarely applied to deep learning in practice because of high computational cost and the need for model-dependent algorithmic variations. We introduce a variant of the Hessian-free method that leverages a block-diagonal approximation of the generalized Gauss-Newton matrix. Our method computes the curvature approximation matrix only for pairs of parameters from the same layer or block of the neural network and performs conjugate gradient updates independently for each block. Experiments on deep autoencoders, deep convolutional networks, and multilayer LSTMs demonstrate better convergence and generalization compared to the original Hessian-free approach and the Adam method.
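The abstract's recipe, stripped to its essentials: approximate the generalized Gauss-Newton (GGN) matrix as block-diagonal with one block per layer, then solve each block's damped linear system with conjugate gradient, independently of the other blocks. Below is a minimal, hypothetical NumPy sketch of that structure; the block sizes, the PSD curvature surrogate in make_block_matvec, and the damping constant are illustrative assumptions, not the paper's implementation (which obtains GGN-vector products from the network itself).

    import numpy as np

    def cg(matvec, b, iters=10, tol=1e-8):
        """Plain conjugate gradient solving G x = b with G given implicitly."""
        x = np.zeros_like(b)
        r = b - matvec(x)
        p = r.copy()
        rs = r @ r
        for _ in range(iters):
            Gp = matvec(p)
            alpha = rs / (p @ Gp)
            x += alpha * p
            r -= alpha * Gp
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    rng = np.random.default_rng(0)
    block_sizes = [5, 8, 3]                       # e.g. one block per layer
    grads = [rng.normal(size=n) for n in block_sizes]

    def make_block_matvec(n, damping=1e-2):
        # Stand-in per-block curvature: a PSD matrix plus Tikhonov damping.
        A = rng.normal(size=(n, n))
        G = A @ A.T / n
        return lambda v: G @ v + damping * v

    # The key structural idea: CG runs independently on each diagonal block,
    # so cross-layer curvature terms are never formed or stored.
    updates = [cg(make_block_matvec(n), g) for g, n in zip(grads, block_sizes)]
    print([u.shape for u in updates])

Each block solve touches only block-local matrix-vector products, which is what makes the block-diagonal variant cheaper per CG iteration than running CG against the full GGN.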

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Importance-Guided Basis Selection for Low-Rank Decomposition of Large Language Models

    cs.LG · 2026-05 · unverdicted · novelty 6.0

    BSI ranks singular-vector bases for LLM low-rank compression by estimating expected task loss increase via second-order Taylor expansion of the loss and an efficient Hessian-diagonal estimator, outperforming magnitude...
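The scoring idea summarized in that snippet (a second-order Taylor estimate of the loss increase from dropping a singular-vector component, using a diagonal Hessian surrogate) can be sketched in a few lines. Everything below is an assumption-laden toy, not the cited paper's estimator: the per-example gradients are random stand-ins, and the diagonal surrogate is the common Fisher-style mean of squared gradients.

    import numpy as np

    rng = np.random.default_rng(1)
    W = rng.normal(size=(16, 12))                     # a toy weight matrix
    U, s, Vt = np.linalg.svd(W, full_matrices=False)

    # Hypothetical per-example loss gradients w.r.t. W (batch of 32);
    # their mean square serves as a diagonal-Hessian / Fisher surrogate.
    per_example_grads = rng.normal(size=(32, 16, 12))
    h_diag = (per_example_grads ** 2).mean(axis=0)

    scores = []
    for k in range(len(s)):
        # Weight perturbation from dropping the k-th rank-1 SVD component.
        delta = -s[k] * np.outer(U[:, k], Vt[k])
        # Second-order Taylor term 0.5 * delta^T diag(H) delta;
        # the first-order term is ~0 near a trained minimum.
        scores.append(0.5 * float((h_diag * delta ** 2).sum()))

    keep = np.argsort(scores)[len(s) // 2:]           # keep the costliest half
    print(sorted(keep.tolist()))

Ranking by estimated loss increase rather than singular value magnitude is the contrast the snippet draws: a direction with a small singular value can still be expensive to drop if the curvature along it is large.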