New optimizer uses auxiliary loss to imitate low-order Hessian information, replacing gradient squares in Adam-like training with convergence guarantee and some experimental gains.
On the variance of the adaptive learning rate and beyond, 2021
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
math.OC 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Low-Order Explicit Hessian Imitation Method for Large-Scale Supervised Machine Learning
New optimizer uses auxiliary loss to imitate low-order Hessian information, replacing gradient squares in Adam-like training with convergence guarantee and some experimental gains.