Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Low-Order Explicit Hessian Imitation Method for Large-Scale Supervised Machine Learning
  A new optimizer uses an auxiliary loss to imitate low-order Hessian information, which replaces the squared gradients in Adam-like training; it comes with a convergence guarantee and shows some experimental gains.
- Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs
  Ghosted Layers recovers accuracy in layer-pruned LLMs via a closed-form, unconstrained linear operator that aligns boundary activations using a small calibration set.