Towards quantifying the Hessian structure of neural networks. arXiv preprint arXiv:2505.02809

Dong, Z · 2025 · arXiv 2505.02809

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Why Muon Outperforms Adam: A Curvature Perspective

cs.LG · 2026-06-03 · conditional · novelty 7.0

Muon outperforms Adam by reducing curvature penalty via lower Normalized Directional Sharpness, as shown via Taylor approximation on LLM training and proven on stylized quadratic problems with heterogeneous curvature.

Outer-Momentum Restarting in High-Dimensional Two-Phase Optimization

cs.LG · 2026-05-27 · unverdicted · novelty 5.0

Periodic outer-momentum restarts in two-phase optimizers exploit phase cancellation in a linearized NTK model to widen stable learning-rate and momentum ranges in language-model pretraining.

Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs

cs.CL · 2026-04-14 · unverdicted · novelty 5.0

HETA is a new attribution framework for decoder-only LLMs that combines semantic transition vectors, Hessian-based sensitivity scores, and KL divergence to produce more faithful and human-aligned token attributions than prior methods.

RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization

cs.LG · 2026-03-20 · conditional · novelty 5.0

RMNP preconditions matrix updates via row-wise L2 normalization instead of Newton-Schulz iteration, reducing complexity to O(mn) while matching Muon's non-convex convergence rate and empirical performance.

On the Convergence Analysis of Muon

stat.ML · 2025-05-29 · unverdicted · novelty 5.0

Convergence analysis shows Muon outperforms gradient descent by exploiting low-rank structure in neural network Hessians.
