Towards quantifying the hessian structure of neural networks.arXiv preprint arXiv:2505.02809

Zhaorui Dong, Yushun Zhang, Zhi-Quan Luo, Jianfeng Yao, Ruoyu Sun · 2025 · arXiv 2505.02809

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs

cs.CL · 2026-04-14 · unverdicted · novelty 5.0

HETA is a new attribution framework for decoder-only LLMs that combines semantic transition vectors, Hessian-based sensitivity scores, and KL divergence to produce more faithful and human-aligned token attributions than prior methods.

RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization

cs.LG · 2026-03-20 · conditional · novelty 5.0

RMNP preconditions matrix updates via row-wise L2 normalization instead of Newton-Schulz iteration, reducing complexity to O(mn) while matching Muon's non-convex convergence rate and empirical performance.

On the Convergence Analysis of Muon

stat.ML · 2025-05-29 · unverdicted · novelty 5.0

Convergence analysis shows Muon outperforms gradient descent by exploiting low-rank structure in neural network Hessians.

citing papers explorer

Showing 3 of 3 citing papers.

Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs cs.CL · 2026-04-14 · unverdicted · none · ref 11
HETA is a new attribution framework for decoder-only LLMs that combines semantic transition vectors, Hessian-based sensitivity scores, and KL divergence to produce more faithful and human-aligned token attributions than prior methods.
RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization cs.LG · 2026-03-20 · conditional · none · ref 19
RMNP preconditions matrix updates via row-wise L2 normalization instead of Newton-Schulz iteration, reducing complexity to O(mn) while matching Muon's non-convex convergence rate and empirical performance.
On the Convergence Analysis of Muon stat.ML · 2025-05-29 · unverdicted · none · ref 7
Convergence analysis shows Muon outperforms gradient descent by exploiting low-rank structure in neural network Hessians.

Towards quantifying the hessian structure of neural networks.arXiv preprint arXiv:2505.02809

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer