AdamP: Slowing down the slowdown for momentum optimizers on scale-invariant weights

3 Pith papers cite this work. Polarity classification is still indexing.

citation summary

years: 2026 (2), 2025 (1)

verdicts: unverdicted (3)

roles: background (1)

polarities: background (1)

representative citing papers

Demystifying Manifold Constraints in LLM Pre-training

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

Manifold constraints via the new MACRO optimizer independently bound activation scales and enforce rotational equilibrium in LLM pre-training, subsuming RMS normalization and decoupled weight decay while delivering competitive performance with convergence guarantees.

Nora: Normalized Orthogonal Row Alignment for Scalable Matrix Optimizer

cs.LG · 2026-05-05 · unverdicted · novelty 4.0

Nora is a matrix optimizer that stabilizes weight norms and angular velocities through row-wise momentum projection onto the orthogonal complement of the weights while approximating structured preconditioning with O(mn) complexity and proven scalability.

citing papers explorer

Showing 3 of 3 citing papers.

  • Demystifying Manifold Constraints in LLM Pre-training · cs.LG · 2026-05-06 · unverdicted · none · ref 59

    Manifold constraints via the new MACRO optimizer independently bound activation scales and enforce rotational equilibrium in LLM pre-training, subsuming RMS normalization and decoupled weight decay while delivering competitive performance with convergence guarantees.

  • mRadNet: A Compact Radar Object Detector with MetaFormer · eess.SP · 2025-09-11 · unverdicted · none · ref 24

    mRadNet improves state-of-the-art radar object detection on the CRUW dataset while using the fewest parameters and lowest FLOPs among compared models.

  • Nora: Normalized Orthogonal Row Alignment for Scalable Matrix Optimizer · cs.LG · 2026-05-05 · unverdicted · none · ref 16

    Nora is a matrix optimizer that stabilizes weight norms and angular velocities through row-wise momentum projection onto the orthogonal complement of the weights while approximating structured preconditioning with O(mn) complexity and proven scalability.