An exponential learning rate schedule for deep learning
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
NDSI-BWE deploys seven nonlinear-dynamics discriminators and a dual-stream ConformerNeXt generator to claim new state-of-the-art results in speech bandwidth extension.
citing papers explorer
- Demystifying Manifold Constraints in LLM Pre-training
Manifold constraints via the new MACRO optimizer independently bound activation scales and enforce rotational equilibrium in LLM pre-training, subsuming RMS normalization and decoupled weight decay while delivering competitive performance with convergence guarantees.
- CIS-BWE: Chaos-Informed Speech Bandwidth Extension
NDSI-BWE deploys seven nonlinear-dynamics discriminators and a dual-stream ConformerNeXt generator to claim new state-of-the-art results in speech bandwidth extension.