arxiv: 2603.03099 · 2 revisions
Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails