arXiv preprint arXiv:2210.11693

Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale · 2022 · arXiv 2210.11693

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

The Falcon Series of Open Language Models

cs.CL · 2023-11-28 · conditional · novelty 6.0

Falcon-180B is a 180B-parameter open decoder-only model trained on 3.5 trillion tokens that approaches PaLM-2-Large performance at lower cost and is released with dataset extracts.

RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization

cs.LG · 2026-03-20 · conditional · novelty 5.0

RMNP preconditions matrix updates via row-wise L2 normalization instead of Newton-Schulz iteration, reducing complexity to O(mn) while matching Muon's non-convex convergence rate and empirical performance.

Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations

cs.LG · 2026-04-14 · unverdicted · novelty 3.0

A retrospective survey and empirical evaluation of deep learning optimization algorithms that identifies trends, design trade-offs, and future directions.

citing papers explorer

Showing 3 of 3 citing papers.

The Falcon Series of Open Language Models · cs.CL · 2023-11-28 · conditional · none · ref 195
RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization · cs.LG · 2026-03-20 · conditional · none · ref 12
Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations · cs.LG · 2026-04-14 · unverdicted · none · ref 31
