Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901,

12 Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al.

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton and Adaptive Methods

cs.LG · 2025-10-12 · unverdicted · novelty 6.0

Preconditioned matrix norms unify steepest descent, quasi-Newton, and adaptive optimizers, revealing SGD, Adam, Muon, KL-Shampoo, SOAP, and SPlus as special cases and enabling new methods MuAdam and MuAdam-SANIA that are competitive in experiments.

citing papers explorer

Showing 1 of 1 citing paper.

Preconditioned Norms: A Unified Framework for Steepest Descent, Quasi-Newton and Adaptive Methods

cs.LG · 2025-10-12 · unverdicted · none · ref 5