Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901

17 [BMR+20] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al · 1901

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Rethinking Language Model Scaling under Transferable Hypersphere Optimization

cs.LG · 2026-03-30 · conditional · novelty 6.0

HyperP transfers optimal learning rates across model width, depth, tokens, and MoE granularity under Frobenius-sphere constraints, delivering stable scaling and 1.58x efficiency gains.

citing papers explorer

Showing 1 of 1 citing paper.

Rethinking Language Model Scaling under Transferable Hypersphere Optimization cs.LG · 2026-03-30 · conditional · none · ref 3
HyperP transfers optimal learning rates across model width, depth, tokens, and MoE granularity under Frobenius-sphere constraints, delivering stable scaling and 1.58x efficiency gains.

Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901

fields

years

verdicts

representative citing papers

citing papers explorer