18 [GRK17] Scott Gray, Alec Radford, and Diederik P Kingma

URL http://arxiv

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Scaling Laws for Neural Language Models · cs.LG · 2020-01-23 · unverdicted · none · ref 5 · novelty 8.0

Empirical power-law scaling governs language model loss versus model size, data size, and compute, enabling optimal allocation of training compute.