Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining

5 Pith papers cite this work. Polarity classification is still indexing.

years: 2026 (4), 2025 (1)

roles: background (1)

polarities: background (1)

representative citing papers

Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

A framework quantifies hyperparameter transfer via scaling-law fit quality, extrapolation robustness, and loss penalty, with ablations showing that μP's advantage over standard parameterization stems from maximizing the embedding layer learning rate to avoid bottlenecks and instabilities in AdamW.
