Predictable Scale: Part I – Optimal Hyperparameter Scaling Law in Large Language Model Pretraining

8 Pith papers cite this work. Polarity classification is still indexing.

Citing papers by year: 2026 (7), 2025 (1)

Citation roles: background (1)

Citation polarities: background (1)

representative citing papers

MultiHashFormer: Hash-based Generative Language Models

cs.CL · 2026-06-26 · unverdicted · novelty 7.0

MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.

Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

A framework quantifies hyperparameter transfer via scaling-law fit quality, extrapolation robustness, and loss penalty, with ablations showing that μP's advantage over standard parameterization stems from maximizing the embedding layer learning rate to avoid bottlenecks and instabilities in AdamW.

Staged Factorial Screening for Budget-Constrained Micro-Pretraining

cs.LG · 2026-04-27 · unverdicted · novelty 3.0

Staged factorial screening recovers stable early penalties from total batch, depth, and width in 2-10 minute pretraining runs and supports a bridge-centered recommendation through 24-hour continuations on two hosts.
