Scaling law for language models training considering batch size. arXiv preprint arXiv:2412.01505.

Xian Shuai, Yiding Wang, Yimeng Wu, Xin Jiang, Xiaozhe Ren · 2025 · arXiv 2412.01505

2 Pith papers cite this work.

representative citing papers

From One-Pass SGD to Data Reuse: Mini-Batch Scaling Laws in Sketched Linear Regression

cs.LG · 2026-05-23 · unverdicted · novelty 7.0

Derives mini-batch scaling laws for sketched linear regression, with shared approximation terms and protocol-specific variance/fluctuation scalings under a power-law spectrum and a source condition.

ScaleAcross Explorer: Exploring Communication Optimization for Scale-Across AI Model Training

cs.DC · 2026-05-23 · unverdicted · novelty 4.0

ScaleAcross Explorer jointly optimizes three design dimensions for scale-across training and reports up to 64.62% speedups over production baselines and 37.59% over prior art in testbed and simulation experiments.
