Scaling law for language models training considering batch size. arXiv preprint arXiv:2412.01505.
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- From One-Pass SGD to Data Reuse: Mini-Batch Scaling Laws in Sketched Linear Regression
  Derives mini-batch scaling laws for sketched linear regression, with shared approximation terms and protocol-specific variance/fluctuation scalings under power-law spectrum and source condition.
- ScaleAcross Explorer: Exploring Communication Optimization for Scale-Across AI Model Training
  ScaleAcross Explorer jointly optimizes three design dimensions for scale-across training and reports up to 64.62% speedups over production baselines and 37.59% over prior art in testbed and simulation experiments.