Scaling Laws of Global Weather Models

Alexandru Calotoiu; Langwen Huang; Torsten Hoefler; Yuejiang Yu

arxiv: 2602.22962 · v2 · pith:CZWOASSN · submitted 2026-02-26 · cs.LG

Yuejiang Yu, Langwen Huang, Alexandru Calotoiu, Torsten Hoefler

classification: cs.LG

keywords: models, model, performance, training, weather, scaling, size, compute

abstract

Data-driven models are revolutionizing weather forecasting. To optimize training efficiency and model performance, this paper analyzes empirical scaling laws within this domain. We investigate the relationship between model performance (validation loss) and three key factors: model size ($N$), dataset size ($D$), and compute budget ($C$). Across the models we study, Aurora exhibits the strongest data-scaling behavior: increasing the training dataset by 10x reduces validation loss by up to 3.2x. GraphCast demonstrates the highest parameter efficiency, yet suffers from limited hardware utilization. Our compute-optimal analysis indicates that, under a fixed compute budget, spending it on more total training data yields greater performance gains than spending it on a larger model. Furthermore, we analyze model shape and uncover scaling behaviors that differ fundamentally from those observed in language models: weather forecasting models consistently favor increased width over depth. These findings suggest that future weather models should prioritize wider architectures and larger effective training datasets to maximize predictive performance.
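
The "10x data, up to 3.2x lower loss" figure can be read as a power-law exponent. The sketch below assumes a pure power law $L(D) = a \cdot D^{-\alpha}$ with no irreducible-loss term; the paper's exact functional form is not given in the abstract, so this is an illustrative reading, not the authors' fitting procedure. Under that assumption, $\alpha = \log(3.2) / \log(10) \approx 0.505$. The helper names (`implied_exponent`, `fit_power_law`) and the synthetic data in the test are hypothetical, not results from the paper.

```python
import math


def implied_exponent(data_factor: float, loss_factor: float) -> float:
    """Exponent alpha in L(D) = a * D**(-alpha) (assumed pure power law)
    such that multiplying D by data_factor divides L by loss_factor."""
    return math.log(loss_factor) / math.log(data_factor)


def fit_power_law(sizes, losses):
    """Ordinary least-squares fit of log L = log a - alpha * log D.

    Returns (a, alpha). Assumes all sizes and losses are positive and
    at least two distinct sizes are given.
    """
    xs = [math.log(d) for d in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope in log-log space is -alpha
    intercept = mean_y - slope * mean_x    # intercept is log a
    return math.exp(intercept), -slope


if __name__ == "__main__":
    alpha = implied_exponent(10.0, 3.2)
    print(f"implied alpha = {alpha:.3f}")  # prints "implied alpha = 0.505"
```

Because the abstract reports "up to" 3.2x, the exponent derived this way describes the most favorable setting observed, not a single fitted value for all configurations.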

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Earth System Foundation Model (ESFM): A unified framework for heterogeneous data integration and forecasting
physics.ao-ph 2026-04 unverdicted novelty 6.0

ESFM is a single open foundation model that unifies heterogeneous Earth data sources and forecasts missing regions while preserving inter-variable physical relationships.