arXiv: 2606.31248 · v1 · pith:EFYFCNIZnew · submitted 2026-06-30 · ⚛️ physics.ao-ph · cs.CE · cs.LG

Scaling Storm-Resolving Atmospheric AI Simulation to the Entire Planet

Pith reviewed 2026-07-01 02:55 UTC · model grok-4.3

classification ⚛️ physics.ao-ph · cs.CE · cs.LG
keywords storm-resolving simulation · AI emulator · kilometer-scale dynamics · autoregressive model · tile-based training · energy-efficient simulation · atmospheric modeling

The pith

Tile-based autoregressive transformer emulates global 4.9 km atmospheric dynamics after training on only 17 days of data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that an AI model called STRATA can generate stable 24-hour global simulations at storm-resolving 4.9 km resolution. It trains on 17 days of high-resolution physics model output by breaking the globe into small spatial tiles, exploiting the assumption that dynamics stay mostly local over 10-minute steps. This trades scarce long global time series for abundant local samples and reassembles full-Earth forecasts through overlapping-tile blending. The result runs about 50 times more energy-efficiently than the original physics model while producing realistic convective-scale features across regimes, although large-scale biases grow with lead time. An iso-FLOP study indicates that km-scale emulation demands roughly ten times more computation per grid point than coarser AI weather models.

Core claim

STRATA is the first autoregressive AI emulator for global storm-resolving atmospheric dynamics. Trained on 17 days of 4.9-km SCREAM output sampled every 10 minutes, it combines 3D patch embedding, local 3D neighborhood attention, Stereographic Rotary Position Embedding, and a pixel-space de-aliasing decoder to deliver stable 24-hour global rollouts with realistic km-scale dynamics. It achieves 48 simulation days per megawatt-hour, about 50 times better energy efficiency than the SCREAM physics model, and supports 741 simulated days per wall-clock day on 512 H100 GPUs.

What carries the argument

Tile-based autoregressive training on small spatial patches with overlapping blending for global rollout, enabled by the premise that 10-minute atmospheric dynamics are predominantly local.
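
The blending step can be illustrated with a minimal one-dimensional sketch. It assumes a periodic domain and triangular blending weights; the tile size, stride, weight shape, and the names `hat_weights`, `blended_step`, and `smooth` are all hypothetical, since this page does not state how STRATA actually blends tiles. `step_tile` stands in for the learned local update.

```python
def hat_weights(tile):
    """Triangular weights that peak at the tile centre and taper towards the edges."""
    return [min(i + 1, tile - i) for i in range(tile)]


def blended_step(state, step_tile, tile=8, stride=4):
    """Advance a periodic 1D field one step by blending overlapping tile updates.

    `step_tile` is a placeholder for the learned local update: it maps a list of
    `tile` values to a list of `tile` updated values.
    """
    n = len(state)
    assert n % stride == 0 and tile > stride, "tiles must overlap and cover the domain"
    weights = hat_weights(tile)
    acc = [0.0] * n      # weighted sum of tile predictions
    norm = [0.0] * n     # sum of weights, used to normalise
    for start in range(0, n, stride):
        idx = [(start + k) % n for k in range(tile)]   # periodic wrap-around
        out = step_tile([state[i] for i in idx])
        for k, i in enumerate(idx):
            acc[i] += weights[k] * out[k]
            norm[i] += weights[k]
    return [a / w for a, w in zip(acc, norm)]


def smooth(tile_values):
    """Toy local update: relax each interior point towards its two neighbours."""
    out = list(tile_values)
    for k in range(1, len(tile_values) - 1):
        out[k] = 0.5 * tile_values[k] + 0.25 * (tile_values[k - 1] + tile_values[k + 1])
    return out


field = [float(i % 5) for i in range(16)]
new_field = blended_step(field, smooth)
```

Because tile edges receive the lowest weight, each point is dominated by the prediction from the tile in which it sits closest to the centre, which is one common way to hide seams between tiles.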

If this is right

  • Global km-scale emulation becomes feasible on far less data and energy than physics-based storm-resolving models.
  • Iso-FLOP scaling shows convective-scale information density requires about 10 times more computation per grid point than coarse AI weather models.
  • Overlapping-tile blending enables stable autoregressive rollout without explicit global temporal training.
  • Energy cost drops to roughly 1/50th of the SCREAM physics model, allowing longer ensemble or climate-length runs.
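
A back-of-envelope check of the energy figures, using only numbers quoted in the abstract. The implied average power at the end assumes that the throughput and efficiency figures describe the same run, which this page does not confirm; the variable names are illustrative.

```python
# Figures quoted in the paper's abstract.
strata_days_per_mwh = 48.0             # STRATA: simulated days per megawatt-hour
scream_mwh_per_day = 1.0               # SCREAM: "roughly one MWh per simulated day"
strata_days_per_wallclock_day = 741.0  # STRATA throughput on 512 H100 GPUs
n_gpus = 512

# Energy-efficiency ratio relative to SCREAM.
scream_days_per_mwh = 1.0 / scream_mwh_per_day
efficiency_gain = strata_days_per_mwh / scream_days_per_mwh   # 48, i.e. "about 50 times"

# Energy per simulated day for STRATA, in kWh.
strata_kwh_per_day = 1000.0 / strata_days_per_mwh             # about 20.8 kWh

# Implied average power, ASSUMING both figures describe the same configuration.
mwh_per_wallclock_day = strata_days_per_wallclock_day / strata_days_per_mwh
avg_power_kw = mwh_per_wallclock_day * 1000.0 / 24.0          # about 643 kW
avg_power_per_gpu_kw = avg_power_kw / n_gpus                  # about 1.26 kW per GPU
```

The "about 50 times" claim is therefore a rounding of 48 against an already approximate SCREAM baseline, so it should be read as an order-of-magnitude statement.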

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same local-tile premise could extend to hybrid AI-physics coupling where the emulator supplies fast convective tendencies to a coarser dynamical core.
  • Real observational datasets at comparable resolution and cadence could test whether the learned local mappings generalize beyond the training physics model.
  • Extending the architecture to multi-day lead times would require explicit mechanisms to counteract the observed large-scale drift.

Load-bearing premise

Atmospheric dynamics on 10-minute timescales remain mostly local, allowing small tiles to substitute for full global samples.
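
A rough sense of scale for this premise: the sketch below estimates how far a signal can travel in one 10-minute step, measured in grid cells. The grid dimensions come from the Figure 1 caption; the signal speeds are assumed round numbers for illustration, not values from the paper.

```python
EARTH_CIRCUMFERENCE_KM = 40_075.0   # equatorial circumference
CELLS_PER_FACE_EDGE = 2048          # cubed-sphere output grid: 6 x 2048^2 cells
STEP_SECONDS = 10 * 60              # one 10-minute model step

# Four cube faces span the equator, which gives the nominal grid spacing.
grid_spacing_km = EARTH_CIRCUMFERENCE_KM / (4 * CELLS_PER_FACE_EDGE)   # about 4.9 km
n_cells = 6 * CELLS_PER_FACE_EDGE ** 2                                 # about 25 million

# Illustrative signal speeds (assumed, not taken from the paper).
speeds_m_per_s = {
    "strong jet-level wind": 100.0,
    "sound wave": 340.0,
}

for name, speed in speeds_m_per_s.items():
    distance_km = speed * STEP_SECONDS / 1000.0
    cells = distance_km / grid_spacing_km
    print(f"{name}: {distance_km:.0f} km per step, about {cells:.0f} grid cells")
```

Even the fastest of these signals covers only tens of grid cells per step, against 8192 cells around the equator. This supports locality for a single step but says nothing about errors that accumulate over many steps, which is where the large-scale biases appear.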

What would settle it

A 48-hour global rollout that develops growing large-scale biases or patch-boundary artifacts faster than the reported 24-hour stability window.

Figures

Figures reproduced from arXiv: 2606.31248 by Akshay Subramaniam, Jaideep Pathak, Karthik Kashinath, Mike Pritchard, Mohammad Shoaib Abbas, Naser Mahfouz, Noah Brenowitz, Noel Keen, Peter Caldwell, Suman Ravuri, Tao Ge, Zeyuan Hu.

Figure 1. Twelve-hour STRATA rollout compared with SCREAM. Top: near-surface specific humidity and rain intensity at +12 h from STRATA (left) and the corresponding SCREAM reference simulation (right) on the native 4.9-km output grid (6 × 2048² ≈ 25 M horizontal cells). The colored boxes in the global SCREAM panel mark the zoom locations. Bottom: zoomed comparisons across six representative regimes. Row 1: STRATA for…
Figure 2. STRATA method overview. (a,b) STRATA learns 10-minute updates from local SCREAM tiles and applies the same model globally using overlapping tiles that are blended into a continuous rollout; Duo-grid halo padding supplies geometrically consistent context near cubed-sphere face boundaries. (c) The tile update model combines 3D patch embedding, local attention mechanisms, and a pixel-space de-aliasing decoder…
Figure 3. Patch-size and model-size scaling. (a) Inference latency vs. FLOPs per forward pass (estimated as 2NP, where N is the token sequence length and P the number of parameters) for DiT-S/M/L models with horizontal patch sizes ps ∈ {1, 2, 4}; at equal FLOPs, ps = 4 is 1.5–2.9× faster than ps = 1. (b) Test loss vs. training FLOPs for the same sweep. Patch size 1 is worse than patch sizes 2 and 4 at equal FLOPs; s…
Figure 4. Predictability diagnostics for 24-hour rollout. (a) Tropical precipitation over the Indo-Pacific, averaged over 5°S–5°N, shows coherent eastward and westward propagation of large-scale precipitation structures during STRATA free rollout. (b) Fractions Skill Score measures neighborhood-scale precipitation agreement with SCREAM; persistence is shown as an initial-condition memory reference rather than a comp…
Figure 5. Precipitation statistics during rollout. Solid lines show the mean over three rollouts initialized on different dates at the same time of day; shading shows the min–max range. (a) Global-mean precipitation over 24-hour rollouts. (b) Precipitation amount distribution at 12-hour lead time, showing each rain-rate bin's contribution to the global mean. (c) Spherical-harmonic power spectrum of precipitation at…
Figure 6. Grid invariance test: near-surface humidity and rain rate from a 3-hour zero-shot rollout over Indonesia on three grids of similar nominal resolution: regular latitude–longitude (left), oblique stereographic rotated 45° (middle), and the native SCREAM cubed-sphere (right, training grid). … and grid resolution differ across the three setups, but the spatial organization of precipitation and moisture is preserve…
Figure 3. Model names encode the FLOP tier (S/M/L) and horizontal patch size. Within each tier, …
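
The 2NP estimate quoted in the Figure 3 caption, panel (a), can be sketched as follows to show how patch size changes the token count and therefore the cost. The tile dimensions, vertical token count, and parameter count are hypothetical placeholders; the paper's actual values are not given on this page.

```python
def forward_flops(n_tokens, n_params):
    """Estimate forward-pass FLOPs as 2 * N * P, the rule quoted in the caption."""
    return 2 * n_tokens * n_params


def tokens_per_tile(tile_h, tile_w, n_vertical_tokens, patch_size):
    """Token count for a tile split into patch_size x patch_size horizontal patches."""
    assert tile_h % patch_size == 0 and tile_w % patch_size == 0
    return (tile_h // patch_size) * (tile_w // patch_size) * n_vertical_tokens


# Hypothetical tile and model sizes, chosen only to illustrate the scaling.
TILE_H = TILE_W = 64
N_VERTICAL_TOKENS = 8
N_PARAMS = 100_000_000

for ps in (1, 2, 4):
    n = tokens_per_tile(TILE_H, TILE_W, N_VERTICAL_TOKENS, ps)
    print(f"ps={ps}: {n} tokens, {forward_flops(n, N_PARAMS):.3e} FLOPs")
```

At a fixed parameter count, doubling the horizontal patch size cuts the token count and the estimated FLOPs by a factor of four, so an iso-FLOP comparison pairs larger patches with larger models. The sketch ignores the change in patch-embedding parameters with patch size.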
Original abstract

Kilometer-scale convection shapes precipitation extremes, tropical organization, and cloud feedbacks, but most global atmospheric models approximate these processes at 25-100 km resolution. Global storm-resolving physics models resolve convective systems explicitly, but at a cost -- roughly one MWh per simulated day on exascale supercomputers -- that limits long-duration simulation. We introduce STRATA (Storm-resolving Tile-based autoRegressive Atmosphere Transformer Architecture), the first autoregressive AI emulator for global storm-resolving atmospheric dynamics. STRATA is trained on the highest-resolution atmospheric dataset yet used for global AI emulation: 17 days of SCREAM physics-model output at 4.9-km resolution (~25 million grid cells) sampled every 10 minutes. Our central premise is that on 10-minute timescales atmospheric dynamics are predominantly local, so training on small spatial tiles trades scarce global temporal samples for abundant local spatial samples and enables global rollout via overlapping-tile blending. STRATA combines 3D patch embedding and local 3D neighborhood attention, a novel Stereographic Rotary Position Embedding (StereoRoPE) for grid-invariant encoding, and a pixel-space de-aliasing decoder that suppresses patch-scale rollout artifacts. An iso-FLOP scaling study reveals that km-scale emulation requires ~10x more FLOPs per grid point than coarse-resolution AI weather models, consistent with the higher information density of convective-scale dynamics. Trained on only 17 days of data, STRATA produces stable 24-hour global rollouts with realistic km-scale dynamics across diverse regimes, though large-scale biases develop with lead time. It achieves 48 simulation days per megawatt-hour -- about 50 times better energy efficiency than the SCREAM physics model -- and 741 simulated days per wall-clock day at 512 H100 GPUs. Code and dataset are publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces STRATA, an autoregressive transformer architecture for emulating global storm-resolving atmospheric dynamics at 4.9 km resolution. Trained on 17 days of SCREAM physics-model output sampled every 10 minutes, it uses 3D patch embedding, local neighborhood attention, a novel Stereographic Rotary Position Embedding (StereoRoPE), and a de-aliasing decoder to enable global rollouts from small-tile training via overlapping blending. The central premise is that 10-minute dynamics are predominantly local, allowing the model to produce stable 24-hour global rollouts with realistic km-scale features across regimes (despite developing large-scale biases) while achieving 48 simulation days per MWh, roughly 50 times the energy efficiency of the underlying physics model.

Significance. If the performance claims hold under rigorous validation, the work would represent a meaningful advance in AI emulation of convective-scale atmospheric processes, potentially enabling longer-duration high-resolution simulations that are currently limited by computational cost. The public release of code and dataset is a clear strength supporting reproducibility and further development.

major comments (2)
  1. [Abstract] Abstract: The central performance claims of 'stable 24-hour global rollouts with realistic km-scale dynamics across diverse regimes' are presented without any quantitative error metrics (e.g., RMSE, bias, or spectral error against the SCREAM reference), baseline comparisons, or full validation details. This is load-bearing because the abstract itself notes developing large-scale biases with lead time, and the 17-day training set provides only ~2448 global time steps; without these metrics the support for the claims remains moderate.
  2. [Abstract] Abstract (central premise paragraph): The locality assumption—that atmospheric dynamics on 10-minute timescales are predominantly local, enabling global rollout via overlapping-tile blending—is load-bearing for the entire methodology. However, the noted large-scale biases suggest possible non-local couplings or blending inconsistencies; no quantitative checks on global invariants (total mass, energy, or momentum conservation) or cross-tile spectral continuity are referenced, which directly bears on whether the observed biases undermine the 24-hour stability claim.
minor comments (2)
  1. [Abstract] The iso-FLOP scaling study is mentioned but lacks details on the exact FLOP counts per grid point or the functional form of the scaling relation; adding this would clarify the ~10x claim relative to coarse-resolution models.
  2. The definition and implementation details of StereoRoPE are introduced as novel but would benefit from a short equation or pseudocode in the methods to allow readers to assess its grid-invariance properties.
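
The error metrics requested in major comment 1 could be computed along the following lines. This is a minimal sketch that assumes fields have been regridded to a regular latitude-longitude grid, where cosine-latitude weights approximate cell area; on the native cubed-sphere grid the true cell areas would be needed instead. The function names are illustrative and not taken from the paper's code.

```python
import math


def area_weights(lats_deg):
    """Cosine-latitude weights, normalised to sum to one over the latitude rows."""
    raw = [math.cos(math.radians(lat)) for lat in lats_deg]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_bias_rmse(pred, ref, lats_deg):
    """Return (bias, rmse) of pred against ref on a regular lat-lon grid.

    pred and ref are lists of rows, one row per latitude, with equal row lengths.
    """
    weights = area_weights(lats_deg)
    bias = 0.0
    mse = 0.0
    for w, p_row, r_row in zip(weights, pred, ref):
        n_lon = len(p_row)
        diffs = [p - r for p, r in zip(p_row, r_row)]
        bias += w * sum(diffs) / n_lon
        mse += w * sum(d * d for d in diffs) / n_lon
    return bias, math.sqrt(mse)


# Toy example: the prediction is uniformly one unit above the reference.
lats = [-60.0, 0.0, 60.0]
reference = [[0.0] * 4 for _ in lats]
prediction = [[1.0] * 4 for _ in lats]
bias, rmse = weighted_bias_rmse(prediction, reference, lats)
```

Reporting both numbers at several lead times would separate a growing mean offset (bias) from growing pattern error (RMSE), which matters given the large-scale biases the abstract mentions.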

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments correctly identify opportunities to strengthen the abstract with quantitative support for our claims and to explicitly validate the locality assumption. We address each point below and will revise the manuscript accordingly.

  1. Referee: [Abstract] Abstract: The central performance claims of 'stable 24-hour global rollouts with realistic km-scale dynamics across diverse regimes' are presented without any quantitative error metrics (e.g., RMSE, bias, or spectral error against the SCREAM reference), baseline comparisons, or full validation details. This is load-bearing because the abstract itself notes developing large-scale biases with lead time, and the 17-day training set provides only ~2448 global time steps; without these metrics the support for the claims remains moderate.

    Authors: We agree that the abstract would be strengthened by including key quantitative metrics to support the performance claims. The body of the manuscript contains detailed error metrics, spectral analyses, and regime-specific validation against SCREAM, but these are not summarized in the abstract. In the revision we will add concise quantitative indicators (e.g., 24-hour RMSE for temperature, zonal wind, and precipitation, plus a note on bias growth) while preserving the abstract's brevity and explicitly referencing the limited temporal sample size. revision: yes

  2. Referee: [Abstract] Abstract (central premise paragraph): The locality assumption—that atmospheric dynamics on 10-minute timescales are predominantly local, enabling global rollout via overlapping-tile blending—is load-bearing for the entire methodology. However, the noted large-scale biases suggest possible non-local couplings or blending inconsistencies; no quantitative checks on global invariants (total mass, energy, or momentum conservation) or cross-tile spectral continuity are referenced, which directly bears on whether the observed biases undermine the 24-hour stability claim.

    Authors: The locality premise is central and is empirically supported by the fact that stable 24-hour global rollouts are achieved from tile-trained models. We acknowledge that the emergence of large-scale biases raises legitimate questions about non-local effects and blending fidelity. The current manuscript does not report explicit global conservation diagnostics or cross-tile spectral continuity metrics. We will add these analyses in revision, including time series of domain-integrated mass and energy drift and wavenumber spectra evaluated across tile boundaries, to directly test whether blending artifacts or missing non-local couplings contribute to the observed biases. revision: yes
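
The drift time series promised in this response could take a form like the sketch below: an area-weighted global integral tracked relative to its initial value. The choice of field, the cell areas, and the function names are assumptions for illustration; the manuscript's planned diagnostics are not specified beyond the description above.

```python
def global_integral(field, cell_areas):
    """Area-weighted sum of a flattened field, e.g. column water or column mass."""
    return sum(v * a for v, a in zip(field, cell_areas))


def relative_drift(rollout, cell_areas):
    """Relative change of the global integral at each step, measured from step 0."""
    reference = global_integral(rollout[0], cell_areas)
    return [global_integral(f, cell_areas) / reference - 1.0 for f in rollout]


# Toy rollout: three steps on four cells, where the first cell gains 0.1 per step.
areas = [1.0, 1.0, 2.0, 2.0]
rollout = [
    [10.0, 10.0, 10.0, 10.0],
    [10.1, 10.0, 10.0, 10.0],
    [10.2, 10.0, 10.0, 10.0],
]
drift = relative_drift(rollout, areas)
```

A drift curve that stays near zero for the physics model but grows steadily for the emulator would point to a conservation problem, although it would not by itself distinguish blending artifacts from missing non-local couplings.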

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents an empirical AI emulator (STRATA) trained directly on external SCREAM physics-model output at 4.9 km resolution. Its central premise is explicitly stated as an assumption ('on 10-minute timescales atmospheric dynamics are predominantly local') rather than derived from any equations or prior results within the paper. No self-definitional reductions, fitted inputs renamed as predictions, load-bearing self-citations, uniqueness theorems, or ansatzes appear in the provided text. Reported outcomes (stable 24-hour rollouts, energy efficiency of 48 simulation days per MWh) are empirical measurements from training and inference, not quantities forced by construction from the model's own parameters or inputs. The approach is therefore self-contained as a data-driven method.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the domain assumption that short-timescale dynamics are sufficiently local for tile training to generalize globally; the architecture introduces a new position embedding whose necessity is not independently verified outside this work.

axioms (1)
  • domain assumption: On 10-minute timescales, atmospheric dynamics are predominantly local.
    Explicitly stated as the central premise enabling the tile-based training strategy.
invented entities (1)
  • Stereographic Rotary Position Embedding (StereoRoPE): no independent evidence.
    purpose: Grid-invariant position encoding for spherical data.
    New component introduced to handle the geometry of the global grid.

pith-pipeline@v0.9.1-grok · 5909 in / 1277 out tokens · 70481 ms · 2026-07-01T02:55:26.696853+00:00 · methodology


Reference graph

Works this paper leans on

62 extracted references · 23 canonical work pages · 1 internal anchor

  1. [1]

    Sandrine Bony, Bjorn Stevens, Dargan M. W. Frierson, Christian Jakob, Masa Kageyama, Robert Pincus, Theodore G. Shepherd, Steven C. Sherwood, A. Pier Siebesma, Adam H. Sobel, Masahiro Watanabe, and Mark J. Webb. Clouds, circulation and climate sensitivity.Nature Geoscience, 8(4):261–268, April 2015. ISSN 1752-0908. doi: 10.1038/ngeo2398

  2. [2]

    Bretherton, Florent Brient, Kyle G

    Tapio Schneider, João Teixeira, Christopher S. Bretherton, Florent Brient, Kyle G. Pressel, Christoph Schär, and A. Pier Siebesma. Climate goals and computing the future of clouds. Nature Climate Change, 7(1):3–5, January 2017. ISSN 1758-6798. doi: 10.1038/nclimate3190

  3. [3]

    P. M. Caldwell, C. R. Terai, B. Hillman, N. D. Keen, P. Bogenschutz, W. Lin, H. Beydoun, M. Taylor, L. Bertagna, A. M. Bradley, T. C. Clevenger, A. S. Donahue, C. Eldred, J. Foucar, J.-C. Golaz, O. Guba, R. Jacob, J. Johnson, J. Krishna, W. Liu, K. Pressel, A. G. Salinger, B. Singh, A. Steyer, P. Ullrich, D. Wu, X. Yuan, J. Shpund, H.-Y . Ma, and C. S. Ze...

  4. [4]

    Müller, Thomas Rackow, Junhong Lee, Edgar Dolores-Tesillos, Imme Benedict, Matthias Aengenheyster, Razvan Aguri- dan, Gabriele Arduini, Alexander J

    Hans Segura, Xabier Pedruzo-Bagazgoitia, Philipp Weiss, Sebastian K. Müller, Thomas Rackow, Junhong Lee, Edgar Dolores-Tesillos, Imme Benedict, Matthias Aengenheyster, Razvan Aguri- dan, Gabriele Arduini, Alexander J. Baker, Jiawei Bao, Swantje Bastin, Eulàlia Baulenas, Tobias Becker, Sebastian Beyer, Hendryk Bockelmann, Nils Brüggemann, Lukas Brunner, Su...

  5. [5]

    Mark Taylor, Peter M. Caldwell, Luca Bertagna, Conrad Clevenger, Aaron Donahue, James Foucar, Oksana Guba, Benjamin Hillman, Noel Keen, Jayesh Krishna, Matthew Norman, Sarat Sreepathi, Christopher Terai, James B. White, Andrew G Salinger, Renata B McCoy, Lai-yung Ruby Leung, David C. Bader, and Danqing Wu. The Simple Cloud-Resolving E3SM Atmosphere Model ...

  6. [6]

    A. S. Donahue, P. M. Caldwell, L. Bertagna, H. Beydoun, P. A. Bogenschutz, A. M. Bradley, T. C. Clevenger, J. Foucar, C. Golaz, O. Guba, W. Hannah, B. R. Hillman, J. N. Johnson, N. Keen, W. Lin, B. Singh, S. Sreepathi, M. A. Taylor, J. Tian, C. R. Terai, P. A. Ullrich, X. Yuan, and Y . Zhang. To Exascale and Beyond—The Simple Cloud-Resolving E3SM Atmosphe...

  7. [7]

    Gomez, Lukasz Kaiser, and Illia Polosukhin

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention Is All You Need, August 2023

  8. [8]

    FourCastNet: A Global Data- driven High-resolution Weather Model using Adaptive Fourier Neural Operators, February 2022

    Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, Pedram Hassanzadeh, Karthik Kashinath, and Animashree Anandkumar. FourCastNet: A Global Data- driven High-resolution Weather Model using Adaptive Fourier Neural Operators, February 2022

  9. [9]

    Accurate medium-range global weather forecasting with 3D neural networks.Nature, 619(7970):533– 538, July 2023

    Kaifeng Bi, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian. Accurate medium-range global weather forecasting with 3D neural networks.Nature, 619(7970):533– 538, July 2023. ISSN 1476-4687. doi: 10.1038/s41586-023-06185-3

  10. [10]

    GraphCast: Learning skillful medium-range global weather forecasting, August 2023

    Remi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri, Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, Alexander Merose, Stephan Hoyer, George Holland, Oriol Vinyals, Jacklynn Stott, Alexander Pritzel, Shakir Mohamed, and Peter Battaglia. GraphCast: Learning skillful medium-range global weather forecas...

  11. [11]

    FuXi: A cascade machine learning forecasting system for 15-day global weather forecast

    Lei Chen, Xiaohui Zhong, Feng Zhang, Yuan Cheng, Yinghui Xu, Yuan Qi, and Hao Li. FuXi: A cascade machine learning forecasting system for 15-day global weather forecast. npj Climate and Atmospheric Science, 6(1):190, November 2023. ISSN 2397-3722. doi: 10.1038/s41612-023-00512-1

  12. [12]

    Bruinsma, Ana Lucic, Megan Stanley, Anna Vaughan, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan A

    Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Anna Vaughan, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan A. Weyn, Haiyu Dong, Jayesh K. Gupta, Kit Thambiratnam, Alexander T. Archibald, Chun-Chieh Wu, Elizabeth Heider, Max Welling, Richard E. Turner, and Paris Perdikaris. A Foundation Model for the Earth System, November 2024

  13. [13]

    Scaling transformer neural networks for skillful and reliable medium-range weather forecasting, October 2024

    Tung Nguyen, Rohan Shah, Hritik Bansal, Troy Arcomano, Romit Maulik, Veerabhadra Kota- marthi, Ian Foster, Sandeep Madireddy, and Aditya Grover. Scaling transformer neural networks for skillful and reliable medium-range weather forecasting, October 2024

  14. [14]

    Andersson, Andrew El-Kadi, Do- minic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, Remi Lam, and Matthew Willson

    Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R. Andersson, Andrew El-Kadi, Do- minic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, Remi Lam, and Matthew Willson. GenCast: Diffusion-based ensemble forecasting for medium-range weather, May 2024

  15. [15]

    Simon Lang, Mihai Alexe, Mariana C. A. Clare, Christopher Roberts, Rilwan Adewoyin, Zied Ben Bouallègue, Matthew Chantry, Jesper Dramsch, Peter D. Dueben, Sara Hahner, Pedro Maciel, Ana Prieto-Nemesio, Cathal O’Brien, Florian Pinault, Jan Polster, Baudouin Raoult, Steffen Tietsche, and Martin Leutbecher. AIFS-CRPS: Ensemble forecasting using a model train...

  16. [16]

    Collins, Michael S

    Boris Bonev, Thorsten Kurth, Ankur Mahesh, Mauro Bisson, Jean Kossaifi, Karthik Kashinath, Anima Anandkumar, William D. Collins, Michael S. Pritchard, and Alexander Keller. FourCast- 11 Net 3: A geometric approach to probabilistic machine-learning weather forecasting at scale, July 2025

  17. [17]

    Demystifying Data-Driven Probabilistic Medium-Range Weather Forecasting, January 2026

    Jean Kossaifi, Nikola Kovachki, Morteza Mardani, Daniel Leibovici, Suman Ravuri, Ira Shokar, Edoardo Calvello, Mohammad Shoaib Abbas, Peter Harrington, Ashay Subramaniam, Noah Brenowitz, Boris Bonev, Wonmin Byeon, Karsten Kreis, Dale Durran, Arash Vahdat, Mike Pritchard, and Jan Kautz. Demystifying Data-Driven Probabilistic Medium-Range Weather Forecastin...

  18. [18]

    Scaling Laws of Global Weather Models, February 2026

    Yuejiang Yu, Langwen Huang, Alexandru Calotoiu, and Torsten Hoefler. Scaling Laws of Global Weather Models, February 2026

  19. [19]

    Brenner, and Stephan Hoyer

    Dmitrii Kochkov, Janni Yuval, Ian Langmore, Peter Norgaard, Jamie Smith, Griffin Mooers, Milan Klöwer, James Lottes, Stephan Rasp, Peter Düben, Sam Hatfield, Peter Battaglia, Alvaro Sanchez-Gonzalez, Matthew Willson, Michael P. Brenner, and Stephan Hoyer. Neural General Circulation Models for Weather and Climate.Nature, July 2024. ISSN 0028-0836, 1476-468...

  20. [20]

    Durran, Zihui Liu, Zachary I

    Nathaniel Cresswell-Clay, Bowen Liu, Dale R. Durran, Zihui Liu, Zachary I. Espinosa, Raul A. Moreno, and Matthias Karlbauer. A Deep Learning Earth System Model for Efficient Simulation of the Observed Climate.AGU Advances, 6(4):e2025A V001706, 2025. ISSN 2576-604X. doi: 10.1029/2025A V001706

  21. [21]

    Clark, Anna Kwa, W

    Oliver Watt-Meyer, Brian Henn, Jeremy McGibbon, Spencer K. Clark, Anna Kwa, W. Andre Perkins, Elynn Wu, Lucas Harris, and Christopher S. Bretherton. ACE2: Accurately learning subseasonal to decadal atmospheric variability and forced responses, November 2024

  22. [22]

    Bretherton, and Rose Yu

    Salva Rühling Cachay, Brian Henn, Oliver Watt-Meyer, Christopher S. Bretherton, and Rose Yu. Probabilistic Emulation of a Global Climate Model with Spherical DYffusion, November 2024

  23. [23]

    Chapman, John S

    William E. Chapman, John S. Schreck, Yingkai Sha, David John Gagne II, Dhamma Kim- para, Laure Zanna, Kirsten J. Mayer, and Judith Berner. CAMulator: Fast Emulation of the Community Atmosphere Model, April 2025

  24. [24]

    Flora and Corey Potvin

    Montgomery L. Flora and Corey Potvin. WoFSCast: A Machine Learning Model for Pre- dicting Thunderstorms at Watch-to-Warning Scales.Geophysical Research Letters, 52(10): e2024GL112383, 2025. ISSN 1944-8007. doi: 10.1029/2024GL112383

  25. [25]

    Smith, Sergey Frolov, Montgomery Flora, and Corey Potvin

    Daniel Abdi, Isidora Jankov, Paul Madden, Vanderlei Vargas, Timothy A. Smith, Sergey Frolov, Montgomery Flora, and Corey Potvin. HRRRCast: A data-driven emulator for regional weather forecasting at convection allowing scales, July 2025

  26. [26]

    Kilometer-Scale Convection Allowing Model Emulation using Generative Diffusion Modeling, August 2024

    Jaideep Pathak, Yair Cohen, Piyush Garg, Peter Harrington, Noah Brenowitz, Dale Durran, Morteza Mardani, Arash Vahdat, Shaoming Xu, Karthik Kashinath, and Michael Pritchard. Kilometer-Scale Convection Allowing Model Emulation using Generative Diffusion Modeling, August 2024

  27. [27]

    Residual Corrective Diffusion Modeling for Km- scale Atmospheric Downscaling, 2023

    Morteza Mardani, Noah Brenowitz, Yair Cohen, Jaideep Pathak, Chieh-Yu Chen, Cheng- Chin Liu, Arash Vahdat, Mohammad Amin Nabian, Tao Ge, Akshay Subramaniam, Karthik Kashinath, Jan Kautz, and Mike Pritchard. Residual Corrective Diffusion Modeling for Km- scale Atmospheric Downscaling, 2023

  28. [28]

    Brenowitz, Tao Ge, Akshay Subramaniam, Aayush Gupta, David M

    Noah D. Brenowitz, Tao Ge, Akshay Subramaniam, Aayush Gupta, David M. Hall, Morteza Mardani, Arash Vahdat, Karthik Kashinath, and Michael S. Pritchard. Climate in a Bottle: Towards a Generative Foundation Model for the Kilometer-Scale Global Atmosphere, 2025

  29. [29]

    Andre Perkins, Anna Kwa, Jeremy McGibbon, Troy Arcomano, Spencer K

    W. Andre Perkins, Anna Kwa, Jeremy McGibbon, Troy Arcomano, Spencer K. Clark, Oliver Watt-Meyer, Christopher S. Bretherton, and Lucas M. Harris. HiRO-ACE: Fast and skillful AI emulation and downscaling trained on a 3 km global storm-resolving model, February 2026

  30. [30]

    MultiDiffusion: Fusing diffusion paths for controlled image generation.arXiv [cs.CV], February 2023

    Omer Bar-Tal, Lior Yariv, Yaron Lipman, and Tali Dekel. MultiDiffusion: Fusing diffusion paths for controlled image generation.arXiv [cs.CV], February 2023

  31. [31]

    Spherical Fourier Neural Operators: Learning Stable Dynamics on the Sphere, June 2023

    Boris Bonev, Thorsten Kurth, Christian Hundt, Jaideep Pathak, Maximilian Baust, Karthik Kashinath, and Anima Anandkumar. Spherical Fourier Neural Operators: Learning Stable Dynamics on the Sphere, June 2023

  32. [32]

    Forecasting Global Weather with Graph Neural Networks, February 2022

    Ryan Keisler. Forecasting Global Weather with Graph Neural Networks, February 2022. 12

  33. [33]

    Implementation of the Novel Duo-Grid in GFDL’s FV3 Dynamical Core.Journal of Advances in Modeling Earth Systems, 15(12): e2023MS003712, 2023

    Joseph Mouallem, Lucas Harris, and Xi Chen. Implementation of the Novel Duo-Grid in GFDL’s FV3 Dynamical Core.Journal of Advances in Modeling Earth Systems, 15(12): e2023MS003712, 2023. ISSN 1942-2466. doi: 10.1029/2023MS003712

  34. [34]

    Scalable Diffusion Models with Transformers, March 2023

    William Peebles and Saining Xie. Scalable Diffusion Models with Transformers, March 2023

  35. [35]

    Neighborhood Attention Transformer, May 2023

    Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi. Neighborhood Attention Transformer, May 2023

  36. [36]

    Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light, April 2025

    Ali Hassani, Fengzhe Zhou, Aditya Kane, Jiannan Huang, Chieh-Yun Chen, Min Shi, Steven Walton, Markus Hoehnerbach, Vijay Thakkar, Michael Isaev, Qinsheng Zhang, Bing Xu, Haicheng Wu, Wen-mei Hwu, Ming-Yu Liu, and Humphrey Shi. Generalized Neighborhood Attention: Multi-dimensional Sparse Attention at the Speed of Light, April 2025

  37. [37]

    Rae, Oriol Vinyals, and Laurent Sifre

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre...

  38. [38]

    Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling Laws for Neural Language Models, January 2020

  39. [39]

    Making Convolutional Networks Shift-Invariant Again, June 2019

    Richard Zhang. Making Convolutional Networks Shift-Invariant Again, June 2019

  40. [40]

    PixelDiT: Pixel Diffusion Transformers for Image Generation, November 2025

    Yongsheng Yu, Wei Xiong, Weili Nie, Yichen Sheng, Shiqiu Liu, and Jiebo Luo. PixelDiT: Pixel Diffusion Transformers for Image Generation, November 2025

  41. [41]

    RoFormer: Enhanced Transformer with Rotary Position Embedding, November 2023

    Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, and Yunfeng Liu. RoFormer: Enhanced Transformer with Rotary Position Embedding, November 2023

  42. [42]

    Rotary Position Embedding for Vision Transformer

    Byeongho Heo, Song Park, Dongyoon Han, and Sangdoo Yun. Rotary Position Embedding for Vision Transformer. https://arxiv.org/abs/2403.13298v2, March 2024

  43. [43]

    Durran, Raul A

    Matthias Karlbauer, Nathaniel Cresswell-Clay, Dale R. Durran, Raul A. Moreno, Thorsten Kurth, Boris Bonev, Noah Brenowitz, and Martin V . Butz. Advancing Parsimonious Deep Learning Weather Prediction Using the HEALPix Mesh.Journal of Advances in Modeling Earth Systems, 16(8):e2023MS004021, 2024. ISSN 1942-2466. doi: 10.1029/2023MS004021

  44. [44]

    Madalina Surcel, Isztar Zawadzki, and M. K. Yau. A Study on the Scale Dependence of the Predictability of Precipitation Patterns. January 2015. doi: 10.1175/JAS-D-14-0071.1

  45. [45]

    Roberts and Humphrey W

    Nigel M. Roberts and Humphrey W. Lean. Scale-Selective Verification of Rainfall Ac- cumulations from High-Resolution Forecasts of Convective Events. January 2008. doi: 10.1175/2007MWR2123.1

  46. [46]

    Climate goals and computing the future of clouds

    Tapio Schneider, João Teixeira, Christopher S Bretherton, Florent Brient, Kyle G Pressel, Christoph Schär, and A Pier Siebesma. Climate goals and computing the future of clouds. Nature Climate Change, 7(1):3–5, 2017

  47. [47]

    Insensitivity of the Cloud Response to Surface Warming Under Radical Changes to Boundary Layer Turbulence and Cloud Microphysics: Results From the Ultraparameterized CAM

    Hossein Parishani, Michael S. Pritchard, Christopher S. Bretherton, Christopher R. Terai, Matthew C. Wyant, Marat Khairoutdinov, and Balwinder Singh. Insensitivity of the Cloud Response to Surface Warming Under Radical Changes to Boundary Layer Turbulence and Cloud Microphysics: Results From the Ultraparameterized CAM. Journal of Advances in Modeling Earth...

  48. [48]

    C. R. Terai, M. S. Pritchard, P. Blossey, and C. S. Bretherton. The Impact of Resolving Subkilometer Processes on Aerosol-Cloud Interactions of Low-Level Clouds in Global Model Simulations. Journal of Advances in Modeling Earth Systems, 12(11):e2020MS002274, 2020. ISSN 1942-2466. doi: 10.1029/2020MS002274

  49. [49]

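    Improving Stratocumulus Cloud Amounts in a 200-m Resolution Multi-Scale Modeling Framework Through Tuning of Its Interior Physics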

    Liran Peng, Peter N. Blossey, Walter M. Hannah, Christopher S. Bretherton, Christopher R. Terai, Andrea M. Jenney, and Michael Pritchard. Improving Stratocumulus Cloud Amounts in a 200-m Resolution Multi-Scale Modeling Framework Through Tuning of Its Interior Physics. Journal of Advances in Modeling Earth Systems, 16(3):e2023MS003632, 2024. ISSN 1942-2466. doi: 10.1029/2023MS003632

  51. [51]

    The cumulus parameterization problem: Past, present, and future. Journal of Climate, 17(13):2493–2525, 2004

    Akio Arakawa. The cumulus parameterization problem: Past, present, and future. Journal of Climate, 17(13):2493–2525, 2004

  52. [52]

    The non-hydrostatic icosahedral atmospheric model: Description and development. Progress in Earth and Planetary Science, 1(1):18, 2014

    Masaki Satoh, Hirofumi Tomita, Hisashi Yashiro, Hiroaki Miura, Chihiro Kodama, Tatsuya Seiki, Akira T Noda, Yohei Yamada, Daisuke Goto, Masahiro Sawada, et al. The non-hydrostatic icosahedral atmospheric model: Description and development. Progress in Earth and Planetary Science, 1(1):18, 2014

  53. [53]

    Superior Daily and Sub-Daily Precipitation Statistics for Intense and Long-Lived Storms in Global Storm-Resolving Models

    Hsi-Yen Ma, Stephen A. Klein, Jiwoo Lee, Min-Seop Ahn, Cheng Tao, and Peter J. Gleckler. Superior Daily and Sub-Daily Precipitation Statistics for Intense and Long-Lived Storms in Global Storm-Resolving Models. Geophysical Research Letters, 49(8):e2021GL096759, April 2022. ISSN 0094-8276, 1944-8007. doi: 10.1029/2021GL096759

  55. [55]

    ICON-Sapphire: simulating the components of the Earth system and their interactions at kilometer and subkilometer scales. Geoscientific Model Development, 16(2):779–811, 2023

    Cathy Hohenegger, Peter Korn, Leonidas Linardakis, René Redler, Reiner Schnur, Panagiotis Adamidis, Jiawei Bao, Swantje Bastin, Milad Behravesh, Martin Bergemann, et al. ICON-Sapphire: simulating the components of the Earth system and their interactions at kilometer and subkilometer scales. Geoscientific Model Development, 16(2):779–811, 2023

  56. [56]

    Destination Earth: The Climate Change Adaptation Digital Twin

    Ioan Hadade, Daniel Klocke, Jussi Enkovaara, Tuomas Lunttila, Thomas Rackow, Jan Frederik Engels, Claudia Frauen, René Redler, Jenni Kontkanen, Thomas Jung, Dmitry Sein, Irina Sandu, Balthasar Reuter, Nils Wedi, Sebastian Milinski, Francisco Doblas-Reyes, Miguel Castrillo, Mario Acosta, Sergi Girona, and Pekka Manninen. Destination Earth: The Climate Chan...

  57. [57]

    Query-Key Normalization for Transformers, October 2020

    Alex Henry, Prudhvi Raj Dachapally, Shubham Pawar, and Yuxuan Chen. Query-Key Normalization for Transformers, October 2020

  58. [58]

    Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

    Zihan Qiu, Zekun Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, Songlin Yang, Rui Men, Le Yu, Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, and Junyang Lin. Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free. https://arxiv.org/abs/2505.06708v1, May 2025

  59. [59]

    Two modes of change of the distribution of rain. Journal of Climate, 27(22):8357–8371, 2014

    Angeline G Pendergrass and Dennis L Hartmann. Two modes of change of the distribution of rain. Journal of Climate, 27(22):8357–8371, 2014

  60. [60]

    Robust effects of cloud superparameterization on simulated daily rainfall intensity statistics across multiple versions of the Community Earth System Model

    Gabriel J. Kooperman, Michael S. Pritchard, Melissa A. Burt, Mark D. Branson, and David A. Randall. Robust effects of cloud superparameterization on simulated daily rainfall intensity statistics across multiple versions of the Community Earth System Model. Journal of Advances in Modeling Earth Systems, 8(1):140–165, 2016. ISSN 1942-2466. doi: 10.1002/201...

  61. [61]

    Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, April 2018

    Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, April 2018

  62. [62]

    An Empirical Model of Large-Batch Training, December 2018

    Sam McCandlish, Jared Kaplan, Dario Amodei, and OpenAI Dota Team. An Empirical Model of Large-Batch Training, December 2018