arxiv: 2605.22248 · v1 · pith:AII623JVnew · submitted 2026-05-21 · 💻 cs.LG

No Epoch Like the Present: Robust Climate Emulation Requires Out-of-Distribution Generalisation

Pith reviewed 2026-05-22 08:16 UTC · model grok-4.3

classification 💻 cs.LG
keywords climate emulation · out-of-distribution generalisation · machine learning · compositional generalisation · hybrid models · distribution shifts · seasonal variation · robustness evaluation

The pith

Physically motivated decompositions improve out-of-distribution performance in climate emulators with only modest in-distribution trade-offs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that climate emulation is an out-of-distribution projection task where machine learning models trained on present-day data will encounter inevitable distribution shifts from a changing climate. It demonstrates that seasonal variation provides a real-world proxy for these long-term shifts, enabling a zero-overhead test of emulator robustness without synthetic perturbations. Current hybrid-ML emulators degrade significantly under seasonal shifts, but the work identifies compositional generalisation through physically motivated decompositions as a route to better OOD performance. A sympathetic reader would care because accurate future climate projections require models that generalise to unseen atmospheric states rather than overfitting to historical statistics.

Core claim

Climate change induces statistically significant and growing shifts in atmospheric state distributions that render standard present-climate evaluation insufficient. Seasonal variation serves as an effective proxy for these shifts. State-of-the-art hybrid-ML emulators degrade under seasonal shifts, yet physically motivated decompositions substantially improve OOD performance while incurring only modest trade-offs against in-distribution performance, advancing compositional generalisation as a path to robust climate emulation.
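
One way to make "statistically significant shift" concrete is a two-sample permutation test on a distance between samples of atmospheric states. The sketch below uses the energy distance on toy 2-D vectors. The choice of statistic, the function names, and the synthetic `present` and `future` samples are all assumptions for illustration, not the paper's procedure or data.

```python
import math
import random

def mean_pairwise(xs, ys):
    """Mean Euclidean distance between every point in xs and every point in ys."""
    total = sum(math.dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def energy_distance(xs, ys):
    """Squared energy distance (V-statistic form): 2E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return 2 * mean_pairwise(xs, ys) - mean_pairwise(xs, xs) - mean_pairwise(ys, ys)

def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Share of label shufflings whose distance is at least the observed one."""
    rng = random.Random(seed)
    observed = energy_distance(xs, ys)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_distance(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

def toy_states(n, mean, seed):
    """Hypothetical 2-D 'atmospheric states' scattered around a given mean."""
    rng = random.Random(seed)
    return [(rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)) for _ in range(n)]

# Synthetic placeholders: a "present" sample and a shifted "future" sample.
present = toy_states(40, 0.0, seed=1)
future = toy_states(40, 1.5, seed=2)
p_value = permutation_p_value(present, future)
```

A small `p_value` indicates that the two samples are unlikely to come from the same distribution; repeating the test for successive time windows would show whether the shift grows.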

What carries the argument

Compositional generalisation via physically motivated decompositions of climate system components.
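
The idea of composing elementary components can be illustrated with a deliberately small toy. It assumes the target is a sum of two process contributions, each depending on a single input, and that per-process targets are available for training. The functions `radiation` and `convection`, the nearest-neighbour "emulators", and the data are hypothetical stand-ins, not the decomposition used in the paper.

```python
def radiation(t):
    """Hypothetical contribution that depends only on a temperature-like input."""
    return 0.5 * t

def convection(q):
    """Hypothetical contribution that depends only on a moisture-like input."""
    return q * q

def nearest(table, query, dist):
    """Return the stored value whose key is closest to `query` (1-nearest neighbour)."""
    return min(table, key=lambda item: dist(item[0], query))[1]

def sq_dist_2d(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def sq_dist_1d(a, b):
    return (a - b) ** 2

# Training data only contains "matched" combinations (t == q).
train_inputs = [(0, 0), (1, 1), (2, 2), (3, 3)]

# Monolithic emulator: memorises the total for each joint state.
joint_table = [((t, q), radiation(t) + convection(q)) for t, q in train_inputs]

# Composed emulator: one expert per process, each seeing only its own input.
rad_table = [(t, radiation(t)) for t, _ in train_inputs]
conv_table = [(q, convection(q)) for _, q in train_inputs]

def monolithic(t, q):
    return nearest(joint_table, (t, q), sq_dist_2d)

def composed(t, q):
    return nearest(rad_table, t, sq_dist_1d) + nearest(conv_table, q, sq_dist_1d)

# Test states recombine values seen in training into unseen pairs.
test_inputs = [(0, 3), (3, 0), (1, 2)]

def mean_abs_error(model):
    errors = [abs(model(t, q) - (radiation(t) + convection(q))) for t, q in test_inputs]
    return sum(errors) / len(errors)
```

In this toy, `mean_abs_error(composed)` is zero because every individual input value was seen during training, while `mean_abs_error(monolithic)` is not, because the joint combinations are new. Real emulators are not this clean, which is why the paper reports a trade-off rather than a free gain.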

If this is right

  • Standard evaluation protocols limited to present climate data are insufficient for assessing future reliability.
  • Seasonal shifts supply a rigorous, real-world testbed for measuring emulator robustness at zero overhead.
  • Current hybrid-ML emulators exhibit significant degradation when exposed to these realistic distribution shifts.
  • Physically motivated decompositions deliver substantial OOD gains while preserving most in-distribution accuracy.
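
A seasonal hold-out test of the kind described in these points can be sketched as follows: fit on one season, score on a held-out part of that season for in-distribution skill, and score on the other seasons for OOD skill. The synthetic data, the linear "emulator", and the names `make_samples` and `seasonal_holdout` are assumptions made for illustration only.

```python
import math
import random

SEASON_OF_MONTH = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}

def make_samples(n_per_month=60, seed=0):
    """Synthetic (month, x, y) triples: x follows a seasonal cycle, y = x^2 + noise."""
    rng = random.Random(seed)
    samples = []
    for month in range(1, 13):
        centre = 1.0 + math.cos(2 * math.pi * (month - 7) / 12)  # peaks in July
        for _ in range(n_per_month):
            x = rng.gauss(centre, 0.3)
            samples.append((month, x, x * x + rng.gauss(0.0, 0.05)))
    return samples

def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx

def rmse(model, pairs):
    slope, intercept = model
    return math.sqrt(sum((slope * x + intercept - y) ** 2 for x, y in pairs) / len(pairs))

def seasonal_holdout(samples, train_season="DJF"):
    """Train on half of one season; report RMSE per season (held-out half for ID)."""
    by_season = {}
    for month, x, y in samples:
        by_season.setdefault(SEASON_OF_MONTH[month], []).append((x, y))
    in_season = by_season[train_season]
    train, held_out = in_season[::2], in_season[1::2]
    model = fit_line(train)
    scores = {train_season: rmse(model, held_out)}
    for season, pairs in by_season.items():
        if season != train_season:
            scores[season] = rmse(model, pairs)
    return scores

scores = seasonal_holdout(make_samples())
```

Here `scores["DJF"]` is the in-distribution error and the remaining entries are OOD errors. The gap between them is the quantity a seasonal testbed would track, and it needs no data beyond what the emulator was already trained and validated on.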

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar proxy-based robustness tests could extend to other gradual-shift domains such as ecological or economic forecasting.
  • Model design for climate emulation should favour modular decompositions that allow novel recombination of observed physical processes.
  • Evaluation frameworks for earth-system models may need to incorporate explicit tests for generalisation across time scales.

Load-bearing premise

Seasonal variation serves as an effective proxy for long-term climate shifts.

What would settle it

A direct comparison showing that emulator performance under seasonal shifts fails to predict performance on actual future climate projections generated by full physical models.
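
Such a comparison could be summarised with a rank correlation between each emulator's seasonal OOD error and its error on future-climate projections. The sketch below implements Spearman's coefficient from scratch. The two error lists are made-up placeholders, not results from the paper, and the choice of Spearman correlation is itself an assumption.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))

# Placeholder errors for five hypothetical emulators (illustrative values only).
seasonal_ood_error = [0.8, 1.1, 1.9, 2.4, 3.0]
future_ood_error = [0.5, 0.9, 0.7, 1.6, 2.2]
rho = spearman(seasonal_ood_error, future_ood_error)
```

A coefficient near 1 would support the proxy, since emulators would then be ordered the same way by both tests. A coefficient near 0 would be the failure case described above.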

Figures

Figures reproduced from arXiv: 2605.22248 by Anson Lei, Bradley Stanley-Clamp, Hannah M. Christensen, Ingmar Posner.

Figure 1: A zero-overhead framework for evaluating emulator robustness. Inspired by emergent...
Figure 2: Climate change drives a statistically significant and progressively growing shift in the...
Figure 3: Seasonal OOD performance predicts climate-change OOD performance across regions.
Figure 4: Current hybrid-ML emulators degrade systematically with increasing distribution shift.
Figure 5: Composing experts improves robustness under covariate shift without sacrificing ID skill.
Figure 6: Seasonal OOD performance serves as a useful proxy for climate-change OOD performance...
Figure 7: Seasonal OOD performance serves as a useful proxy for climate-change OOD performance...
Figure 8: Pruning stratospheric levels enforces stationarity across the dataset. Multivariate energy...
Figure 9: Multivariate distribution shift can obscure similarities in task-relevant variables. Left:...
Figure 10: Multivariate distribution shift can obscure similarities in radiatively relevant variables.
Figure 11: Seasonal distributions of lower-tropospheric (850 hPa) liquid and ice cloud tendency...
Figure 12: Physically grounded decomposition consistently improves robustness to seasonal shift...
Original abstract

Climate emulation is an out-of-distribution (OOD) projection task. This is precisely the challenge where modern Machine Learning (ML) methods are most prone to failure. Consequently, while current ML emulators trained on present climate achieve high in-distribution performance, their future reliability under the inevitable distribution shifts of a changing climate remains a critical, poorly understood blind spot. Addressing this challenge requires a fundamental shift in how we understand, evaluate, and design climate emulators. In this work, we first confirm that climate change drives a statistically significant and progressively growing shift in atmospheric state distributions, rendering standard evaluation protocols insufficient. We empirically establish that seasonal variation serves as an effective proxy for these long-term climate shifts, providing access to $\textit{real-world}$ distribution shifts without recourse to heuristics like synthetic perturbations. Motivated by this link, we introduce a novel evaluation framework that leverages seasonal shifts as a rigorous, zero-overhead testbed for emulator robustness. Our systematic characterisation confirms that current state-of-the-art hybrid-ML emulators degrade significantly under these realistic shifts. Finally, we chart a path forward by identifying compositional generalisation, the ability to form novel combinations from observed elementary components, as a principled route towards robust climate emulation. We demonstrate that physically motivated decompositions substantially improve OOD performance with only modest trade-offs against in-distribution performance, providing an avenue towards ML-driven climate emulators robust to an unknown future.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that climate emulation is fundamentally an out-of-distribution (OOD) task, demonstrates that current hybrid-ML emulators degrade under distribution shifts, establishes seasonal variation as a real-world proxy for long-term climate-driven shifts (avoiding synthetic perturbations), introduces a seasonal-shift evaluation framework, and shows that physically motivated decompositions enable compositional generalisation that substantially improves OOD performance with only modest in-distribution trade-offs.

Significance. If the empirical link between seasonal OOD and future climate shifts holds and the reported OOD gains are robust, the work supplies a zero-overhead, physically grounded testbed for emulator robustness and identifies a concrete architectural route (decompositions) toward reliable long-term climate emulation. This is a timely contribution given the reliance of climate science on ML surrogates.

major comments (2)
  1. Abstract and the section establishing the proxy: the claim that seasonal variation is an effective proxy for anthropogenic climate shifts rests on an empirical link whose quantitative strength is not detailed (no effect sizes, confidence intervals, or direct comparison of failure modes against CMIP-style future projections). Because this proxy underpins the entire evaluation framework and the subsequent OOD improvement claims, the absence of such validation makes the central robustness conclusions difficult to assess.
  2. The results section reporting emulator degradation and decomposition gains: the manuscript states statistically significant performance drops and subsequent improvements but provides no numerical effect sizes, error bars, or ablation controls on decomposition granularity. Without these, it is impossible to judge whether the OOD gains are substantial enough to offset the modest in-distribution trade-offs or whether they are sensitive to hyperparameter choices.
minor comments (2)
  1. Notation for the decomposition components and the precise definition of 'compositional generalisation' should be introduced earlier and used consistently to avoid ambiguity when comparing in-distribution versus OOD metrics.
  2. Figure captions and axis labels for the seasonal-shift experiments would benefit from explicit mention of the exact variables and time scales used, improving reproducibility.
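
The effect sizes and error bars requested in the major comments could be reported with a paired bootstrap over evaluation samples. The sketch below is one such approach under stated assumptions: the per-sample errors are synthetic placeholders, `bootstrap_mean_diff` is a hypothetical helper, and a percentile interval is only one of several reasonable choices.

```python
import random

def bootstrap_mean_diff(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a_i - b_i) over paired samples."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    means = []
    for _ in range(n_boot):
        # Resample paired differences with replacement.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(diffs) / n, lower, upper

# Synthetic per-sample OOD errors for a baseline and a decomposed emulator.
rng = random.Random(1)
baseline = [rng.gauss(1.0, 0.2) for _ in range(100)]
decomposed = [err - 0.3 + rng.gauss(0.0, 0.1) for err in baseline]

effect, lower, upper = bootstrap_mean_diff(baseline, decomposed)
```

An interval that excludes zero indicates that the OOD gain is distinguishable from resampling noise. Running the same calculation on in-distribution errors would quantify the trade-off on the same scale.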

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. We address each major comment below, agreeing where additional detail is warranted and outlining specific revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: Abstract and the section establishing the proxy: the claim that seasonal variation is an effective proxy for anthropogenic climate shifts rests on an empirical link whose quantitative strength is not detailed (no effect sizes, confidence intervals, or direct comparison of failure modes against CMIP-style future projections). Because this proxy underpins the entire evaluation framework and the subsequent OOD improvement claims, the absence of such validation makes the central robustness conclusions difficult to assess.

    Authors: We agree that the quantitative validation of the seasonal proxy can be strengthened. The manuscript demonstrates statistically significant distribution shifts due to climate change and empirically establishes seasonal variation as a real-world proxy without synthetic perturbations. However, we acknowledge that effect sizes, confidence intervals, and explicit comparisons of failure modes to CMIP-style future projections are not presented in sufficient detail. In the revised manuscript we will expand the proxy-establishment section (and update the abstract accordingly) to include these quantitative measures and direct comparisons, thereby providing a more rigorous foundation for the evaluation framework. revision: yes

  2. Referee: The results section reporting emulator degradation and decomposition gains: the manuscript states statistically significant performance drops and subsequent improvements but provides no numerical effect sizes, error bars, or ablation controls on decomposition granularity. Without these, it is impossible to judge whether the OOD gains are substantial enough to offset the modest in-distribution trade-offs or whether they are sensitive to hyperparameter choices.

    Authors: We concur that more granular numerical reporting is needed. The current manuscript reports statistically significant degradation under seasonal shifts and subsequent gains from physically motivated decompositions, yet lacks explicit effect sizes, error bars, and ablations on decomposition granularity. We will revise the results section to add effect sizes and error bars for all key metrics, together with ablation studies that vary decomposition granularity. These additions will enable readers to assess whether the OOD improvements meaningfully offset the in-distribution trade-offs and to evaluate sensitivity to hyperparameter and design choices. revision: yes

Circularity Check

0 steps flagged

Empirical demonstration with no circular derivation chain

Full rationale

The paper is an empirical study confirming distribution shifts, establishing seasonal variation as a proxy via observation, and demonstrating OOD gains from decompositions through experiments. No equations or derivations are present that reduce reported results to fitted parameters or self-citations by construction. The proxy is presented as an empirical finding rather than a definitional assumption that tautologically produces the OOD improvements. The work is self-contained against external benchmarks via real-world seasonal shifts and does not rely on load-bearing self-citations or ansatzes for its central claims.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Central claims rest on the domain assumption that seasonal atmospheric changes are sufficiently analogous to long-term climate shifts to serve as a valid proxy, plus standard machine-learning training assumptions; no new physical entities are postulated and free parameters are limited to ordinary model hyperparameters.

free parameters (1)
  • model hyperparameters and decomposition granularity
    Chosen during training and architecture design to achieve the reported trade-off between in-distribution and OOD performance.
axioms (1)
  • domain assumption seasonal variation serves as an effective proxy for long-term climate distribution shifts
    Invoked to justify the zero-overhead evaluation framework without synthetic perturbations.

pith-pipeline@v0.9.0 · 5792 in / 1382 out tokens · 45242 ms · 2026-05-22T08:16:27.056855+00:00 · methodology

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · 2 internal anchors

  1. [1]

    Tokarska, George Hurtt, Elmar Kriegler, Jean-Francois Lamarque, Gerald Meehl, Richard Moss, Susanne E

    Claudia Tebaldi, Kevin Debeire, Veronika Eyring, Erich Fischer, John Fyfe, Pierre Friedlingstein, Reto Knutti, Jason Lowe, Brian O’Neill, Benjamin Sanderson, Detlef van Vuuren, Keywan Riahi, Malte Meinshausen, Zebedee Nicholls, Katarzyna B. Tokarska, George Hurtt, Elmar Kriegler, Jean-Francois Lamarque, Gerald Meehl, Richard Moss, Susanne E. Bauer, Olivie...

  2. [2]

    Kwakkel, Warren E

    Marjolijn Haasnoot, Jan H. Kwakkel, Warren E. Walker, and Judith ter Maat. Dynamic adaptive policy pathways: A method for crafting robust decisions for a deeply uncertain world.Global Environmental Change, 23(2):485–498, April 2013. doi: 10.1016/j.gloenvcha.2012.12.006

  3. [3]

    Nicola Ranger, Tim Reeder, and Jason Lowe. Addressing ‘deep’ uncertainty over long- term climate in major infrastructure projects: four innovations of the Thames Estuary 2100 Project.EURO Journal on Decision Processes, 1(3):233–262, November 2013. doi: 10.1007/s40070-013-0014-5

  4. [4]

    Brether- ton, Xi Chen, Peter Düben, Falko Judt, Marat Khairoutdinov, Daniel Klocke, Chihiro Ko- dama, Luis Kornblueh, Shian-Jiann Lin, Philipp Neumann, William M

    Bjorn Stevens, Masaki Satoh, Ludovic Auger, Joachim Biercamp, Christopher S. Brether- ton, Xi Chen, Peter Düben, Falko Judt, Marat Khairoutdinov, Daniel Klocke, Chihiro Ko- dama, Luis Kornblueh, Shian-Jiann Lin, Philipp Neumann, William M. Putman, Niklas Röber, Ryosuke Shibuya, Benoit Vanniere, Pier Luigi Vidale, Nils Wedi, and Linjiong Zhou. DYAMOND: the...

  5. [5]

    Representing Equilibrium and Nonequilibrium Convection in Large- 10 Scale Models.Journal of the Atmospheric Sciences, 71(2):734–753, February 2014

    Peter Bechtold, Noureddine Semane, Philippe Lopez, Jean-Pierre Chaboureau, Anton Beljaars, and Niels Bormann. Representing Equilibrium and Nonequilibrium Convection in Large- 10 Scale Models.Journal of the Atmospheric Sciences, 71(2):734–753, February 2014. doi: 10.1175/JAS-D-13-0163.1

  6. [6]

    Earth System Modeling 2.0: A Blueprint for Models That Learn From Observations and Targeted High-Resolution Simulations

    Tapio Schneider, Shiwei Lan, Andrew Stuart, and João Teixeira. Earth System Modeling 2.0: A Blueprint for Models That Learn From Observations and Targeted High-Resolution Simulations. Geophysical Research Letters, 44(24):12,396–12,417, 2017. doi: 10.1002/2017GL076101

  7. [7]

    Advances and challenges in climate modeling.Climatic Change, 170(1):18, January 2022

    Omid Alizadeh. Advances and challenges in climate modeling.Climatic Change, 170(1):18, January 2022. doi: 10.1007/s10584-021-03298-4

  8. [8]

    Griffin Mooers, Michael Pritchard, Tom Beucler, Jordan Ott, Galen Yacalis, Pierre Baldi, and Pierre Gentine. Assessing the Potential of Deep Learning for Emulating Cloud Superparameter- ization in Climate Models With Real-Geography Boundary Conditions.Journal of Advances in Modeling Earth Systems, 13(5), 2021. doi: 10.1029/2020MS002385

  9. [9]

    Will, Gunnar Behrens, Julius J

    Sungduk Yu, Zeyuan Hu, Akshay Subramaniam, Walter Hannah, Liran Peng, Jerry Lin, Mo- hamed Aziz Bhouri, Ritwik Gupta, Björn Lütjens, Justus C. Will, Gunnar Behrens, Julius J. M. Busecke, Nora Loose, Charles I. Stern, Tom Beucler, Bryce Harrop, Helge Heuer, Ben- jamin R. Hillman, Andrea Jenney, Nana Liu, Alistair White, Tian Zheng, Zhiming Kuang, Fiaz Ahme...

  10. [10]

    Brenowitz, Tom Beucler, Michael Pritchard, and Christopher S

    Noah D. Brenowitz, Tom Beucler, Michael Pritchard, and Christopher S. Bretherton. Interpreting and Stabilizing Machine-Learning Parametrizations of Convection.Journal of the Atmospheric Sciences, 77(12):4357–4375, December 2020. doi: 10.1175/JAS-D-20-0082.1

  11. [11]

    Clark, Anna Kwa, W

    Oliver Watt-Meyer, Brian Henn, Jeremy McGibbon, Spencer K. Clark, Anna Kwa, W. Andre Perkins, Elynn Wu, Lucas Harris, and Christopher S. Bretherton. ACE2: accurately learn- ing subseasonal to decadal atmospheric variability and forced responses.npj Climate and Atmospheric Science, 8(1):205, May 2025. doi: 10.1038/s41612-025-01090-0

  12. [12]

    Brenner, and Stephan Hoyer

    Dmitrii Kochkov, Janni Yuval, Ian Langmore, Peter Norgaard, Jamie Smith, Griffin Mooers, Milan Klöwer, James Lottes, Stephan Rasp, Peter Düben, Sam Hatfield, Peter Battaglia, Alvaro Sanchez-Gonzalez, Matthew Willson, Michael P. Brenner, and Stephan Hoyer. Neural general circulation models for weather and climate.Nature, 632(8027):1060–1066, August 2024. d...

  13. [13]

    Navigating the Noise: Bringing Clarity to ML Parameterization Design With \boldsym- bol\mathcalO\(100) Ensembles

    Jerry Lin, Sungduk Yu, Liran Peng, Tom Beucler, Eliot Wong-Toi, Zeyuan Hu, Pierre Gentine, Margarita Geleta, and Mike Pritchard. Navigating the Noise: Bringing Clarity to ML Parameter- ization Design With \boldsymbol\mathcalO\(100) Ensembles.Journal of Advances in Modeling Earth Systems, 17(4), 2025. doi: 10.1029/2024MS004551

  14. [14]

    Jerry Lin, Zeyuan Hu, Tom Beucler, Katherine Frields, Hannah Christensen, Walter Hannah, Helge Heuer, Peter Ukkonnen, Laura A. Mansfield, Tian Zheng, Liran Peng, Ritwik Gupta, Pierre Gentine, Yusef Al-Naher, Mingjiang Duan, Kyo Hattori, Weiliang Ji, Chunhan Li, Kippei Matsuda, Naoki Murakami, Shlomo Ron, Marec Serlin, Hongjian Song, Yuma Tanabe, Daisuke Y...

  15. [15]

    Gentine, M

    P. Gentine, M. Pritchard, S. Rasp, G. Reinaudi, and G. Yacalis. Could Machine Learning Break the Convection Parameterization Deadlock?Geophysical Research Letters, 45(11):5742–5751,

  16. [16]

    doi: 10.1029/2018GL078202

  17. [17]

    Brenowitz and Christopher S

    Noah D. Brenowitz and Christopher S. Bretherton. Spatially Extended Tests of a Neural Network Parametrization Trained by Coarse-Graining.Journal of Advances in Modeling Earth Systems, 11(8):2728–2744, August 2019. doi: 10.1029/2019MS001711. 11

  18. [18]

    Deep Learning for the Parametrization of Subgrid Processes in Climate Models

    Pierre Gentine, Veronika Eyring, and Tom Beucler. Deep Learning for the Parametrization of Subgrid Processes in Climate Models. InDeep Learning for the Earth Sciences, pages 307–314. John Wiley & Sons, Ltd, 2021. doi: 10.1002/9781119646181.ch21

  19. [19]

    Zhang, Xiaomeng Huang, and Yong Wang

    Yilun Han, Guang J. Zhang, Xiaomeng Huang, and Yong Wang. A Moist Physics Parameteriza- tion Based on Deep Learning.Journal of Advances in Modeling Earth Systems, 12(9), 2020. doi: 10.1029/2020MS002076

  20. [20]

    Gunnar Behrens, Tom Beucler, Fernando Iglesias-Suarez, Sungduk Yu, Pierre Gentine, Michael Pritchard, Mierk Schwabe, and Veronika Eyring. Simulating Atmospheric Processes in Earth System Models and Quantifying Uncertainties With Deep Learning Multi-Member and Stochas- tic Parameterizations.Journal of Advances in Modeling Earth Systems, 17(4), 2025. doi: 1...

  21. [21]

    Generalizable neural-network parameteri- zation of mesoscale eddies in idealized and global ocean models, July 2025

    Pavel Perezhogin, Alistair Adcroft, and Laure Zanna. Generalizable neural-network parameteri- zation of mesoscale eddies in idealized and global ocean models, July 2025

  22. [22]

    Han- nah, Noah D

    Zeyuan Hu, Akshay Subramaniam, Zhiming Kuang, Jerry Lin, Sungduk Yu, Walter M. Han- nah, Noah D. Brenowitz, Josh Romero, and Michael S. Pritchard. Stable Machine-Learning Parameterization of Subgrid Processes in a Comprehensive Atmospheric Model Learned From Embedded Convection-Permitting Simulations.Journal of Advances in Modeling Earth Systems, 17(7), 2...

  23. [23]

    Clark, Bill Hurlin, Oliver Watt-Meyer, Alistair Adcroft, Chris Bretherton, and Laure Zanna

    William Gregory, Mitchell Bushuk, James Duncan, Elynn Wu, Adam Subel, Spencer K. Clark, Bill Hurlin, Oliver Watt-Meyer, Alistair Adcroft, Chris Bretherton, and Laure Zanna. FloeNet: A mass-conserving global sea ice emulator that generalizes across climates, March 2026. URL https://arxiv.org/abs/2603.12449

  24. [24]

    O’Gorman and John G

    Paul A. O’Gorman and John G. Dwyer. Using Machine Learning to Parameterize Moist Convection: Potential for Modeling of Climate, Climate Change, and Extreme Events.Journal of Advances in Modeling Earth Systems, 10(10):2548–2563, 2018. doi: 10.1029/2018MS001351

  25. [25]

    Towards Physically- Consistent, Data-Driven Models of Convection

    Tom Beucler, Michael Pritchard, Pierre Gentine, and Stephan Rasp. Towards Physically- Consistent, Data-Driven Models of Convection. InIGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium, pages 3987–3990, September 2020. doi: 10.1109/ IGARSS39084.2020.9324569

  26. [26]

    Towards Physically Consistent Deep Learning For Climate Model Parameterizations

    Birgit Kühbacher, Fernando Iglesias-Suarez, Niki Kilbertus, and Veronika Eyring. Towards Physically Consistent Deep Learning For Climate Model Parameterizations. In2024 Interna- tional Conference on Machine Learning and Applications (ICMLA), pages 280–287, December

  27. [27]

    doi: 10.1109/ICMLA61862.2024.00044

  28. [28]

    Zhang, and Yong Wang

    Yilun Han, Guang J. Zhang, and Yong Wang. An Ensemble of Neural Networks for Moist Physics Processes, Its Generalizability and Stable Integration.Journal of Advances in Modeling Earth Systems, 15(10), 2023. doi: 10.1029/2022MS003508

  29. [29]

    O’Gorman, J

    Tom Beucler, Pierre Gentine, Janni Yuval, Ankitesh Gupta, Liran Peng, Jerry Lin, Sungduk Yu, Stephan Rasp, Fiaz Ahmed, Paul A. O’Gorman, J. David Neelin, Nicholas J. Lutsko, and Michael Pritchard. Climate-Invariant Machine Learning. February 2024. doi: 110.1126/sciadv. adj7250

  30. [30]

    Robustness of AI-based weather forecasts in a changing climate, September 2024

    Thomas Rackow, Nikolay Koldunov, Christian Lessig, Irina Sandu, Mihai Alexe, Matthew Chantry, Mariana Clare, Jesper Dramsch, Florian Pappenberger, Xabier Pedruzo-Bagazgoitia, Steffen Tietsche, and Thomas Jung. Robustness of AI-based weather forecasts in a changing climate, September 2024. URLhttps://arxiv.org/abs/2409.18529

  31. [31]

    Causally-Informed Deep Learning to Improve Climate Models and Projections.Journal of Geophysical Research: Atmospheres, 129(4), 2024

    Fernando Iglesias-Suarez, Pierre Gentine, Breixo Solino-Fernandez, Tom Beucler, Michael Pritchard, Jakob Runge, and Veronika Eyring. Causally-Informed Deep Learning to Improve Climate Models and Projections.Journal of Geophysical Research: Atmospheres, 129(4), 2024. doi: 10.1029/2023JD039202

  32. [32]

    Stress- testing the coupled behavior of hybrid physics-machine learning climate simulations on an unseen, warmer climate, January 2024

    Jerry Lin, Mohamed Aziz Bhouri, Tom Beucler, Sungduk Yu, and Michael Pritchard. Stress- testing the coupled behavior of hybrid physics-machine learning climate simulations on an unseen, warmer climate, January 2024. URLhttps://arxiv.org/abs/2401.02098. 12

  33. [33]

    Machine learning and the quest for objectivity in climate model parameterization.Climatic Change, 176(8):101, July 2023

    Julie Jebeile, Vincent Lam, Mason Majszak, and Tim Räz. Machine learning and the quest for objectivity in climate model parameterization.Climatic Change, 176(8):101, July 2023. doi: 10.1007/s10584-023-03532-1

  34. [34]

    Pritchard, and Pierre Gentine

    Mohamed Aziz Bhouri, Liran Peng, Michael S. Pritchard, and Pierre Gentine. Multi-fidelity climate model parameterization for better generalization and extrapolation, September 2023. URLhttps://arxiv.org/abs/2309.10231

  35. [35]

    Andersson, Andrew El-Kadi, Do- minic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, Remi Lam, and Matthew Willson

    Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R. Andersson, Andrew El-Kadi, Do- minic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, Remi Lam, and Matthew Willson. Probabilistic weather forecasting with machine learning.Nature, 637(8044): 84–90, January 2025. doi: 10.1038/s41586-024-08252-9

  36. [36]

    Watson-Parris, Y

    D. Watson-Parris, Y . Rao, D. Olivié, Ø. Seland, P. Nowack, G. Camps-Valls, P. Stier, S. Bouabid, M. Dewey, E. Fons, J. Gonzalez, P. Harder, K. Jeggle, J. Lenhardt, P. Manshausen, M. Novi- tasari, L. Ricard, and C. Roesch. ClimateBench v1.0: A Benchmark for Data-Driven Cli- mate Projections.Journal of Advances in Modeling Earth Systems, 14(10), 2022. doi:...

  37. [37]

    ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning.Advances in Neural Information Processing Systems, 36:21757–21792, December 2023

    Julia Kaltenborn, Charlotte Lange, Venkatesh Ramesh, Philippe Brouillard, Yaniv Gurwicz, Chandni Nagda, Jakob Runge, Peer Nowack, and David Rolnick. ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning.Advances in Neural Information Processing Systems, 36:21757–21792, December 2023

  38. [38]

    WeatherBench 2: A Benchmark for the Next Generation of Data-Driven Global Weather Models.Journal of Advances in Modeling Earth Systems, 16(6), 2024

    Stephan Rasp, Stephan Hoyer, Alexander Merose, Ian Langmore, Peter Battaglia, Tyler Russell, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya Agrawal, Matthew Chantry, Zied Ben Bouallegue, Peter Dueben, Carla Bromberg, Jared Sisk, Luke Barrington, Aaron Bell, and Fei Sha. WeatherBench 2: A Benchmark for the Next Generation of Data-Driven Global We...

  39. [39]

    Stern, Tom Beucler, Bryce Harrop, Benjamin R

    Sungduk Yu, Walter Hannah, Liran Peng, Jerry Lin, Mohamed Aziz Bhouri, Ritwik Gupta, Björn Lütjens, Justus Christopher Will, Gunnar Behrens, Julius Busecke, Nora Loose, Charles I. Stern, Tom Beucler, Bryce Harrop, Benjamin R. Hillman, Andrea Jenney, Savannah Ferretti, Nana Liu, Anima Anandkumar, Noah D. Brenowitz, Veronika Eyring, Nicholas Geneva, Pierre ...

  40. [40]

    Salva Rühling Cachay, Venkatesh Ramesh, Jason N. S. Cole, Howard Barker, and David Rolnick. ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models, November 2021. URLhttps://arxiv.org/abs/2111.14671

  41. [41]

    ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling.Advances in Neural Information Processing Systems, 36:75009–75025, December 2023

    Tung Nguyen, Jason Jewik, Hritik Bansal, Prakhar Sharma, and Aditya Grover. ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling.Advances in Neural Information Processing Systems, 36:75009–75025, December 2023

  42. [42]

    Climax: A foundation model for weather and climate,

    Tung Nguyen, Johannes Brandstetter, Ashish Kapoor, Jayesh K. Gupta, and Aditya Grover. ClimaX: A foundation model for weather and climate, December 2023. URL https://arxiv. org/abs/2301.10343

  43. [43]

    Assessing the Robustness of Climate Foundation Models under No-Analog Distribution Shifts

    Maria Conchita Agana Navarro, Geng Li, Theo Wolf, and Maria Perez-Ortiz. Assessing the Robustness of Climate Foundation Models under No-Analog Distribution Shifts, March 2026. URLhttps://arxiv.org/abs/2603.23043

  44. [44]

    doi:10.1017/9781009157896 , abstract =

    Intergovernmental Panel On Climate Change (Ipcc).Climate Change 2021 – The Physical Science Basis: Working Group I Contribution to the Sixth Assessment Report of the Intergov- ernmental Panel on Climate Change. Cambridge University Press, 1 edition, July 2023. ISBN 978-1-009-15789-6. doi: 10.1017/9781009157896. 13

  45. [45]

    Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang

    Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton Earnshaw, Imran Haque, Sara M. Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang. WILDS: A...

  46. [46]

    In Search of Lost Domain Generalization, July 2020

    Ishaan Gulrajani and David Lopez-Paz. In Search of Lost Domain Generalization, July 2020

  47. [47]

    Igor Goldenberg and Geoffrey I. Webb. Survey of distance measures for quantifying concept drift and shift in numeric data. Knowledge and Information Systems, 60(2):591–615, August 2019. doi: 10.1007/s10115-018-1257-z

  49. [49]

    Hans Hersbach, Bill Bell, Paul Berrisford, Shoji Hirahara, András Horányi, Joaquín Muñoz-Sabater, Julien Nicolas, Carole Peubey, Raluca Radu, Dinand Schepers, Adrian Simmons, Cornel Soci, Saleh Abdalla, Xavier Abellan, Gianpaolo Balsamo, Peter Bechtold, Gionata Biavati, Jean Bidlot, Massimo Bonavita, Giovanna De Chiara, Per Dahlgren, Dick Dee, Michail Di...

  50. [50]

    Emergent constraints on climate sensitivities

    Mark S. Williamson, Chad W. Thackeray, Peter M. Cox, Alex Hall, Chris Huntingford, and Femke J. M. M. Nijsse. Emergent constraints on climate sensitivities. Reviews of Modern Physics, 93(2), May 2021. doi: 10.1103/RevModPhys.93.025004

  51. [51]

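    Understanding and Reducing Future Uncertainty in Midlatitude Daily Heat Extremes Via Land Surface Feedback Constraints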

    Markus G. Donat, Andrew J. Pitman, and Oliver Angélil. Understanding and Reducing Future Uncertainty in Midlatitude Daily Heat Extremes Via Land Surface Feedback Constraints. Geophysical Research Letters, 45(19):10,627–10,636, 2018. doi: 10.1029/2018GL079128

  52. [52]

    Projected land photosynthesis constrained by changes in the seasonal cycle of atmospheric CO2

    Sabrina Wenzel, Peter M. Cox, Veronika Eyring, and Pierre Friedlingstein. Projected land photosynthesis constrained by changes in the seasonal cycle of atmospheric CO2. Nature, 538(7626):499–501, October 2016. doi: 10.1038/nature19772

  53. [53]

    An emergent constraint on future Arctic sea-ice albedo feedback

    Chad W. Thackeray and Alex Hall. An emergent constraint on future Arctic sea-ice albedo feedback. Nature Climate Change, 9(12):972–978, December 2019. doi: 10.1038/s41558-019-0619-1

  54. [54]

    On the persistent spread in snow-albedo feedback

    Xin Qu and Alex Hall. On the persistent spread in snow-albedo feedback. Climate Dynamics, 42(1):69–81, January 2014. doi: 10.1007/s00382-013-1774-0

  55. [55]

    Using the current seasonal cycle to constrain snow albedo feedback in future climate change

    Alex Hall and Xin Qu. Using the current seasonal cycle to constrain snow albedo feedback in future climate change. Geophysical Research Letters, 33(3), 2006. doi: 10.1029/2005GL025127

  56. [56]

    Long-term cloud change imprinted in seasonal cloud variation: More evidence of high climate sensitivity

    Chengxing Zhai, Jonathan H. Jiang, and Hui Su. Long-term cloud change imprinted in seasonal cloud variation: More evidence of high climate sensitivity. Geophysical Research Letters, 42(20):8729–8737, 2015. doi: 10.1002/2015GL065911

  57. [57]

    Constraining Climate Sensitivity from the Seasonal Cycle in Surface Temperature

    Reto Knutti, Gerald A. Meehl, Myles R. Allen, and David A. Stainforth. Constraining Climate Sensitivity from the Seasonal Cycle in Surface Temperature. Journal of Climate, 19(17):4224–4233, September 2006. doi: 10.1175/JCLI3865.1

  58. [58]

    C. Covey, A. Abe-Ouchi, G. J. Boer, B. A. Boville, U. Cubasch, L. Fairhead, G. M. Flato, H. Gordon, E. Guilyardi, X. Jiang, T. C. Johns, H. Le Treut, G. Madec, G. A. Meehl, R. Miller, A. Noda, S. B. Power, E. Roeckner, G. Russell, E. K. Schneider, R. J. Stouffer, L. Terray, and J.-S. von Storch. The seasonal cycle in coupled ocean-atmosphere general circu...

  59. [59]

    Accuracy on the Line: on the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization

    John P. Miller, Rohan Taori, Aditi Raghunathan, Shiori Sagawa, Pang Wei Koh, Vaishaal Shankar, Percy Liang, Yair Carmon, and Ludwig Schmidt. Accuracy on the Line: on the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization. In Proceedings of the 38th International Conference on Machine Learning, pages 7721–7735. PMLR, July 2021

  60. [60]

    Accuracy on the wrong line: On the pitfalls of noisy data for out-of-distribution generalisation

    Amartya Sanyal, Yaxi Hu, Yaodong Yu, Yian Ma, Yixin Wang, and Bernhard Schölkopf. Accuracy on the wrong line: On the pitfalls of noisy data for out-of-distribution generalisation. In Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, pages 2170–2178. PMLR, April 2025

  61. [61]

    Checkerboard patterns in E3SMv2 and E3SM-MMFv2

    Walter Hannah, Kyle Pressel, Mikhail Ovchinnikov, and Gregory Elsaesser. Checkerboard patterns in E3SMv2 and E3SM-MMFv2. Geoscientific Model Development, 15(15):6243–6257, August 2022. doi: 10.5194/gmd-15-6243-2022

  62. [62]

    Separating Physics and Dynamics Grids for Improved Computational Efficiency in Spectral Element Earth System Models

    Walter M. Hannah, Andrew M. Bradley, Oksana Guba, Qi Tang, Jean-Christophe Golaz, and Jon Wolfe. Separating Physics and Dynamics Grids for Improved Computational Efficiency in Spectral Element Earth System Models. Journal of Advances in Modeling Earth Systems, 13(7):e2020MS002419, 2021. doi: 10.1029/2020MS002419

  63. [63]

    W. M. Hannah, C. R. Jones, B. R. Hillman, M. R. Norman, D. C. Bader, M. A. Taylor, L. R. Leung, M. S. Pritchard, M. D. Branson, G. Lin, K. G. Pressel, and J. M. Lee. Initial Results From the Super-Parameterized E3SM. Journal of Advances in Modeling Earth Systems, 12(1):e2019MS001863, 2020. doi: 10.1029/2019MS001863

  64. [64]

    Unprecedented cloud resolution in a GPU-enabled full-physics atmospheric climate simulation on OLCF’s summit supercomputer

    Matthew R Norman, David C Bader, Christopher Eldred, Walter M Hannah, Benjamin R Hillman, Christopher R Jones, Jungmin M Lee, LR Leung, Isaac Lyngaas, Kyle G Pressel, Sarat Sreepathi, Mark A Taylor, and Xingqiu Yuan. Unprecedented cloud resolution in a GPU-enabled full-physics atmospheric climate simulation on OLCF’s summit supercomputer. The Internationa...

  65. [65]

    doi: 10.1177/10943420211027539

  66. [66]

    What is a cognitive map? Organizing knowledge for flexible behavior

    Timothy EJ Behrens, Timothy H Muller, James CR Whittington, Shirley Mark, Alon B Baram, Kimberly L Stachenfeld, and Zeb Kurth-Nelson. What is a cognitive map? Organizing knowledge for flexible behavior. Neuron, 100(2):490–509, 2018

  67. [67]

    Abstraction and analogy-making in artificial intelligence

    Melanie Mitchell. Abstraction and analogy-making in artificial intelligence. Annals of the New York Academy of Sciences, 1505(1):79–101, 2021

  68. [68]

    The big book of concepts

    Gregory Murphy. The big book of concepts. MIT Press, 2004

  69. [69]

    How to grow a mind: Statistics, structure, and abstraction

    Joshua B Tenenbaum, Charles Kemp, Thomas L Griffiths, and Noah D Goodman. How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022):1279–1285, 2011. doi: 10.1126/science.1192788

  70. [70]

    Connectionism and cognitive architecture: A critical analysis

    Jerry A. Fodor and Zenon W. Pylyshyn. Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1-2):3–71, March 1988. doi: 10.1016/0010-0277(88)90031-5

  71. [71]

    Inductive biases for deep learning of higher-level cognition

    Anirudh Goyal and Yoshua Bengio. Inductive biases for deep learning of higher-level cognition. Proceedings of the Royal Society A, 478(2266), 2022. doi: 10.1098/rspa.2021.0068

  72. [72]

    On the binding problem in artificial neural networks

    Klaus Greff, Sjoerd Van Steenkiste, and Jürgen Schmidhuber. On the binding problem in artificial neural networks, 2020. URL https://arxiv.org/abs/2012.05208

  73. [73]

    Building machines that learn and think like people

    Brenden M Lake, Tomer D Ullman, Joshua B Tenenbaum, and Samuel J Gershman. Building machines that learn and think like people. Behavioral and Brain Sciences, 40:e253, 2017. doi: 10.1017/S0140525X16001837

  74. [74]

    Divergence estimation for multidimensional densities via k-nearest-neighbor distances

    Qing Wang, Sanjeev R Kulkarni, and Sergio Verdú. Divergence estimation for multidimensional densities via k-nearest-neighbor distances. IEEE Transactions on Information Theory, 55(5):2392–2405, 2009. doi: 10.1109/TIT.2009.2016060

  75. [75]

    A convnet for the 2020s

    Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11976–11986, 2022

  76. [76]

    Beyond the Training Data: Confidence-Guided Mixing of Parameterizations in a Hybrid AI-Climate Model

    Helge Heuer, Tom Beucler, Mierk Schwabe, Julien Savre, Manuel Schlund, and Veronika Eyring. Beyond the Training Data: Confidence-Guided Mixing of Parameterizations in a Hybrid AI-Climate Model, March 2026. URL https://arxiv.org/abs/2510.08107

  77. [77]

    SPARTAN: A Sparse Transformer Learning Local Causation

    Anson Lei, Bernhard Schölkopf, and Ingmar Posner. SPARTAN: A Sparse Transformer Learning Local Causation, November 2024. URL https://arxiv.org/abs/2411.06890

  78. [78]

    Toward causal representation learning

    Bernhard Schölkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. Toward causal representation learning. Proceedings of the IEEE, 109(5):612–634, 2021. doi: 10.1109/JPROC.2021.3058954

  79. [79]

    Disentangling dynamical systems: Causal representation learning meets local sparse attention

    Markus W Baumgartner, Anson Lei, Joe Watson, and Ingmar Posner. Disentangling dynamical systems: Causal representation learning meets local sparse attention, 2026. URL https://arxiv.org/abs/2603.14483

  80. [80]

    Marrying causal representation learning with dynamical systems for science

    Dingling Yao, Caroline Muller, and Francesco Locatello. Marrying causal representation learning with dynamical systems for science. Advances in Neural Information Processing Systems, 37:71705–71736, 2024
