climt-paraformer: Stable Emulation of Convective Parameterization using a Temporal Memory-aware Transformer
Pith reviewed 2026-05-09 22:10 UTC · model grok-4.3
The pith
A Transformer emulator for convective parameterization that models temporal memory achieves lower errors and stays stable over 10-year single-column simulations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The temporal memory-aware Transformer emulator for the Emanuel convective parameterization captures correlations and nonlinear interactions across consecutive atmospheric states, yielding lower offline errors than memory-less multilayer perceptron or LSTM baselines. Sensitivity analysis shows best performance at a memory length of approximately 100 minutes, with longer memory degrading results. When inserted into long-term coupled single-column model simulations, the emulator remains stable over 10 years.
What carries the argument
A temporal memory-aware Transformer that takes a sequence of consecutive atmospheric states as input and uses attention across the time steps to predict convective tendencies.
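As a rough illustration of what "attention over a sequence of atmospheric states" means, the sketch below computes single-head scaled dot-product attention in plain Python. It is not the paper's architecture: there are no learned query, key, or value projections, and the function names `softmax` and `attend` and the toy `states` values are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query over a sequence.

    Returns the attention-weighted mix of `values` and the weights.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    n_out = len(values[0])
    context = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(n_out)
    ]
    return context, weights

# Hypothetical toy sequence: 4 consecutive time steps, 3 features each
# (illustrative numbers only, not real atmospheric data).
states = [
    [0.1, 0.0, 0.2],
    [0.2, 0.1, 0.2],
    [0.4, 0.3, 0.1],
    [0.9, 0.8, 0.0],
]

# The most recent state queries the whole history, itself included.
context, weights = attend(states[-1], states, states)
```

The weights sum to one, so `context` is a convex combination of the past states. In a real Transformer the queries, keys, and values would come from learned linear maps and several heads would run in parallel.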
If this is right
- Lower point-wise errors in predicting convective heating and moistening tendencies from atmospheric profiles.
- Decade-long stability in coupled single-column integrations without the instabilities sometimes seen in earlier neural emulators.
- Performance peaks at a memory window of roughly 100 minutes and declines for substantially longer windows.
- Explicit sequence modeling via attention outperforms both memory-less networks and standard recurrent architectures for this task.
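To make the memory-window point concrete, the sketch below converts a memory length in minutes into a number of input states and slices a time series into overlapping windows. The 10-minute time step is an assumption for illustration only (the source does not state the model time step), as is the choice to count the current state as part of the window. The helper names are hypothetical.

```python
def window_steps(memory_minutes, timestep_minutes):
    """Number of states in one window: the current state plus the
    past states needed to span `memory_minutes` (assumed convention)."""
    if memory_minutes < 0 or timestep_minutes <= 0:
        raise ValueError("memory must be >= 0 and time step must be > 0")
    return memory_minutes // timestep_minutes + 1

def make_windows(series, n_steps):
    """Slice a time-ordered series into overlapping windows of n_steps,
    one window per time index that has enough history behind it."""
    return [
        series[i - n_steps + 1 : i + 1]
        for i in range(n_steps - 1, len(series))
    ]

# Assumed 10-minute time step; 100 minutes of memory.
n_steps = window_steps(100, 10)

# Stand-in for a sequence of 15 atmospheric states (indices only).
series = list(range(15))
windows = make_windows(series, n_steps)
```

Under these assumptions each sample holds 11 states, and a longer memory length simply widens every window, which is the quantity the sensitivity analysis varies.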
Where Pith is reading between the lines
- The same sequence-modeling strategy could be applied to other sub-grid processes such as cloud microphysics or radiation that also depend on recent history.
- An adaptive memory length that varies with local conditions might remove the need to tune a fixed window of 100 minutes.
- If the single-column stability carries over, the emulator could replace expensive convective schemes inside operational global models and thereby free compute for higher resolution or ensemble size.
Load-bearing premise
Superior offline accuracy and decade-scale stability seen in single-column model tests will continue without degradation when the emulator is placed inside full three-dimensional global climate models.
What would settle it
A 10-year integration of the emulator inside a full global climate model that produces growing drift in temperature, humidity, or precipitation fields, or offline errors that exceed those of the original Emanuel scheme on independent data.
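One simple way to quantify "growing drift" is the least-squares trend of annual means over the integration. The sketch below is a minimal version of that check on made-up numbers. The function name, the two series, and the idea of using a linear trend as the drift diagnostic are assumptions, not something taken from the paper.

```python
def linear_trend(values):
    """Ordinary least-squares slope of values against their index
    (for annual means, the units are 'per year')."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return cov / var

# Hypothetical annual-mean temperatures (K) over 10 simulated years.
stable = [300.0, 300.1, 299.9, 300.0, 300.1,
          299.9, 300.0, 300.1, 299.9, 300.0]
drifting = [300.0 + 0.5 * year for year in range(10)]

stable_slope = linear_trend(stable)
drifting_slope = linear_trend(drifting)
```

A slope near zero is consistent with a stable run, while a persistent nonzero slope in temperature, humidity, or precipitation would be the kind of evidence described above. Any threshold for "too much drift" would have to be chosen relative to the reference scheme's own variability.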
Original abstract
Accurate representation of moist convective sub-grid-scale processes remains a major challenge in global climate models, as traditional parameterization schemes are both computationally expensive and difficult to scale. Neural network (NN) emulators offer a promising alternative by learning efficient mappings between atmospheric states and convective tendencies while retaining fidelity to the underlying physics. However, most existing NN-based parameterizations are memory-less and rely only on instantaneous inputs, even though convection evolves over time and depends on prior atmospheric states. Recent studies have begun to incorporate convective memory, but they often treat past states as independent features rather than modeling temporal dependencies explicitly. In this work, we develop a temporal memory-aware Transformer emulator for the Emanuel convective parameterization and evaluate it in a single-column climate model (SCM) under both offline and online configurations. The Transformer captures temporal correlations and nonlinear interactions across consecutive atmospheric states. Compared with baseline emulators, including a memory-less multilayer perceptron and a recurrent long short-term memory model, the Transformer achieves lower offline errors. Sensitivity analysis indicates that a memory length of approximately 100 minutes yields the best performance, whereas longer memory degrades performance. We further test the emulator in long-term coupled simulations and show that it remains stable over 10 years. Overall, this study demonstrates the importance of explicit temporal modeling for NN-based parameterizations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a temporal memory-aware Transformer to emulate the Emanuel convective parameterization. It is trained and evaluated in a single-column model (SCM) under offline and online configurations, claiming lower offline errors than memory-less MLP and LSTM baselines, an optimal memory length of ~100 minutes, and 10-year stability in long-term coupled simulations.
Significance. If the performance and stability advantages generalize beyond the SCM, the work would usefully demonstrate the benefit of explicit temporal modeling via Transformers for sub-grid convective processes. This could support more efficient and physically consistent emulators in climate modeling, with the memory-length sensitivity analysis offering practical guidance for related efforts.
major comments (2)
- [Abstract and Results] Abstract and online evaluation sections: The central claims of lower offline errors and 10-year stability are presented without quantitative error metrics (e.g., RMSE values), training/validation split details, baseline implementation specifics, or statistical significance tests. This prevents assessment of whether the reported improvements are meaningful or reproducible.
- [Online evaluation / coupled simulations] Online evaluation and coupled simulations: Stability over 10 years is demonstrated only within an SCM. The abstract and introduction position the emulator as an alternative for global climate models, yet no tests incorporate 3D dynamics, horizontal advection, or multi-column interactions. Because these feedbacks can introduce instabilities absent in SCM, the extrapolation is load-bearing for the applicability claim and requires either scope clarification or additional full-GCM experiments.
minor comments (3)
- [Abstract] Clarify the precise definition of 'coupled simulations' and whether they include any 3D components.
- [Methods] Expand the methods description of the Transformer architecture (layers, heads, memory implementation) and loss function for full reproducibility.
- [Results] Add error bars or significance tests to any comparison figures or tables showing baseline performance.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive comments on our manuscript. We address each major comment below and describe the revisions we will incorporate.
- Referee: [Abstract and Results] Abstract and online evaluation sections: The central claims of lower offline errors and 10-year stability are presented without quantitative error metrics (e.g., RMSE values), training/validation split details, baseline implementation specifics, or statistical significance tests. This prevents assessment of whether the reported improvements are meaningful or reproducible.
  Authors: We agree that quantitative details are necessary for assessing the significance and reproducibility of the results. In the revised manuscript, we will add specific RMSE values comparing the Transformer to the MLP and LSTM baselines, explicit descriptions of the training/validation splits used, details on baseline model implementations, and statistical significance tests (such as paired t-tests or bootstrap confidence intervals) to support the reported improvements.
  Revision: yes
- Referee: [Online evaluation / coupled simulations] Online evaluation and coupled simulations: Stability over 10 years is demonstrated only within an SCM. The abstract and introduction position the emulator as an alternative for global climate models, yet no tests incorporate 3D dynamics, horizontal advection, or multi-column interactions. Because these feedbacks can introduce instabilities absent in SCM, the extrapolation is load-bearing for the applicability claim and requires either scope clarification or additional full-GCM experiments.
  Authors: We acknowledge that the 10-year stability demonstration is limited to SCM experiments, which exclude 3D dynamical feedbacks, horizontal advection, and multi-column interactions. The work intentionally uses the SCM to isolate the convective parameterization emulator. We will revise the abstract and introduction to clarify the study scope, explicitly noting that the emulator is evaluated within an SCM framework as a controlled testbed and that full incorporation of 3D effects remains future work. This will ensure the applicability claims are appropriately scoped.
  Revision: yes
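The bootstrap confidence intervals mentioned in the first response could look roughly like the sketch below, which resamples paired per-sample errors from two emulators and reports an interval for the RMSE difference. The error arrays are synthetic, the function names are hypothetical, and the resampling treats samples as independent, which time-correlated SCM output may not satisfy (a block bootstrap would be one alternative).

```python
import math
import random

def rmse(errors):
    """Root-mean-square of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def bootstrap_rmse_diff(err_a, err_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for RMSE(a) - RMSE(b).

    err_a and err_b are paired: index i is the same test sample
    for both models, so each resample draws shared indices.
    """
    if len(err_a) != len(err_b):
        raise ValueError("error lists must be paired (same length)")
    rng = random.Random(seed)
    n = len(err_a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(
            rmse([err_a[i] for i in idx]) - rmse([err_b[i] for i in idx])
        )
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic errors: model A (std 1.0) vs. model B (std 2.0).
data_rng = random.Random(42)
err_a = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
err_b = [data_rng.gauss(0.0, 2.0) for _ in range(200)]

lower, upper = bootstrap_rmse_diff(err_a, err_b)
```

If the whole interval lies below zero, model A's RMSE is lower than model B's at the chosen level under the independence assumption. An interval that straddles zero would not support a claimed improvement.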
Circularity Check
No circularity: empirical NN emulation evaluated on held-out data
The paper trains a Transformer on atmospheric states and convective tendencies generated by the Emanuel parameterization in an SCM, then measures offline error and online stability directly against that data and against independent baselines (MLP, LSTM). No equation, parameter fit, or self-citation reduces the reported performance metrics or stability claim to the training inputs by construction; the results are standard held-out evaluation of a learned mapping. The work is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- memory length
axioms (1)
- domain assumption: Convective processes depend on prior atmospheric states over time scales of minutes to hours.