CNN-based Surface Temperature Forecasts with Ensemble Numerical Weather Prediction
Pith reviewed 2026-05-19 03:46 UTC · model grok-4.3
The pith
Applying a CNN to each member of a low-resolution ensemble improves both deterministic and probabilistic surface temperature forecasts while downscaling them to 5 km resolution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CNN-based post-processing applied separately to each of the 51 ensemble members reduces systematic errors and performs spatial downscaling from 40 km to 5 km. This yields better deterministic accuracy and a better probabilistic reliability and spread-skill ratio, through a mechanism that differs from the smoothing effect of ensemble averaging.
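The spread-skill ratio compares the average ensemble spread with the RMSE of the ensemble mean: a value near 1 indicates a statistically consistent ensemble, and a value below 1 indicates under-dispersion. A minimal NumPy sketch, assuming forecasts stored as an array of shape (members, points) and verifying analyses of shape (points,). The function name is hypothetical, and the paper's exact definition (for example, whether it applies the finite-ensemble factor used here) is not stated on this page.

```python
import numpy as np


def spread_skill_ratio(ens, obs):
    """Ratio of mean ensemble spread to RMSE of the ensemble mean.

    ens: array of shape (M, N), M members at N verification points/cases.
    obs: array of shape (N,), verifying analysis values.
    The spread is scaled by sqrt((M + 1) / M) so that a statistically
    consistent M-member ensemble gives a ratio close to 1.
    """
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    m = ens.shape[0]
    ens_mean = ens.mean(axis=0)
    # Unbiased member variance at each point, averaged before the square root
    spread = np.sqrt(ens.var(axis=0, ddof=1).mean())
    rmse = np.sqrt(np.mean((ens_mean - obs) ** 2))
    return np.sqrt((m + 1) / m) * spread / rmse


# Synthetic check: members and truth drawn from the same distribution
rng = np.random.default_rng(0)
signal = rng.normal(size=20000)
ens = signal + rng.normal(size=(51, 20000))
obs = signal + rng.normal(size=20000)
print(round(spread_skill_ratio(ens, obs), 2))
```

Members and truth are drawn from the same distribution here, so the ratio should come out close to 1; shrinking the member noise would push it below 1.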
What carries the argument
Member-wise CNN post-processing that performs bias correction and spatial downscaling on individual ensemble members before recombining them into a high-resolution ensemble forecast.
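To make the member-wise structure concrete, here is a minimal NumPy sketch. It is not the paper's architecture, whose details are not given on this page. The single 3x3 filter, the nearest-neighbour upsampling, the residual form, and all function names are illustrative assumptions. What it does reflect is the structure described above: one set of weights applied independently to each of the 51 members, so the output is still a 51-member ensemble on the 5 km grid rather than a single averaged field.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

UPSCALE = 8  # 40 km -> 5 km grid spacing (assumed integer ratio)


def conv3x3(field, kernel, bias):
    """3x3 single-channel filter with edge padding; output has the input's shape.

    As in CNN layers, this is a cross-correlation (the kernel is not flipped).
    """
    padded = np.pad(field, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))  # shape (ny, nx, 3, 3)
    return np.einsum("ijkl,kl->ij", windows, kernel) + bias


def correct_member(field_40km, kernel, bias):
    """Downscale one member to the 5 km grid and add a learned residual."""
    # Nearest-neighbour upsampling: each 40 km cell becomes an 8x8 block
    hi_res = np.repeat(np.repeat(field_40km, UPSCALE, axis=0), UPSCALE, axis=1)
    # Residual form: corrected = upsampled + filter(upsampled)
    return hi_res + conv3x3(hi_res, kernel, bias)


def correct_ensemble(members_40km, kernel, bias):
    """Apply the same weights to every member independently."""
    return np.stack([correct_member(m, kernel, bias) for m in members_40km])


# Usage with random stand-in data and placeholder (untrained) weights
rng = np.random.default_rng(0)
members = rng.normal(280.0, 2.0, size=(51, 10, 12))  # 51 members, coarse grid, K
kernel = np.zeros((3, 3))
corrected = correct_ensemble(members, kernel, bias=-0.5)  # pure -0.5 K bias shift
print(corrected.shape)  # (51, 80, 96)
```

In a real system the kernel and bias would come from training on forecast-analysis pairs, and a deep network would replace the single filter; the point here is only that correction happens before, not after, any combination of members.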
If this is right
- Deterministic forecast accuracy improves through bias correction and downscaling on each member.
- Probabilistic reliability and spread-skill ratio improve in a manner distinct from the error reduction of ensemble averaging.
- Forecast information is maintained at levels comparable to other high-resolution forecasts rather than being smoothed away.
- The method supplies a practical, scalable route to better medium-range temperature predictions for centers with limited computational resources.
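Probabilistic gains such as those listed above are commonly quantified with the continuous ranked probability score (CRPS), which the authors also report. A minimal sketch of the plain empirical ensemble estimator, assuming the same (members, points) layout; this is a generic textbook form, not necessarily the variant used in the paper (a "fair" variant divides the pairwise term by M(M-1) instead of M²).

```python
import numpy as np


def crps_ensemble(ens, obs):
    """Empirical CRPS of an ensemble forecast, averaged over points.

    ens: (M, N) member values; obs: (N,) verifying values.
    CRPS = E|X - y| - 0.5 * E|X - X'|, with X, X' drawn from the ensemble.
    Lower is better; for a single member it reduces to the absolute error.
    """
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Mean absolute distance of members from the verifying value
    skill_term = np.abs(ens - obs).mean(axis=0)
    # Mean absolute difference over all member pairs: shape (M, M, N) -> (N,)
    spread_term = np.abs(ens[:, None, :] - ens[None, :, :]).mean(axis=(0, 1))
    return float(np.mean(skill_term - 0.5 * spread_term))


# Two members bracketing the observation
print(crps_ensemble(np.array([[0.0], [2.0]]), np.array([1.0])))  # 0.5
```

The pairwise term needs O(M²N) memory; for 51 members on a full 5 km grid, a sorted-member formulation would be more economical.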
Where Pith is reading between the lines
- The same member-wise correction could be tested on other surface variables or vertical levels to check whether the CNN learns general error patterns.
- Periodic retraining on recent model output would be needed if the underlying NWP model physics or resolution changes.
- The approach might be combined with existing high-resolution limited-area models to blend global ensemble information with local detail.
Load-bearing premise
The CNN trained on historical low-resolution forecasts and verifying analyses will generalize to future independent forecasts without significant degradation from changing model versions or unrepresented error regimes.
What would settle it
Running the trained CNN on forecasts from a new version of the underlying NWP model and finding no skill gain or outright degradation relative to the uncorrected low-resolution ensemble.
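This test amounts to comparing an error measure for the corrected and uncorrected forecasts on output from the new model version. A minimal sketch using an MSE-based skill score. The choice of ensemble-mean MSE as the measure, the function names, and the assumption that the raw 40 km forecast has already been interpolated to the 5 km verification grid are all illustrative.

```python
import numpy as np


def mse(forecast, obs):
    """Mean squared error over all points."""
    return float(np.mean((np.asarray(forecast) - np.asarray(obs)) ** 2))


def mse_skill_score(corrected_mean, raw_mean, obs):
    """MSE skill score of the corrected forecast relative to the raw one.

    > 0: the CNN correction still adds skill on the new model version.
    <= 0: no gain, or outright degradation; this is the outcome that would
    count against the generalization claim.
    """
    return 1.0 - mse(corrected_mean, obs) / mse(raw_mean, obs)


obs = np.zeros(4)
raw = np.full(4, 2.0)        # raw forecast, MSE = 4
corrected = np.full(4, 1.0)  # corrected forecast, MSE = 1
print(mse_skill_score(corrected, raw, obs))  # 0.75
```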
Abstract
Due to limited computational resources, medium-range temperature forecasts typically rely on low-resolution numerical weather prediction (NWP) models, which are prone to systematic and random errors. We propose a method that integrates a convolutional neural network (CNN) with an ensemble of low-resolution NWP models (40-km horizontal resolution) to produce high-resolution (5-km) surface temperature forecasts with lead times extending up to 5.5 days (132 h). First, CNN-based post-processing (bias correction and spatial downscaling) is applied to individual ensemble members to reduce systematic errors and perform downscaling, which improves the deterministic forecast accuracy. Second, this member-wise correction is applied to all 51 ensemble members to construct a new high-resolution ensemble forecasting system with an improved probabilistic reliability and spread-skill ratio that differs from the simple error reduction mechanism of ensemble averaging. Whereas averaging reduces forecast errors by smoothing spatial fields, our member-wise CNN correction reduces error from noise while maintaining forecast information at a level comparable to that of other high-resolution forecasts. Experimental results indicate that the proposed method provides a practical and scalable solution for improving medium-range temperature forecasts, which is particularly valuable for use in operational centers with limited computational resources.
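The contrast drawn above between averaging and member-wise correction can be illustrated with a toy experiment. If members share a common large-scale signal plus independent small-scale noise, the ensemble mean cancels most of that noise, so its RMSE against the truth drops. It also loses nearly all small-scale variability that the truth and each individual member contain. The synthetic 1-D fields below are assumptions for illustration only, not data from the paper, and the script illustrates averaging, not the CNN correction itself.

```python
import numpy as np

rng = np.random.default_rng(42)
n_members, n_points = 51, 2000
x = np.linspace(0.0, 2.0 * np.pi, n_points)
signal = 10.0 * np.sin(x)                                  # smooth, predictable scale
truth = signal + rng.normal(size=n_points)                 # truth has fine-scale detail
members = signal + rng.normal(size=(n_members, n_points))  # each member: own detail
ens_mean = members.mean(axis=0)


def rmse(f, o):
    return float(np.sqrt(np.mean((f - o) ** 2)))


def small_scale_var(f):
    """Variance of point-to-point differences: a crude fine-scale measure."""
    return float(np.mean(np.var(np.diff(f, axis=-1), axis=-1)))


member_rmse = np.mean([rmse(m, truth) for m in members])
print(f"RMSE: single member {member_rmse:.2f}, ensemble mean {rmse(ens_mean, truth):.2f}")
print(f"fine-scale variance: truth {small_scale_var(truth):.2f}, "
      f"member {small_scale_var(members):.2f}, mean {small_scale_var(ens_mean):.3f}")
```

The ensemble mean wins on RMSE but looks far smoother than the truth, which is the smoothing the abstract contrasts with a correction applied to each member before any averaging.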
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes integrating a convolutional neural network (CNN) with a 51-member low-resolution (40 km) ensemble NWP system to generate high-resolution (5 km) surface temperature forecasts out to a lead time of 132 h. Member-wise CNN post-processing performs bias correction and spatial downscaling on each ensemble member; the corrected members are then used to form a new ensemble that is claimed to improve deterministic accuracy, probabilistic reliability, and the spread-skill ratio through a mechanism distinct from the smoothing of simple ensemble averaging, while retaining forecast information at a level comparable to that of other high-resolution forecasts. The method is presented as a computationally lightweight, scalable solution for operational centers lacking resources for high-resolution NWP.
Significance. If the reported gains in accuracy and reliability hold on truly independent future forecasts, the approach would offer a practical route to improved medium-range temperature guidance without requiring additional high-resolution model integrations. The distinction drawn between error reduction via member-wise correction versus smoothing via averaging is conceptually useful for ensemble post-processing literature.
major comments (2)
- [Abstract] The central claims of improved deterministic accuracy, probabilistic reliability, and spread-skill ratio are asserted without any quantitative metrics, verification periods, baseline comparisons, or uncertainty estimates. Because these numbers carry the "practical and scalable solution" conclusion, their absence prevents any assessment of effect size or statistical significance.
- [Results / Discussion] The claim that the CNN learns invariant physical relationships rather than transient, model-specific biases is untested: no experiment is described in which the training and test periods are separated by model upgrades or regime shifts, and no cross-validation across model versions, seasons, or climate states is reported. This directly undermines the operational generalization argument.
minor comments (2)
- [Methods] The CNN architecture (number of layers, filter sizes, activation functions) should be specified explicitly in the Methods section rather than left to supplementary material.
- [Figures] Figure captions should include the exact verification period, number of cases, and baseline models used for each panel to allow immediate comparison with the text claims.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment in turn and indicate where revisions will be made.
- Referee [Abstract]: The central claims of improved deterministic accuracy, probabilistic reliability, and spread-skill ratio are asserted without any quantitative metrics, verification periods, baseline comparisons, or uncertainty estimates. Because these numbers are load-bearing for the practical-and-scalable-solution conclusion, their absence prevents assessment of effect size or statistical significance.
  Authors: We agree that the abstract would be strengthened by the inclusion of key quantitative results. The body of the manuscript reports specific verification metrics (including RMSE reductions, CRPS improvements, reliability diagram scores, and spread-skill ratios) over a multi-month independent test period, with comparisons to the raw 40 km ensemble and other high-resolution references. In the revised manuscript we will add concise quantitative statements and the verification period to the abstract so that the magnitude of the reported gains is immediately apparent. Revision: yes.
- Referee [Results / Discussion]: The claim that the CNN learns invariant physical relationships rather than transient model-specific biases rests on the untested assumption that training and test periods are separated by model upgrades or regime shifts. No cross-validation across different model versions, seasons, or climate states is described, directly undermining the operational generalization argument.
  Authors: The training and test periods in our experiments are temporally disjoint, with the test window occurring after the training data to emulate operational use. However, we did not conduct explicit cross-validation across ECMWF model cycles or additional climate regimes. We will revise the Methods and Discussion sections to state the exact dates of the training and test periods, note any known model changes within that interval, and explicitly acknowledge the limitation on broader generalization. If space permits, we will also add a brief sensitivity experiment using an alternate seasonal split. Revision: partial.
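A temporally disjoint split of the kind described in this response, with the test window strictly after the training window, takes only a few lines. The dates in the usage example are placeholders (the actual periods are not given on this page). The 6-day gap is an assumption motivated by the 132 h (5.5-day) lead time, and the seasonal holdout is one hypothetical way to implement the alternate split mentioned above.

```python
from datetime import date, timedelta


def temporal_split(init_dates, test_start, gap_days=6):
    """Split forecast initialisation dates into disjoint train and test sets.

    Training cases must be initialised at least gap_days before test_start.
    Forecasts run to 132 h (5.5 days), so without the gap a late training
    case would use verifying analyses from inside the test window.
    """
    cutoff = test_start - timedelta(days=gap_days)
    train = [d for d in init_dates if d < cutoff]
    test = [d for d in init_dates if d >= test_start]
    return train, test


def seasonal_holdout(init_dates, held_out_months):
    """Alternate split: hold out whole calendar months (e.g. one season)."""
    train = [d for d in init_dates if d.month not in held_out_months]
    test = [d for d in init_dates if d.month in held_out_months]
    return train, test


# Placeholder: two years of daily initialisations, test year starting 2023-01-01
dates = [date(2022, 1, 1) + timedelta(days=i) for i in range(730)]
train, test = temporal_split(dates, date(2023, 1, 1))
print(max(train), min(test))  # 2022-12-25 2023-01-01
```

Note that neither split addresses the referee's concern about model upgrades; that would require training and test data from different model cycles.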
Circularity Check
No circularity: empirical ML post-processing on external forecast-analysis pairs
The paper trains a CNN on historical low-resolution ensemble forecasts paired with verifying analyses to perform bias correction and downscaling, then evaluates the trained model on separate test periods. This is a standard supervised learning pipeline whose outputs on new inputs are not equivalent to the training data by construction. No equations define a target metric in terms of itself, no fitted parameters are relabeled as independent predictions, and no load-bearing claims rest on self-citations or author-specific uniqueness theorems. The central results (improved deterministic accuracy, probabilistic reliability, spread-skill ratio) are obtained by direct comparison against independent verification data and therefore remain falsifiable outside the fitted values.
Axiom & Free-Parameter Ledger
free parameters (1)
- CNN network weights
axioms (1)
- Domain assumption: error characteristics of the 40-km NWP model are sufficiently stationary and spatially structured to be learned and corrected by a CNN trained on past cases.
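One simple, partial check on this assumption is to compare the mean forecast-minus-analysis bias of the raw 40 km forecasts in the training and test periods, grid point by grid point; a large shift would suggest that the learned correction may not transfer. The function name and the (cases, ny, nx) layout are assumptions, and a similar mean bias alone does not show that the full error structure is stationary.

```python
import numpy as np


def bias_shift(fc_train, an_train, fc_test, an_test):
    """Per-grid-point change in mean bias (forecast minus analysis).

    Inputs have shape (cases, ny, nx). Returns the test-period bias minus
    the training-period bias, shape (ny, nx), and its domain-mean magnitude.
    """
    bias_train = np.mean(np.asarray(fc_train) - np.asarray(an_train), axis=0)
    bias_test = np.mean(np.asarray(fc_test) - np.asarray(an_test), axis=0)
    shift = bias_test - bias_train
    return shift, float(np.mean(np.abs(shift)))


# Synthetic example: a warm bias of 1.5 K that grows to 2.0 K in the test period
rng = np.random.default_rng(1)
an_train = rng.normal(280.0, 3.0, size=(30, 4, 5))
an_test = rng.normal(280.0, 3.0, size=(20, 4, 5))
shift, magnitude = bias_shift(an_train + 1.5, an_train, an_test + 2.0, an_test)
print(round(magnitude, 3))
```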
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Passage: "We propose a method that integrates a convolutional neural network (CNN) with an ensemble of low-resolution NWP models (40-km horizontal resolution) to produce high-resolution (5-km) surface temperature forecasts"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Passage: "CNN-based post-processing (bias correction and spatial downscaling) is applied to individual ensemble members"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.