Flow Matching for Convective-Scale Precipitation Downscaling
Pith reviewed 2026-06-28 19:05 UTC · model grok-4.3
The pith
Flow matching achieves better spatial skill than the CPMGEM diffusion model when downscaling daily precipitation from 8 km to 2 km over Singapore.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Flow matching achieves consistently better spatial skill than CPMGEM: higher fractions skill score at every precipitation threshold and neighbourhood scale tested, and tighter structure and amplitude components of the SAL score with comparable location skill. Flow matching underestimates the upper tail of the precipitation distribution, resulting in a dry bias in the climatological mean. These results suggest that flow matching is a competitive generative framework for convective-scale precipitation downscaling, particularly well suited to capturing spatial structure.
What carries the argument
A flow matching generative model trained to map daily precipitation fields from 8 km to 2 km resolution.
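As a rough illustration of what such a model optimises, the sketch below writes out the conditional flow matching objective in the straight-line form of Lipman et al. (2023), plus an Euler sampler. It is a sketch under stated assumptions, not the paper's implementation: the architecture, probability path, conditioning and sampler are not described on this page. The names `velocity_fn`, `cond` (assumed here to be the 8 km field regridded to the 2 km grid), `interpolate`, `flow_matching_loss` and `sample` are hypothetical.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Straight-line path x_t = (1 - t) * x0 + t * x1 for fields shaped (batch, H, W)."""
    t = t.reshape(-1, 1, 1)
    return (1.0 - t) * x0 + t * x1

def flow_matching_loss(velocity_fn, x1, cond, rng):
    """Monte Carlo estimate of the conditional flow matching loss for one batch.

    velocity_fn(x_t, t, cond) is a hypothetical stand-in for the neural network;
    x1 holds the high-resolution targets, cond the (assumed) regridded coarse input.
    """
    x0 = rng.standard_normal(x1.shape)    # noise sample at t = 0
    t = rng.uniform(size=x1.shape[0])     # one random time per batch element
    xt = interpolate(x0, x1, t)
    target = x1 - x0                      # velocity of the straight-line path
    pred = velocity_fn(xt, t, cond)
    return float(np.mean((pred - target) ** 2))

def sample(velocity_fn, cond, shape, rng, n_steps=20):
    """Generate fields by Euler integration of dx/dt = v(x, t, cond) from t = 0 to t = 1."""
    x = rng.standard_normal(shape)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full(shape[0], k * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x
```

In practice `velocity_fn` would be a trained network and the loss would be minimised with an automatic-differentiation framework; NumPy is used here only to keep the arithmetic visible.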
If this is right
- Flow matching yields higher fractions skill scores than the diffusion baseline at every tested precipitation threshold and neighbourhood scale.
- Structure and amplitude components of the SAL score improve with flow matching while location skill remains comparable.
- Flow matching underestimates heavy precipitation amounts, resulting in a dry bias in mean precipitation.
- The method is particularly well suited to reproducing spatial structure in convective-scale fields.
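For readers unfamiliar with the fractions skill score behind the first point, the following is a minimal sketch of the metric as defined by Roberts and Lean (2008): both fields are thresholded, converted to neighbourhood exceedance fractions, and compared. The function names, the zero padding at the domain edge and the restriction to odd window sizes are assumptions for illustration; the paper's exact verification setup is not given on this page.

```python
import numpy as np

def neighbourhood_fraction(binary, n):
    """Fraction of exceedance pixels in the n x n window centred on each pixel.

    Assumes n is odd and pads the domain edge with zeros (an illustrative choice).
    """
    pad = n // 2
    padded = np.pad(binary.astype(float), pad, mode="constant")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = binary.shape
    window_sum = (ii[n:n + h, n:n + w] - ii[:h, n:n + w]
                  - ii[n:n + h, :w] + ii[:h, :w])
    return window_sum / (n * n)

def fss(forecast, observed, threshold, n):
    """Fractions skill score for one pair of 2-D fields at one threshold and scale."""
    pf = neighbourhood_fraction(forecast >= threshold, n)
    po = neighbourhood_fraction(observed >= threshold, n)
    mse = np.mean((pf - po) ** 2)
    reference = np.mean(pf ** 2) + np.mean(po ** 2)  # worst-case MSE
    if reference == 0:
        return float("nan")  # neither field exceeds the threshold
    return float(1.0 - mse / reference)
```

A score of 1 indicates identical neighbourhood fractions and 0 indicates no overlap. Evaluating "every threshold and neighbourhood scale" amounts to looping `fss` over a grid of `threshold` and `n` values and aggregating over test days.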
Where Pith is reading between the lines
- Flow matching could be paired with a separate correction step for extremes to reduce the dry bias while retaining its spatial advantages.
- Repeating the comparison on domains outside Singapore would test whether the spatial-skill gains generalise beyond the chosen evaluation region.
- Flow matching may integrate more readily with ensemble methods that quantify uncertainty in downscaled precipitation.
Load-bearing premise
The Singapore-centred domain together with the chosen FSS and SAL metrics is sufficient to establish that flow matching is competitive for convective-scale downscaling in general.
What would settle it
Finding that flow matching does not yield higher fractions skill scores than the diffusion model on an independent geographic domain or with a different set of verification metrics would undermine the claim of consistent superiority in spatial skill.
Original abstract
Generative machine learning is an increasingly important complement to dynamical downscaling for producing high-resolution precipitation projections, with diffusion models currently the leading approach. Flow matching is a related generative framework that has recently achieved strong results across image, video and other domains, and shown early promise for downscaling. We train a flow matching model to map daily precipitation from 8 km to 2 km over a convective-scale domain centred on Singapore, and benchmark it against CPMGEM, a score-based diffusion model. Flow matching achieves consistently better spatial skill: higher fractions skill score at every precipitation threshold and neighbourhood scale tested, and tighter structure and amplitude components of the SAL score with comparable location skill. However, flow matching underestimates the upper tail of the precipitation distribution, resulting in a dry bias in the climatological mean. These results suggest that flow matching is a competitive generative framework for convective-scale precipitation downscaling, particularly well suited to capturing spatial structure.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a flow matching generative model, trained to downscale daily precipitation from 8 km to 2 km over a convective-scale domain centred on Singapore, achieves a consistently higher fractions skill score (FSS) at every precipitation threshold and neighbourhood scale tested, as well as tighter structure and amplitude components of the SAL score (with comparable location skill), than the CPMGEM score-based diffusion model. It notes a dry bias arising from underestimation of the upper tail but concludes that flow matching is competitive for convective-scale precipitation downscaling, particularly for capturing spatial structure.
Significance. If the spatial-skill advantages hold, the work would establish flow matching as a practical alternative to diffusion models for generative downscaling in atmospheric science. The direct empirical benchmark on an external test set against a second generative framework is a clear strength. The result could influence the choice of generative framework for high-resolution precipitation projections once the single-domain limitation is addressed.
major comments (2)
- [Abstract] The claim that flow matching is 'competitive … for convective-scale precipitation downscaling' in general is load-bearing for the paper's conclusion, yet it rests exclusively on results from one 8 km-to-2 km domain centred on Singapore. Convective regimes differ substantially in orographic forcing, diurnal cycles and tail behaviour, so the representativeness of this test bed requires either additional domains or an explicit qualification of scope.
- [Abstract, and methods if present] No quantitative error bars, bootstrap intervals or statistical significance tests accompany the reported FSS and SAL improvements, and no ablation studies or training hyperparameter details are referenced. This prevents assessment of whether the spatial-skill advantage is robust or sensitive to post-hoc choices.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on the scope of our claims and the need for statistical robustness measures. We respond to each major comment below.
Point-by-point responses
- Referee: [Abstract] The claim that flow matching is 'competitive … for convective-scale precipitation downscaling' in general is load-bearing for the paper's conclusion, yet it rests exclusively on results from one 8 km-to-2 km domain centred on Singapore. Convective regimes differ substantially in orographic forcing, diurnal cycles and tail behaviour, so the representativeness of this test bed requires either additional domains or an explicit qualification of scope.
  Authors: We agree that the abstract phrasing risks implying broader generality than is supported by a single-domain study. We will revise the abstract and discussion to qualify the scope explicitly, stating that competitiveness is demonstrated for the Singapore convective-scale domain and that results may not extend directly to regimes with substantially different orographic forcing, diurnal cycles or tail behaviour without further testing. Revision: yes.
- Referee: [Abstract, and methods if present] No quantitative error bars, bootstrap intervals or statistical significance tests accompany the reported FSS and SAL improvements, and no ablation studies or training hyperparameter details are referenced. This prevents assessment of whether the spatial-skill advantage is robust or sensitive to post-hoc choices.
  Authors: We acknowledge that uncertainty quantification strengthens the interpretation of the FSS and SAL results. In revision we will add bootstrap confidence intervals computed from the existing ensemble of generated fields. Hyperparameter details are provided in the methods; we will expand this section with additional configuration values and, where computationally feasible, include a limited sensitivity check in supplementary material. Revision: partial.
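One way the promised intervals could be computed is a paired percentile bootstrap on per-sample scores, sketched below. This is an assumption about the procedure, not the authors' stated method: it resamples paired scores (for example, one FSS value per test day for each model) with replacement and reports an interval for the mean difference. The function name and its parameters are hypothetical.

```python
import numpy as np

def paired_bootstrap_ci(score_a, score_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(score_a - score_b).

    score_a and score_b are paired per-sample scores (same samples, two models).
    Returns (mean difference, lower bound, upper bound).
    """
    score_a = np.asarray(score_a, dtype=float)
    score_b = np.asarray(score_b, dtype=float)
    if score_a.shape != score_b.shape or score_a.ndim != 1:
        raise ValueError("scores must be paired 1-D arrays of equal length")
    rng = np.random.default_rng(seed)
    diff = score_a - score_b
    n = diff.size
    # Each row indexes one bootstrap resample of the n paired differences.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = diff[idx].mean(axis=1)
    lower, upper = np.quantile(boot_means, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(diff.mean()), float(lower), float(upper)
```

An interval that excludes zero would support a robust difference between the two models. Daily precipitation scores can be serially correlated, so a block bootstrap over consecutive days may be the safer choice; the independent resampling here is a simplification.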
Circularity Check
No circularity: empirical benchmark rests on external test data
The paper trains flow-matching and diffusion models on precipitation data and reports FSS/SAL scores on held-out test fields from the Singapore domain. No equations, fitted parameters, or self-citations are invoked to derive the performance claims; the reported skill differences are direct outputs of model evaluation against independent observations. The derivation chain is therefore self-contained against external benchmarks and contains none of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Precipitation fields at convective scales can be treated as samples from a probability distribution amenable to flow-based generative modeling.
- Domain assumption: Fractions skill score and SAL components are appropriate and sufficient metrics for judging spatial skill in downscaled precipitation.
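To make the second assumption concrete, the sketch below computes two of the simpler SAL ingredients following Wernli et al. (2008): the amplitude component A, a normalised difference of domain-mean precipitation, and the first part of the location component, L1, the normalised distance between the fields' centres of mass. The structure component and the second location term require object identification and are omitted. Function names are hypothetical, and the domain is assumed rectangular, so the largest distance across it is taken as the grid diagonal.

```python
import numpy as np

def sal_amplitude(model, obs):
    """Amplitude component A in [-2, 2]; negative values mean the model is too dry."""
    mean_model = float(np.mean(model))
    mean_obs = float(np.mean(obs))
    total = mean_model + mean_obs
    if total == 0:
        return float("nan")  # both fields are completely dry
    return (mean_model - mean_obs) / (0.5 * total)

def centre_of_mass(field):
    """Precipitation-weighted centre of mass in grid units (field must contain rain)."""
    rows, cols = np.indices(field.shape)
    total = field.sum()
    return np.array([(rows * field).sum() / total, (cols * field).sum() / total])

def sal_location_l1(model, obs):
    """First location term L1: centre-of-mass separation over the domain diagonal."""
    h, w = obs.shape
    d = np.hypot(h - 1, w - 1)  # largest distance across a rectangular grid
    separation = np.linalg.norm(centre_of_mass(model) - centre_of_mass(obs))
    return float(separation / d)
```

A dry bias in the climatological mean of the kind reported for flow matching would appear as a negative A, while comparable location skill corresponds to similar L values for the two models.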
Forward citations
Cited by 1 Pith paper
- CORDEX-ML-Bench: A Benchmark for Data-Driven Regional Climate Downscaling - Experiment Design and Overview
  CORDEX-ML-Bench benchmarks 40 ML models for climate downscaling and finds that generative models outperform deterministic ones on precipitation, while historically trained models underestimate future climate signals.