pith.

arXiv: 2606.00281 · v1 · pith:YYC2JP62 · submitted 2026-05-29 · ⚛️ physics.ao-ph · cs.LG

Flow Matching for Convective-Scale Precipitation Downscaling

Pith reviewed 2026-06-28 19:05 UTC · model grok-4.3

classification ⚛️ physics.ao-ph cs.LG
keywords flow matching · precipitation downscaling · convective scale · generative models · diffusion models · fractions skill score · SAL score · Singapore

The pith

Flow matching achieves better spatial skill than a score-based diffusion model (CPMGEM) when downscaling daily precipitation from 8 km to 2 km over Singapore.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper trains a flow matching model to generate 2 km precipitation fields from daily 8 km inputs over a convective-scale domain centred on Singapore and benchmarks it against the score-based diffusion model CPMGEM. Flow matching records higher fractions skill scores at every precipitation threshold and neighbourhood scale examined, along with better structure and amplitude components of the SAL score, while location skill stays comparable. It nevertheless underestimates the upper tail of the distribution, producing a dry bias in the climatological mean. A sympathetic reader would care because generative downscaling offers a lower-cost route to high-resolution precipitation fields than full dynamical models, and spatial accuracy matters for local flood and water-resource applications.

Core claim

Flow matching achieves consistently better spatial skill than CPMGEM: higher fractions skill score at every precipitation threshold and neighbourhood scale tested, and tighter structure and amplitude components of the SAL score with comparable location skill. Flow matching underestimates the upper tail of the precipitation distribution, resulting in a dry bias in the climatological mean. These results suggest that flow matching is a competitive generative framework for convective-scale precipitation downscaling, particularly well suited to capturing spatial structure.
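The fractions skill score behind this claim (Roberts and Lean, 2008) compares, at each grid point, the fraction of neighbouring points that exceed a precipitation threshold in the generated field and in the target field. The minimal NumPy sketch below assumes a square n × n neighbourhood, a "greater than or equal" exceedance test, and zero padding at the domain edges; the function names are hypothetical. The paper's reference list includes the scores and pysteps packages, whose FSS implementations may treat boundaries differently.

```python
import numpy as np


def neighbourhood_fraction(binary: np.ndarray, n: int) -> np.ndarray:
    """Fraction of exceedance points in the n x n window centred on each point.

    Edges are zero-padded: window points beyond the domain count as
    non-exceedances, and every window is still divided by n * n.
    """
    if n % 2 == 0:
        raise ValueError("neighbourhood size n must be odd")
    half = n // 2
    padded = np.pad(binary.astype(float), half, mode="constant")
    # Summed-area table with a leading row and column of zeros.
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    ny, nx = binary.shape
    # Window sum for output (i, j) covers padded rows i..i+n-1 and cols j..j+n-1.
    window = (sat[n:n + ny, n:n + nx] - sat[:ny, n:n + nx]
              - sat[n:n + ny, :nx] + sat[:ny, :nx])
    return window / (n * n)


def fractions_skill_score(generated: np.ndarray, target: np.ndarray,
                          threshold: float, n: int) -> float:
    """FSS = 1 - MSE(fractions) / reference MSE (1 is perfect, 0 is no skill)."""
    pf = neighbourhood_fraction(generated >= threshold, n)
    po = neighbourhood_fraction(target >= threshold, n)
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    if mse_ref == 0.0:
        return float("nan")  # neither field exceeds the threshold anywhere
    return float(1.0 - mse / mse_ref)


# Example: a 4 x 4 rain cell displaced by 6 grid columns.
target = np.zeros((64, 64))
target[20:24, 20:24] = 10.0
generated = np.zeros((64, 64))
generated[20:24, 26:30] = 10.0
for n in (1, 5, 15):
    print(n, round(fractions_skill_score(generated, target, 1.0, n), 3))
```

A displaced cell scores zero at the grid scale and gains credit as the neighbourhood grows. This is why the paper reports FSS across several neighbourhood scales rather than at a single one.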

What carries the argument

Flow matching generative model trained to map daily 8 km precipitation fields to 2 km resolution fields.
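For readers new to flow matching (Lipman et al., 2023), the core training idea fits in a few lines. This sketch assumes the widely used linear interpolation path x_t = (1 - t) x_0 + t x_1 between Gaussian noise x_0 and a target 2 km field x_1. For that path, the regression target for the velocity network is simply x_1 - x_0. Conditioning on the 8 km input is represented by a hypothetical `cond` argument, and `velocity` stands in for a neural network; neither is the paper's actual model. The paper may use a different path, noise source or sampler.

```python
import numpy as np


def cfm_loss(velocity, x1, cond, rng):
    """Monte Carlo estimate of the conditional flow matching loss for one batch.

    velocity(x_t, t, cond) -> predicted velocity with the shape of x_t (hypothetical model).
    x1:   target fine-resolution fields, shape (B, H, W).
    cond: conditioning inputs, e.g. coarse fields regridded to (B, H, W).
    """
    x0 = rng.standard_normal(x1.shape)          # noise sample
    t = rng.uniform(size=(x1.shape[0], 1, 1))   # one time in [0, 1) per batch member
    xt = (1.0 - t) * x0 + t * x1                # point on the straight path
    target_v = x1 - x0                          # d x_t / dt along that path
    pred_v = velocity(xt, t, cond)
    return float(np.mean((pred_v - target_v) ** 2))


def sample(velocity, cond, rng, steps=50):
    """Draw one sample per conditioning field by forward-Euler integration from t=0 to t=1."""
    x = rng.standard_normal(cond.shape)
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((cond.shape[0], 1, 1), k * dt)
        x = x + dt * velocity(x, t, cond)
    return x
```

As a sanity check, if the target were the conditioning field itself, the exact velocity would be (cond - x_t) / (1 - t). That velocity drives the loss to zero, and Euler integration with it lands on cond. Calling `sample` repeatedly with fresh noise is how a flow matching model produces an ensemble of plausible fine-scale fields for one coarse input.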

If this is right

  • Flow matching yields higher fractions skill scores than the diffusion baseline at every tested precipitation threshold and neighbourhood scale.
  • Structure and amplitude components of the SAL score improve with flow matching while location skill remains comparable.
  • Flow matching underestimates heavy precipitation amounts, resulting in a dry bias in mean precipitation.
  • The method is particularly well suited to reproducing spatial structure in convective-scale fields.
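The SAL score (Wernli et al., 2008) separates errors into structure (S), amplitude (A) and location (L) components. The amplitude component and the first location term need no object identification, so they make a compact sketch. A is the normalised difference of domain-mean precipitation. L1 is the distance between the two fields' precipitation-weighted centres of mass, divided by the largest distance across the domain, which is taken here as the grid diagonal (an assumption). The structure component and the second location term require identifying precipitation objects above a threshold and are omitted. The function names are hypothetical.

```python
import numpy as np


def sal_amplitude(model: np.ndarray, target: np.ndarray) -> float:
    """SAL amplitude A in [-2, 2]; negative means the model is too dry overall.

    Undefined if both fields are completely dry (zero domain means).
    """
    d_mod, d_obs = model.mean(), target.mean()
    return float((d_mod - d_obs) / (0.5 * (d_mod + d_obs)))


def centre_of_mass(field: np.ndarray) -> np.ndarray:
    """Precipitation-weighted mean (row, col) position in grid units."""
    rows, cols = np.indices(field.shape)
    total = field.sum()
    return np.array([(rows * field).sum() / total, (cols * field).sum() / total])


def sal_location_l1(model: np.ndarray, target: np.ndarray) -> float:
    """First SAL location term L1 in [0, 1]: centre-of-mass shift / largest domain distance."""
    ny, nx = target.shape
    d = np.hypot(ny - 1, nx - 1)  # distance between opposite corner grid points
    shift = centre_of_mass(model) - centre_of_mass(target)
    return float(np.hypot(*shift) / d)
```

Halving every value leaves L1 at zero but gives A = -2/3, while moving a rain cell changes L1 but not A. This separation is what lets the paper report amplitude and location skill independently.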

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Flow matching could be paired with a separate correction step for extremes to reduce the dry bias while retaining its spatial advantages.
  • Repeating the comparison on domains outside Singapore would test whether the spatial-skill gains generalise beyond the chosen evaluation region.
  • Flow matching may integrate more readily with ensemble methods that quantify uncertainty in downscaled precipitation.

Load-bearing premise

The Singapore-centred domain together with the chosen FSS and SAL metrics is sufficient to establish that flow matching is competitive for convective-scale downscaling in general.

What would settle it

Finding that flow matching fails to achieve higher fractions skill scores than the diffusion model on an independent geographic domain, or fails to outperform it under a different set of verification metrics, would undermine the claim of consistent superiority in spatial skill.

Figures

Figures reproduced from arXiv: 2606.00281 by Tom Wetherell.

Figure 1: Fractions skill score as a function of neighbourhood scale for flow matching (blue) and CPMGEM …

Figure 2: Distributions of SAL components: structure (…

Figure 3: Daily-mean precipitation climatology over the test period (2050–2059). Top row: time-mean …

Figure 4: Probability density function of daily precipitation (mm/day) for the target (orange), flow matching …
Original abstract

Generative machine learning is an increasingly important complement to dynamical downscaling for producing high-resolution precipitation projections, with diffusion models currently the leading approach. Flow matching is a related generative framework that has recently achieved strong results across image, video and other domains, and shown early promise for downscaling. We train a flow matching model to map daily precipitation from 8 km to 2 km over a convective-scale domain centred on Singapore, and benchmark it against CPMGEM, a score-based diffusion model. Flow matching achieves consistently better spatial skill: higher fractions skill score at every precipitation threshold and neighbourhood scale tested, and tighter structure and amplitude components of the SAL score with comparable location skill. However, flow matching underestimates the upper tail of the precipitation distribution, resulting in a dry bias in the climatological mean. These results suggest that flow matching is a competitive generative framework for convective-scale precipitation downscaling, particularly well suited to capturing spatial structure.
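The link between an underestimated upper tail and a dry climatological mean is arithmetic. The mean is pulled up by rare heavy-rain days, so shrinking only the largest values lowers it. The illustrative sketch below uses synthetic gamma-distributed daily totals (an assumption, not the paper's data) and a hypothetical generator that caps the wettest 1% of days. It reports the ratio of a high quantile and the relative mean bias; the magnitudes say nothing about the paper's actual bias.

```python
import numpy as np


def tail_and_mean_bias(generated: np.ndarray, target: np.ndarray, q: float = 0.99):
    """Return (generated/target ratio of q-quantiles, relative bias of the mean)."""
    q_ratio = np.quantile(generated, q) / np.quantile(target, q)
    mean_bias = generated.mean() / target.mean() - 1.0
    return float(q_ratio), float(mean_bias)


rng = np.random.default_rng(42)
target = rng.gamma(shape=0.4, scale=20.0, size=100_000)  # synthetic daily totals, mm/day
# Hypothetical generator: identical to the target except that days above the
# target's 99th percentile are capped at that percentile.
cap = np.quantile(target, 0.99)
generated = np.minimum(target, cap)
q_ratio, mean_bias = tail_and_mean_bias(generated, target, q=0.999)
print(f"99.9th-percentile ratio: {q_ratio:.2f}, mean bias: {100 * mean_bias:.1f}%")
```

Only about 1% of days differ between the two series, yet the mean comes out lower. This is the same mechanism by which a tail deficit in the precipitation PDF shows up as a dry bias in a time-mean map.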

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims that a flow matching generative model, trained to downscale daily precipitation from 8 km to 2 km over a convective-scale domain centred on Singapore, outperforms the CPMGEM score-based diffusion model. It reports consistently higher fractions skill score (FSS) at every precipitation threshold and neighbourhood scale tested, and tighter structure and amplitude components of the SAL score with comparable location skill. It notes a dry bias arising from underestimation of the upper tail but concludes that flow matching is competitive for convective-scale precipitation downscaling, particularly for capturing spatial structure.

Significance. If the spatial-skill advantages hold, the work would establish flow matching as a practical alternative to diffusion models for generative downscaling in atmospheric science. The direct empirical benchmark on an external test set, against a second generative framework, is a clear strength. The result could influence the choice of generative framework for high-resolution precipitation projections if the single-domain limitation is addressed.

major comments (2)
  1. [Abstract] The claim that flow matching is 'competitive … for convective-scale precipitation downscaling' in general is load-bearing for the paper's conclusion, yet it rests exclusively on results from one 8 km-to-2 km domain centred on Singapore. Convective regimes differ substantially in orographic forcing, diurnal cycles and tail behaviour, so the representativeness of this test bed requires either additional domains or an explicit qualification of scope.
  2. [Abstract and methods, if present] No quantitative error bars, bootstrap intervals or statistical significance tests accompany the reported FSS and SAL improvements, and no ablation studies or training hyperparameter details are referenced. This prevents assessment of whether the spatial-skill advantage is robust or sensitive to post-hoc choices.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on the scope of our claims and the need for statistical robustness measures. We respond to each major comment below.

Point-by-point responses
  1. Referee: [Abstract] The claim that flow matching is 'competitive … for convective-scale precipitation downscaling' in general is load-bearing for the paper's conclusion, yet it rests exclusively on results from one 8 km-to-2 km domain centred on Singapore. Convective regimes differ substantially in orographic forcing, diurnal cycles and tail behaviour, so the representativeness of this test bed requires either additional domains or an explicit qualification of scope.

    Authors: We agree that the abstract phrasing risks implying broader generality than supported by a single-domain study. We will revise the abstract and discussion to explicitly qualify the scope, stating that the competitiveness is demonstrated for the Singapore convective-scale domain and that results may not directly extend to regimes with substantially different orographic forcing, diurnal cycles or tail behaviour without further testing.

    Revision: yes.

  2. Referee: [Abstract and methods, if present] No quantitative error bars, bootstrap intervals or statistical significance tests accompany the reported FSS and SAL improvements, and no ablation studies or training hyperparameter details are referenced. This prevents assessment of whether the spatial-skill advantage is robust or sensitive to post-hoc choices.

    Authors: We acknowledge that uncertainty quantification strengthens the interpretation of the FSS and SAL results. In the revision, we will add bootstrap confidence intervals computed from the existing ensemble of generated fields. Hyperparameter details are provided in the methods; we will expand this section with additional configuration values and, where computationally feasible, include a limited sensitivity check in the supplementary material.

    Revision: partial.
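The rebuttal proposes bootstrap confidence intervals. One minimal way to build them is to pair per-day scores from the two models, resample test days with replacement, and take percentiles of the resampled mean difference. The sketch below assumes per-day FSS values are already available; the function name, the 2000 resamples and the 95% level are illustrative choices, not the paper's. Because daily precipitation is autocorrelated, a block bootstrap over runs of consecutive days may be more appropriate in practice.

```python
import numpy as np


def bootstrap_diff_ci(score_a, score_b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(score_a - score_b) over paired test days.

    Returns (observed mean difference, lower bound, upper bound).
    """
    score_a = np.asarray(score_a, dtype=float)
    score_b = np.asarray(score_b, dtype=float)
    if score_a.shape != score_b.shape:
        raise ValueError("scores must be paired day by day")
    diffs = score_a - score_b                  # pairing is preserved by resampling diffs
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, diffs.size, size=(n_boot, diffs.size))
    boot_means = diffs[idx].mean(axis=1)       # one mean difference per resample
    alpha = 1.0 - level
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diffs.mean()), float(lo), float(hi)
```

If the interval for "flow matching minus CPMGEM" excluded zero at every threshold and scale, the spatial-skill advantage would be robust to which days happened to fall in the test set.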

Circularity Check

0 steps flagged

No circularity: empirical benchmark rests on external test data

Full rationale

The paper trains flow-matching and diffusion models on precipitation data and reports FSS and SAL scores on held-out test fields from the Singapore domain. No equations, fitted parameters or self-citations are invoked to derive the performance claims. The reported skill differences are direct outputs of evaluating the models against independent held-out 2 km target fields, which are simulated rather than observed, since the test period is 2050–2059. The argument therefore rests on external benchmarks and contains none of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions of generative modeling (precipitation fields can be sampled from a learned transport map) and on the validity of the chosen verification metrics for convective precipitation; no new entities are introduced and no free parameters are reported in the abstract.

axioms (2)
  • Domain assumption: precipitation fields at convective scales can be treated as samples from a probability distribution amenable to flow-based generative modeling.
    Invoked implicitly when training the flow matching model to map coarse to fine precipitation.
  • Domain assumption: fractions skill score and SAL components are appropriate and sufficient metrics for judging spatial skill in downscaled precipitation.
    Used to declare consistent superiority without further justification in the abstract.

pith-pipeline@v0.9.1-grok · 5678 in / 1340 out tokens · 22059 ms · 2026-06-28T19:05:38.462928+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CORDEX-ML-Bench: A Benchmark for Data-Driven Regional Climate Downscaling - Experiment Design and Overview

    physics.ao-ph · 2026-06 · unverdicted · novelty 7.0

    CORDEX-ML-Bench benchmarks 40 ML models for climate downscaling and finds generative models outperform deterministic ones on precipitation while historically trained models underestimate future climate signals.

Reference graph

Works this paper leans on

26 extracted references · 13 canonical work pages · cited by 1 Pith paper · 4 internal anchors

  1. [1]

    Henry Addison, Elizabeth Kendon, Suman Ravuri, Laurence Aitchison, and Peter AG Watson. Machine learning emulation of precipitation from km-scale UK regional climate simulations using a diffusion model. URL https://arxiv.org/abs/2407.14158

  3. [3]

    Michael S. Albergo, Nicholas M. Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions, 2025. URL https://arxiv.org/abs/2303.08797

  4. [4]

    Jorge Baño-Medina, Rodrigo Manzanas, Ezequiel Cimadevilla, Jesús Fernández, Jose González-Abad, Antonio Santiago Cofiño, and José Manuel Gutiérrez. Downscaling multi-model climate projection ensembles with deep learning (DeepESD): contribution to CORDEX EUR-44. Geoscientific Model Development Discussions, 2022:1–14, 2022

  5. [5]

    Centre for Climate Research Singapore. Singapore’s third national climate change study: Science report. Technical report, Meteorological Service Singapore, 2024. URL https://www.mss-int.sg/v3-climate-projections/resources/v3-reports. Available at https://www.mss-int.sg/docs/default-source/v3_reports/v3_science_report/v3-science-report-full.pdf

  6. [6]

    Prafulla Dhariwal and Alex Nichol. Diffusion models beat GANs on image synthesis, 2021. URL https://arxiv.org/abs/2105.05233

  7. [7]

    Anurag Dipankar, Stuart Webster, Xiangming Sun, Claudio Sanchez, Rachel North, Kalli Furtado, Jonathan Wilkinson, Adrian Lock, Simon Vosper, Xiang-Yu Huang, et al. SINGV: A convective-scale weather forecast model for Singapore. Quarterly Journal of the Royal Meteorological Society, 146(733):4131–4146, 2020

  8. [8]

    Antoine Doury, Samuel Somot, Sebastien Gadat, Aurélien Ribes, and Lola Corre. Regional climate model emulator based on deep learning: Concept and first evaluation of a novel hybrid downscaling approach. Climate Dynamics, 60(5):1751–1779, 2023

  9. [9]

    Antoine Doury, Samuel Somot, and Sebastien Gadat. On the suitability of a convolutional neural network based RCM-emulator for fine spatio-temporal precipitation. Climate Dynamics, 62(9):8587–8613, 2024

  10. [10]

    Stathi Fotiadis, Noah Brenowitz, Tomas Geffner, Yair Cohen, Michael Pritchard, Arash Vahdat, and Morteza Mardani. Stochastic flow matching for resolving small-scale physics, 2024. URL https://arxiv.org/abs/2410.19814

  11. [11]

    William J Gutowski Jr, Paul Aaron Ullrich, Alex Hall, L Ruby Leung, Travis Allen O’Brien, CM Patricola-DiRosario, Raymond W Arritt, Melissa S Bukovsky, Katherine V Calvin, Zhe Feng, et al. The ongoing need for high-resolution regional climate models: Process understanding and stakeholder information. Bulletin of the American Meteorological Society, 101(5):...

  12. [12]

    Elizabeth J Kendon, Nigel M Roberts, Catherine A Senior, and Malcolm J Roberts. Realism of rainfall in a very high-resolution regional climate model. Journal of Climate, 25(17):5791–5806, 2012

  13. [13]

    Elizabeth J Kendon, Henry Addison, Antoine Doury, Samuel Somot, Peter AG Watson, Ben BB Booth, Erika Coppola, José Manuel Gutiérrez, James Murphy, and Calum Scullion. Potential for machine learning emulators to augment regional climate simulations in provision of local climate change information. Bulletin of the American Meteorological Society, 106(...

  14. [14]

    Tennessee Leeuwenburg, Nicholas Loveday, Elizabeth E. Ebert, Harrison Cook, Mohammadreza Khanarmuei, Robert J. Taggart, Nikeeth Ramanathan, Maree Carroll, Stephanie Chong, Aidan Griffiths, and John Sharples. scores: A Python package for verifying and evaluating models and predictions with xarray. Journal of Open Source Software, 9(99):6889, July 2024. do...

  15. [15]

    Tennessee Leeuwenburg, Nicholas Loveday, Nikeeth Ramanathan, Stephanie Chong, Robert J. Taggart, Durga Shrestha, Mohammadreza Khanarmuei, Harrison Cook, Liam Bluett, Elizabeth E. Ebert, Maree Carroll, Belinda Trotta, John Sharples, Sam Bishop, Dougal T. Squire, Aidan Griffiths, Thomas C. Pagano, A.J. Fisher, Taylor Mandelbaum, Fu Jinghan, Paul R. Smith, E...

  16. [16]

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling, 2023. URL https://arxiv.org/abs/2210.02747

  17. [17]

    Douglas Maraun, Frederick Wetterhall, Anderson M Ireson, Richard E Chandler, Elizabeth J Kendon, Martin Widmann, Stephan Brienen, Henning W Rust, Tobias Sauter, Matthias Themeßl, et al. Precipitation downscaling under climate change: Recent developments to bridge the gap between dynamical models and the end user. Reviews of Geophysics, 48(3), 2010

  18. [18]

    Morteza Mardani, Noah Brenowitz, Yair Cohen, Jaideep Pathak, Chieh-Yu Chen, Cheng-Chin Liu, Arash Vahdat, Mohammad Amin Nabian, Tao Ge, Akshay Subramaniam, Karthik Kashinath, Jan Kautz, and Mike Pritchard. Residual corrective diffusion modeling for km-scale atmospheric downscaling, 2024. URL https://arxiv.org/abs/2309.15214

  19. [19]

    S. Pulkkinen, D. Nerini, A. A. Pérez Hortal, C. Velasco-Forero, A. Seed, U. Germann, and L. Foresti. Pysteps: an open-source Python library for probabilistic precipitation nowcasting (v1.0). Geoscientific Model Development, 12(10):4185–4219, 2019. doi: 10.5194/gmd-12-4185-2019. URL https://gmd.copernicus.org/articles/12/4185/2019/

  20. [20]

    Nigel M. Roberts and Humphrey W. Lean. Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events. Monthly Weather Review, 136(1):78–97, 2008. doi: 10.1175/2007MWR2123.1. URL https://journals.ametsoc.org/view/journals/mwre/136/1/2007mwr2123.1.xml

  21. [21]

    Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations, 2021. URL https://arxiv.org/abs/2011.13456

  22. [22]

    Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan T. Barron, and Ren Ng. Fourier features let networks learn high frequency functions in low dimensional domains, 2020. URL https://arxiv.org/abs/2006.10739

  23. [23]

    Marijn Van Der Meer, Sophie de Roda Husman, and Stef Lhermitte. Deep learning regional climate model emulators: A comparison of two downscaling training frameworks. Journal of Advances in Modeling Earth Systems, 15(6):e2022MS003593, 2023

  24. [24]

    Bryn Ward-Leikis, Neelesh Rampal, Yun Sing Koh, Peter B. Gibson, Hong-Yang Liu, Vassili Kitsios, Tristan Meyers, Jeff Adie, Yang Juntao, and Steven C. Sherwood. An intercomparison of generative machine learning methods for downscaling precipitation at fine spatial scales, 2025. URL https://arxiv.org/abs/2512.13987

  25. [25]

    Heini Wernli, Marcus Paulat, Martin Hagen, and Christoph Frei. SAL—a novel quality measure for the verification of quantitative precipitation forecasts. Monthly Weather Review, 136(11):4470–4487, 2008. doi: 10.1175/2008MWR2415.1. URL https://journals.ametsoc.org/view/journals/mwre/136/11/2008mwr2415.1.xml