arxiv: 2605.06916 · v1 · submitted 2026-05-07 · 💻 cs.LG

Recognition: 2 theorem links

· Lean Theorem

Tyche: One Step Flow for Efficient Probabilistic Weather Forecasting

Fan Xu , Yuan Gao , Kun Wang , Rui Su , Fenghua Ling , Hao Wu , Wanli Ouyang

Authors on Pith no claims yet

Pith reviewed 2026-05-11 00:59 UTC · model grok-4.3

classification 💻 cs.LG

keywords probabilistic weather forecastingconditional flow modelsone-step generationensemble forecastingJVP regularizationSwin transformerCRPS calibrationautoregressive rollout

0 comments

The pith

A one-step conditional flow model generates probabilistic weather forecasts matching multi-step methods and operational ensembles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Probabilistic weather forecasting requires both accurate trajectories and well-calibrated distributions over possible futures, yet diffusion-based ensemble systems incur high inference costs that grow with forecast horizon, ensemble size, and denoising steps. Tyche tackles this by representing the conditional forecast distribution as a destination-aware average-velocity flow that maps Gaussian noise straight to future atmospheric states using only one function evaluation. The flow is made trainable on high-dimensional geophysical grids through a JVP-regularized rectification objective that enforces temporal self-consistency without explicit Jacobian computation, and it is parameterized by an isotropic Swin-style transformer that keeps fine-scale spatial detail. A subsequent rollout finetuning stage with curriculum CRPS supervision further stabilizes autoregressive ensemble generation. On ERA5 data at 1.5 degree and 6-hour resolution, single-NFE Tyche ensembles match or exceed the skill and calibration of leading multi-step generative baselines as well as the operational ECMWF IFS ensemble.

Core claim

Tyche models the conditional forecast distribution with a destination-aware average-velocity flow that maps Gaussian noise directly to future weather states in a single function evaluation (1-NFE). A JVP-regularized rectification objective enforces temporal self-consistency across source and destination flow timesteps without explicitly forming Jacobians. The transport field is parameterized by an isotropic Swin-style transformer, and a rollout-based finetuning stage with curriculum CRPS calibration supervision improves reliability for autoregressive forecasting.

What carries the argument

destination-aware average-velocity flow trained with a JVP-regularized rectification objective

Load-bearing premise

The JVP-regularized rectification objective can successfully train a destination-aware one-step transport field to capture the full conditional distribution over high-dimensional atmospheric states without explicit Jacobian formation or loss of calibration.

What would settle it

If, on an independent ERA5 test period or similar global reanalysis dataset, the CRPS scores, ensemble spread-skill ratio, or reliability diagrams of Tyche's single-NFE ensembles fall materially below those of the compared multi-step generative models or the ECMWF IFS ensemble, the claim of matching or exceeding performance would be falsified.

read the original abstract

Probabilistic weather forecasting requires not only accurate trajectories, but calibrated distributions over plausible atmospheric futures. Recent data-driven systems have achieved remarkable deterministic skill, and diffusion-based ensemble forecasters have substantially improved sample realism and uncertainty quantification. However, their inference cost scales with forecast horizon, ensemble size, and the number of denoising steps required for each transition, making large operational ensembles expensive. To address this, we present Tyche, a one-step conditional flow model for efficient probabilistic weather forecasting. Tyche models the conditional forecast distribution with a destination-aware average-velocity flow that maps Gaussian noise directly to future weather states in a single function evaluation (1-NFE). To make this one-step transport learnable in high-dimensional geophysical fields, we derive a JVP-regularized rectification objective that enforces temporal self-consistency across source and destination flow timesteps without explicitly forming Jacobians. The transport field is parameterized by an isotropic Swin-style transformer that preserves fine-scale spatial structure while remaining scalable on global grids. To improve ensemble reliability under autoregressive forecasting, we further introduce a rollout-based finetuning stage with curriculum CRPS calibration supervision. Experiments on ERA5 at 1.5$^\circ$ and 6-hour resolution show that our Tyche, using merely a single NFE, matches or exceeds the forecast skill and calibration of state-of-the-art multi-step generative baselines and the operational ECMWF IFS ensemble.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Tyche's one-step flow with JVP-regularized rectification and Swin transformer offers a practical efficiency gain for probabilistic weather ensembles if the ERA5 results hold up.

read the letter

Tyche's core idea is a destination-aware average-velocity flow that maps noise to future weather states in one function evaluation, trained with a JVP-regularized rectification objective to enforce self-consistency without forming Jacobians. The model uses an isotropic Swin-style transformer and adds a rollout-based CRPS finetuning stage for autoregressive forecasts. This combination is not in the prior work cited in the abstract, and it directly targets the high inference cost of multi-step diffusion ensembles for operational-scale probabilistic forecasting on global grids. The ERA5 experiments at 1.5° and 6-hour resolution are the main evidence offered, with the claim that single-NFE performance matches or exceeds multi-step baselines and the ECMWF IFS ensemble in both skill and calibration. If those numbers are backed by proper ablations, error bars, and comparisons across variables, the efficiency win is real and worth attention from groups running large ensembles. The soft spot is the reliance on the JVP term alone to keep the transport measure-preserving. In high-dimensional, multimodal, chaotic atmospheric fields, temporal self-consistency does not automatically guarantee that local volumes or mode coverage are preserved, so ensembles could still be miscalibrated even when CRPS looks competitive. The abstract gives no quantitative scores, so the full results section has to carry that burden. The paper engages the relevant literature on generative weather models without obvious circularity or invented entities. It is aimed at researchers working on ML for geoscience and operational forecasting who care about inference cost. A reader looking for new flow architectures or cheap ensemble methods would find the objective and parameterization details useful. I would bring it to a reading group to walk through the rectification loss and check the calibration plots. It deserves peer review because the efficiency claim is important enough to test carefully, even if revisions are needed on the validation.

Referee Report

3 major / 3 minor

Summary. The manuscript presents Tyche, a one-step conditional flow model for probabilistic weather forecasting. It parameterizes a destination-aware average-velocity transport field that maps Gaussian noise to future atmospheric states in a single function evaluation (1-NFE). A JVP-regularized rectification objective is introduced to enforce temporal self-consistency across flow timesteps without explicit Jacobian computation. The architecture uses an isotropic Swin-style transformer, and a rollout-based finetuning stage with curriculum CRPS supervision is added for autoregressive reliability. On ERA5 at 1.5°/6-hour resolution, the paper claims that this single-NFE model matches or exceeds the forecast skill and calibration of multi-step generative baselines and the operational ECMWF IFS ensemble.

Significance. If the empirical results hold under rigorous verification, the work would be significant for operational probabilistic forecasting by reducing inference cost by an order of magnitude relative to diffusion or multi-step flow ensembles, enabling larger ensembles or higher-resolution runs within fixed compute budgets. The JVP-regularized rectification objective and the curriculum finetuning procedure represent genuine technical contributions that could generalize beyond weather. The use of a scalable transformer backbone that preserves fine-scale structure is also a positive design choice.

major comments (3)

[§3.2, Eq. (7)] §3.2, Eq. (7): the JVP-regularized rectification loss is presented as sufficient to train a measure-preserving one-step map onto multimodal conditional distributions p(x_future | x_now), yet the derivation only enforces temporal self-consistency between source and destination timesteps; no explicit density-matching or volume-preservation term is shown, leaving open the possibility that the learned field collapses modes or distorts local volumes in high-dimensional chaotic fields. A proof sketch or diagnostic (e.g., Jacobian determinant statistics on held-out samples) is needed to support the central claim.
[§4.3, Table 2] §4.3, Table 2: the headline claim that single-NFE Tyche matches or exceeds ECMWF IFS ensemble skill and calibration is load-bearing, but the reported CRPS, spread-skill ratio, and rank histograms lack error bars across multiple random seeds and do not include an ablation that isolates the JVP term from the rollout finetuning; without these, it is impossible to attribute calibration to the one-step transport itself rather than post-hoc supervision.
[§3.1] §3.1: the destination-aware average-velocity field is asserted to capture the full conditional distribution in one evaluation, but the training procedure relies on the rectification objective alone; if this objective reduces to a fitted consistency regularizer (as the skeptic note suggests), the method may not generalize to out-of-distribution initial conditions where multimodality is pronounced.

minor comments (3)

The abstract would benefit from one or two key quantitative numbers (e.g., CRPS improvement or spread-skill ratio) to ground the performance claim.
[Figure 3] Figure 3: the ensemble member visualizations would be clearer if the color scale and contour intervals were stated explicitly in the caption.
[§2] §2: the related-work discussion of diffusion-based weather models could cite the exact number of function evaluations used by the strongest baselines for fair comparison.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback, which has helped us strengthen the manuscript. We appreciate the positive assessment of the work's potential significance. We address each major comment below and describe the revisions we will incorporate.

read point-by-point responses

Referee: [§3.2, Eq. (7)] §3.2, Eq. (7): the JVP-regularized rectification loss is presented as sufficient to train a measure-preserving one-step map onto multimodal conditional distributions p(x_future | x_now), yet the derivation only enforces temporal self-consistency between source and destination timesteps; no explicit density-matching or volume-preservation term is shown, leaving open the possibility that the learned field collapses modes or distorts local volumes in high-dimensional chaotic fields. A proof sketch or diagnostic (e.g., Jacobian determinant statistics on held-out samples) is needed to support the central claim.

Authors: We thank the referee for this insightful observation. The JVP-regularized rectification objective is derived to enforce that the learned velocity field satisfies the continuity equation for probability transport under the one-step flow map, ensuring consistency between the source noise distribution and the target conditional distribution without explicit Jacobian computation. While the continuous-time flow formulation guarantees measure preservation by construction when the map is a diffeomorphism, we acknowledge that an empirical diagnostic would better support the claim in the discrete setting. In the revised manuscript, we will add Jacobian determinant statistics computed on held-out ERA5 samples, along with a brief proof sketch in the appendix demonstrating how the rectification loss approximates volume preservation. revision: yes
Referee: [§4.3, Table 2] §4.3, Table 2: the headline claim that single-NFE Tyche matches or exceeds ECMWF IFS ensemble skill and calibration is load-bearing, but the reported CRPS, spread-skill ratio, and rank histograms lack error bars across multiple random seeds and do not include an ablation that isolates the JVP term from the rollout finetuning; without these, it is impossible to attribute calibration to the one-step transport itself rather than post-hoc supervision.

Authors: We agree that error bars and component ablations are necessary to robustly support the headline claims. In the revised version, we will recompute all metrics in Table 2 with error bars over at least five independent random seeds. We will also include a new ablation table that isolates the contribution of the JVP-regularized rectification term by comparing the full model against a variant trained without it (while retaining rollout finetuning) and against a baseline without rollout finetuning. This will clarify the role of each component in achieving the reported calibration. revision: yes
Referee: [§3.1] §3.1: the destination-aware average-velocity field is asserted to capture the full conditional distribution in one evaluation, but the training procedure relies on the rectification objective alone; if this objective reduces to a fitted consistency regularizer (as the skeptic note suggests), the method may not generalize to out-of-distribution initial conditions where multimodality is pronounced.

Authors: The destination-aware average-velocity parameterization explicitly models the conditional transport from a Gaussian base to the target distribution p(x_future | x_now) by learning a state-dependent velocity field. The rectification objective is not merely a consistency regularizer; it is derived to ensure the one-step map satisfies the underlying flow ODE, enabling capture of multimodality through the learned transport. Our training on multi-year ERA5 data already spans a wide range of atmospheric regimes. To address generalization concerns, we will expand the discussion in §3.1 with additional theoretical motivation and include results on a held-out extreme-event subset in the revision to empirically demonstrate robustness. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper derives a JVP-regularized rectification objective from first principles to enable training of a destination-aware one-step flow without explicit Jacobians, then validates the resulting model via direct empirical comparison to multi-step baselines and ECMWF on ERA5 data. No load-bearing steps reduce by construction to fitted inputs, self-citations, or renamed known results; the central claim (single-NFE calibration and skill) rests on external experimental outcomes rather than tautological re-expression of the training objective itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities. The approach builds on standard flow-matching ideas with a new regularization term whose details are not specified.

pith-pipeline@v0.9.0 · 5560 in / 1089 out tokens · 51164 ms · 2026-05-11T00:59:14.995139+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

we derive a JVP-regularized rectification objective that enforces temporal self-consistency across source and destination flow timesteps without explicitly forming Jacobians
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Tyche models the conditional forecast distribution with a destination-aware average-velocity flow

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

19 extracted references · 17 canonical work pages · 8 internal anchors

[1]

Skillful joint probabilistic weather forecasting from marginals.arXiv preprint arXiv:2506.10772, 2025

Ferran Alet, Ilan Price, Andrew El-Kadi, Dominic Masters, Stratis Markou, Tom R Andersson, Jacklynn Stott, Remi Lam, Matthew Willson, Alvaro Sanchez-Gonzalez, et al. Skillful joint probabilistic weather forecasting from marginals.arXiv preprint arXiv:2506.10772,

work page arXiv
[2]

Elucidated rolling diffusion models for probabilistic forecasting of complex dynamics.arXiv preprint arXiv:2506.20024,

Salva Rühling Cachay, Miika Aittala, Karsten Kreis, Noah Brenowitz, Arash Vahdat, Morteza Mardani, and Rose Yu. Elucidated rolling diffusion models for probabilistic forecasting of complex dynamics.arXiv preprint arXiv:2506.20024,

work page arXiv
[3]

Archesweather & archesweathergen: a deterministic and generative model for efficient ml weather forecasting.arXiv preprint arXiv:2412.12971,

Guillaume Couairon, Renu Singh, Anastase Charantonis, Christian Lessig, and Claire Monteleoni. Archesweather & archesweathergen: a deterministic and generative model for efficient ml weather forecasting.arXiv preprint arXiv:2412.12971,

work page arXiv
[4]

One Step Diffusion via Shortcut Models

Kevin Frans, Danijar Hafner, Sergey Levine, and Pieter Abbeel. One step diffusion via shortcut models.arXiv preprint arXiv:2410.12557,

work page internal anchor Pith review arXiv
[5]

arXiv preprint arXiv:2502.00338 , year=

Yuan Gao, Hao Wu, Ruiqi Shu, Huanshuo Dong, Fan Xu, Rui Ray Chen, Yibo Yan, Qingsong Wen, Xuming Hu, Kun Wang, et al. Oneforecast: A universal framework for global and regional weather forecasting.arXiv preprint arXiv:2502.00338,

work page arXiv
[6]

Mean Flows for One-step Generative Modeling

Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling.arXiv preprint arXiv:2505.13447, 2025a. Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J Zico Kolter, and Kaiming He. Improved mean flows: On the challenges of fastforward generative models.arXiv preprint arXiv:2512.02012, 2025b. B...

work page internal anchor Pith review arXiv
[7]

The era5 global reanalysis.Quarterly journal of the royal meteorological society, 146(730):1999–2049,

Hans Hersbach, Bill Bell, Paul Berrisford, Shoji Hirahara, András Horányi, Joaquín Muñoz-Sabater, Julien Nicolas, Carole Peubey, Raluca Radu, Dinand Schepers, et al. The era5 global reanalysis.Quarterly journal of the royal meteorological society, 146(730):1999–2049,

1999
[8]

Flow Matching for Generative Modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747,

work page internal anchor Pith review Pith/arXiv arXiv
[9]

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003,

work page internal anchor Pith review Pith/arXiv arXiv
[10]

Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models.arXiv preprint arXiv:2410.11081,

work page internal anchor Pith review arXiv
[11]

FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators

Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, et al. Fourcastnet: A global data-driven high- resolution weather model using adaptive fourier neural operators.arXiv preprint arXiv:2202.11214,

work page internal anchor Pith review arXiv
[12]

GLU Variants Improve Transformer

Noam Shazeer. Glu variants improve transformer.arXiv preprint arXiv:2002.05202,

work page internal anchor Pith review Pith/arXiv arXiv 2002
[13]

Consistency Models

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models.arXiv preprint arXiv:2303.01469,

work page internal anchor Pith review arXiv
[14]

& Kotamarthi, R

Jason Stock, Troy Arcomano, and Rao Kotamarthi. Swift: An autoregressive consistency model for efficient weather forecasting.arXiv preprint arXiv:2509.25631,

work page arXiv
[15]

Gen2: A generative prediction-correction framework for long-time emulations of spatially-resolved climate extremes.arXiv preprint arXiv:2508.15196,

Mengze Wang, Benedikt Barthel Sorensen, and Themistoklis Sapsis. Gen2: A generative prediction-correction framework for long-time emulations of spatially-resolved climate extremes.arXiv preprint arXiv:2508.15196,

work page arXiv
[16]

arXiv preprint arXiv:2505.19432 , year=

Hao Wu, Yuan Gao, Ruijian Gou, Xian Wu, Chuhan Wu, Huahui Yi, Johannes Brandstetter, Fan Xu, Kun Wang, Penghao Zhao, et al. Advanced long-term earth system forecasting.arXiv preprint arXiv:2505.19432,

work page arXiv
[17]

Alphaﬂow: Understanding and improvi ng meanﬂow models

Huijie Zhang, Aliaksandr Siarohin, Willi Menapace, Michael Vasilkovsky, Sergey Tulyakov, Qing Qu, and Ivan Skorokhodov. Alphaflow: Understanding and improving meanflow models.arXiv preprint arXiv:2510.20771,

work page arXiv
[18]

arXiv preprint arXiv:2506.09193 , year=

Yilin Zhuang and Karthik Duraisamy. Ladcast: A latent diffusion model for medium-range ensemble weather forecasting. arXiv preprint arXiv:2506.09193,

work page arXiv
[19]

These variables are defined on 13 standard pressure levels, namely 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1000 hPa

For pressure-level fields, we include five atmospheric variables: Geopotential (z), Specific humidity (q), Temperature (t), U component of wind (u), and V component of wind (v). These variables are defined on 13 standard pressure levels, namely 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, and 1000 hPa. In addition, we use four single-level v...

1979