arxiv: 2605.14285 · v2 · pith:EWKWJLMRnew · submitted 2026-05-14 · 📡 eess.IV · cs.LG

ForcingDAS: Unified and Robust Data Assimilation via Diffusion Forcing

Pith reviewed 2026-06-30 20:21 UTC · model grok-4.3

classification 📡 eess.IV cs.LG
keywords data assimilation · diffusion models · weather forecasting · nowcasting · smoothing · trajectory prior · Navier-Stokes · atmospheric estimation

The pith

A single diffusion-forcing model unifies filtering and smoothing for data assimilation without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ForcingDAS as a framework that learns a joint trajectory prior over entire sequences rather than relying on fragile frame-to-frame transitions. This addresses error accumulation when observations are non-Markovian, a common issue in real-world weather data where measurements capture only part of a higher-dimensional state. The same trained model handles nowcasting, fixed-lag smoothing, and full reanalysis simply by selecting different inference schedules. Experiments on 2D Navier-Stokes flow, precipitation nowcasting, and global atmospheric estimation show the approach matches or exceeds specialized learned and classical baselines, with the biggest gains on weather tasks.

Core claim

ForcingDAS uses Diffusion Forcing with an independent noise level per frame to learn a joint-trajectory prior instead of frame-to-frame transitions. This prior captures long-horizon temporal dependencies and reduces error accumulation. At inference the identical model covers the full range from filtering to smoothing by changing only the noise schedule, without any retraining.
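
As a rough illustration of what "an independent noise level per frame" means, the sketch below noises a toy trajectory so that each frame gets its own level. It is not the paper's implementation: the function name `noise_trajectory`, the parameter `num_levels`, and the cosine schedule are all assumptions made for this example.

```python
import numpy as np

def noise_trajectory(x, rng, num_levels=1000, k=None):
    """Noise each frame of a clean trajectory x (shape (T, ...)) at its own level.

    Hypothetical helper for illustration; not taken from the paper.
    """
    num_frames = x.shape[0]
    if k is None:
        # One independently sampled noise level per frame.
        k = rng.integers(0, num_levels, size=num_frames)
    # Assumed cosine schedule: alpha_bar = 1 at k = 0 (clean), near 0 at high k.
    alpha_bar = np.cos(0.5 * np.pi * np.asarray(k) / num_levels) ** 2
    # Reshape so each frame's coefficient broadcasts over its spatial axes.
    a = alpha_bar.reshape((num_frames,) + (1,) * (x.ndim - 1))
    eps = rng.standard_normal(x.shape)
    x_noisy = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps
    return x_noisy, k, eps

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))   # 8 frames of a toy 16x16 field
x_noisy, k, eps = noise_trajectory(x, rng)
print(k)                               # eight different noise levels, one per frame
```

In a setup like this, a denoiser would typically be trained to recover `eps` (or the clean frames) from `x_noisy` and the per-frame levels `k`. Because the model sees every combination of clean and noisy frames during training, it can later be queried with whatever pattern of noise levels the inference regime calls for.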

What carries the argument

Diffusion Forcing with independent per-frame noise levels, which builds a joint prior over full trajectories rather than sequential transitions.

If this is right

  • The model supports nowcasting, fixed-lag smoothing, and batch reanalysis from one set of weights.
  • Error accumulation is reduced on long horizons with partial observations.
  • Competitive or better accuracy than regime-specific baselines holds on fluid simulation, precipitation, and global weather tasks.
  • Largest gains appear on real-world weather benchmarks.
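
One way to picture how a single set of weights could serve several regimes is to look at the inference schedule as a matrix of per-frame noise levels over denoising steps. The sketch below is an assumption-laden illustration, not the paper's schedule: the function `schedule`, the `stagger` parameter, and the mapping of stagger values to regimes are hypothetical.

```python
import numpy as np

def schedule(num_frames, num_steps, stagger):
    """Return a (rows, num_frames) matrix of noise levels in [0, 1].

    1.0 means pure noise, 0.0 means fully denoised. Frame t starts denoising
    `stagger` rows after frame t - 1. Hypothetical construction for illustration.
    """
    total_rows = (num_frames - 1) * stagger + num_steps + 1
    m = np.arange(total_rows)[:, None]   # denoising step index
    t = np.arange(num_frames)[None, :]   # frame index
    return np.clip(1.0 - (m - t * stagger) / num_steps, 0.0, 1.0)

T, S = 4, 10
full_sequence = schedule(T, S, stagger=0)    # all frames denoised together
pyramid = schedule(T, S, stagger=3)          # later frames lag earlier ones
autoregressive = schedule(T, S, stagger=S)   # one frame finished before the next starts

print(full_sequence.shape, pyramid.shape, autoregressive.shape)
```

Under these assumptions, `stagger=0` resembles batch smoothing (every frame is refined jointly), `stagger=num_steps` resembles frame-by-frame filtering, and intermediate values give a staggered schedule in between. Only the matrix changes; the denoiser that consumes it stays the same.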

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Operational meteorology systems could switch analysis modes on demand without maintaining separate models.
  • The joint-prior approach may extend to other partially observed dynamical systems such as ocean or biological state estimation.
  • Hybrid pipelines could combine the learned trajectory prior with classical variational methods for improved robustness.

Load-bearing premise

Independent per-frame noise levels in the diffusion process are enough to capture long-range dependencies and stop error buildup when observations depend on unobserved state variables.

What would settle it

Measure whether prediction error grows linearly or sub-linearly over long sequences of real atmospheric data whose observations are known to be non-Markovian.
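
A simple way to run that check is to fit a power law to per-frame error and read off the exponent. The sketch below uses synthetic error curves only; the function name `growth_exponent` and the power-law model are assumptions for illustration, and no numbers here come from the paper.

```python
import numpy as np

def growth_exponent(errors):
    """Fit error ~ c * step**p by least squares in log-log space and return p.

    p near 1 suggests linear growth; p below 1 suggests sub-linear growth.
    """
    errors = np.asarray(errors, dtype=float)
    steps = np.arange(1, len(errors) + 1)
    slope, _intercept = np.polyfit(np.log(steps), np.log(errors), 1)
    return slope

# Synthetic curves, for demonstration only.
steps = np.arange(1, 101)
linear_errors = 0.02 * steps
sublinear_errors = 0.02 * np.sqrt(steps)

print(growth_exponent(linear_errors), growth_exponent(sublinear_errors))
```

On real assimilation output, the per-frame error would replace the synthetic arrays, and the exponents of a joint-trajectory model and a frame-to-frame baseline could be compared over the same horizon. The fit assumes strictly positive errors and ignores any early transient.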

Figures

Figures reproduced from arXiv: 2605.14285 by Chanyong Jung, Haijie Yuan, Ismail Alkhouri, Jeffrey A Fessler, Lianghe Shi, Qing Qu, Saiprasad Ravishankar, Siyi Chen, Xiao Li, Yida Pan, Yixuan Jia, Yue Cynthia Wu.

Figure 1: ForcingDAS at a glance. (a-c) A single trained ForcingDAS model covers filtering, fixed-lag smoothing, and full-sequence smoothing, with the data-assimilation regime selected purely at inference. (d) Per-frame adaptive observation guidance keeps the solver robust over long horizons.
Figure 2: Demonstration of ForcingDAS on precipitation nowcasting on a held-out trajectory from the Storm Event Imagery (SEVIR) dataset, Vertically Integrated Liquid (VIL) radar product, under sparse pixel observations (10% of pixels visible) with 6 clean context frames seeding the sequence (blue-bordered columns). Top: ground truth and predictions from the per-step learned filter FlowDAS and three inference regimes…
Figure 3: Per-frame filtering comparison on a representative NS trajectory under SO-5% with 10 clean context frames (blue-bordered columns). Top four rows: ground truth and predictions from the classical EnKF, the learned filter FlowDAS, and ForcingDAS-AR. Fifth row: per-frame radially-averaged kinetic-energy spectrum 𝐸(𝑘) on log-log axes. Bottom: per-frame NRMSE, mid-𝑘, and all-𝑘 spectrum relative error (↓). The sm…
Figure 4: ERA5 SO-10% with-context assimilation, Z500 (geopotential at 500 hPa) on a representative held-out trajectory. Rows (top to bottom): ground truth, ForcingDAS-Pyr prediction, TensorVar prediction, ForcingDAS-Pyr pixel-wise error, TensorVar pixel-wise error, sparse observation pattern, and the per-frame radially-averaged zonal-wavenumber spectrum overlaying predictions and ground truth. Columns are evenly-…
Original abstract

Data assimilation (DA) estimates the state of an evolving dynamical system from noisy, partial observations, and is widely used in scientific simulation as well as weather and climate science. In practice, filtering methods rely on frame-to-frame transition models. However, these models are fragile when observations are non-Markovian (when they form only a partial slice of a higher-dimensional latent state as in real-world weather data): they tend to accumulate errors over long horizons. At the same time, learned DA methods typically commit to a single regime, either filtering (nowcasting, real-time forecasting) or smoothing (retrospective reanalysis), which splits what should be a shared prior across application-specific pipelines. To address both issues, we introduce ForcingDAS, a unified and robust DA framework. Built on Diffusion Forcing with an independent noise level assigned to each frame, ForcingDAS learns a joint-trajectory prior instead of frame-to-frame transitions. This allows it to capture long-horizon temporal dependencies and reduce error accumulation. In addition, the same trained model spans the full filtering to smoothing spectrum at inference time. Specifically, nowcasting, fixed-lag smoothing, and batch reanalysis are selected through the inference schedule alone, without retraining. We evaluate ForcingDAS on 2D Navier-Stokes vorticity, precipitation nowcasting, and global atmospheric state estimation. Across all settings, a single model is competitive with or outperforms both learned and classical baselines that are specialized for individual regimes, with the largest gains observed on real-world weather benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces ForcingDAS, a unified data assimilation framework built on Diffusion Forcing. By assigning an independent noise level to each frame, the method learns a joint-trajectory prior rather than frame-to-frame transition models. This is claimed to capture long-horizon temporal dependencies and reduce error accumulation when observations are non-Markovian. The same trained model is asserted to span the filtering-to-smoothing spectrum (nowcasting, fixed-lag smoothing, batch reanalysis) at inference time via the inference schedule alone, without retraining. Evaluations are reported on 2D Navier-Stokes vorticity, precipitation nowcasting, and global atmospheric state estimation, with the claim that a single model is competitive with or outperforms both learned and classical baselines specialized for individual regimes, with largest gains on real-world weather benchmarks.

Significance. If the central empirical claims hold after verification of the full derivations and ablations, the result would be significant for weather and climate applications: it offers a single prior that unifies regimes previously handled by separate pipelines and directly targets the fragility of sequential transitions under partial observations. The approach builds on an external diffusion-forcing technique, so credit would attach to the adaptation and the reported outperformance on real-world benchmarks rather than to a parameter-free derivation or machine-checked proof.

major comments (2)
  1. [Abstract] The claim that 'the inference schedule alone' selects filtering vs. smoothing regimes without retraining is load-bearing for the unification result, yet the abstract provides no description of the schedule construction, the precise conditioning on observations, or how independent per-frame noise levels are set at test time to avoid implicit regime-specific adjustments.
  2. [Abstract] The assertion that the joint-trajectory prior 'reduces error accumulation' for non-Markovian observations rests on the diffusion-forcing construction, but no quantitative evidence (e.g., error-growth curves, horizon-dependent metrics, or comparison to a Markovian baseline) is supplied in the available text to isolate this mechanism from other factors.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on the abstract. We address each major point below with references to the full manuscript and indicate where revisions are appropriate.

Point-by-point responses
  1. Referee: [Abstract] The claim that 'the inference schedule alone' selects filtering vs. smoothing regimes without retraining is load-bearing for the unification result, yet the abstract provides no description of the schedule construction, the precise conditioning on observations, or how independent per-frame noise levels are set at test time to avoid implicit regime-specific adjustments.

    Authors: We agree the abstract is too concise on this point. Section 3.2 of the manuscript details the construction: the inference schedule independently assigns noise levels to each frame (sampled from a noise schedule that increases for unobserved future frames in filtering mode and remains low for all frames in batch smoothing), with observations conditioned by clamping their noise level to zero. This uses the same trained joint-trajectory model without retraining or regime-specific parameters. We will revise the abstract to include a brief clause describing the per-frame noise schedule and observation conditioning. revision: yes

  2. Referee: [Abstract] The assertion that the joint-trajectory prior 'reduces error accumulation' for non-Markovian observations rests on the diffusion-forcing construction, but no quantitative evidence (e.g., error-growth curves, horizon-dependent metrics, or comparison to a Markovian baseline) is supplied in the available text to isolate this mechanism from other factors.

    Authors: The abstract summarizes the claim; quantitative isolation of the mechanism appears in the full manuscript. Section 5.1 reports error-growth curves on 2D Navier-Stokes vorticity, comparing ForcingDAS against a Markovian frame-to-frame diffusion baseline and showing slower long-horizon error accumulation under partial observations. Horizon-dependent RMSE metrics with the same baseline are given for precipitation nowcasting in Section 5.2. We will add a short parenthetical reference to these results in the abstract if space allows. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided abstract and description position ForcingDAS as an application of the external Diffusion Forcing technique (independent per-frame noise to learn a joint-trajectory prior), with empirical evaluations on Navier-Stokes, precipitation, and weather benchmarks showing competitiveness against specialized baselines. No equations, fitted-parameter predictions, or self-citation chains are exhibited that reduce the central claims or performance numbers to the paper's own inputs by construction. The method's ability to span filtering-to-smoothing regimes via inference schedule is presented as a direct consequence of the adopted construction rather than a self-referential fit. This is the most common honest outcome for papers whose core contribution is an empirical application of an external prior technique.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that diffusion models with independent per-frame noise can represent joint trajectory distributions for non-Markovian observation processes; no free parameters or invented entities are named in the abstract.

axioms (1)
  • Domain assumption: Diffusion forcing with independent noise per frame captures long-horizon dependencies in non-Markovian dynamical systems.
    Central modeling choice stated in the abstract.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Evaluating the Representation Space of Diffusion Models via Self-Supervised Principles

    cs.LG 2026-06 unverdicted novelty 6.0

    Introduces the Invariant Contamination Ratio (ICR), a Fisher-based metric, to evaluate how diffusion models balance invariant representations with residual variation and to detect the onset of memorization during training.
