
arXiv: 2510.06637 · v3 · submitted 2025-10-08 · 💻 cs.LG · cs.AI · cs.CV

Control-Augmented Autoregressive Diffusion for Data Assimilation

Pith reviewed 2026-05-18 08:58 UTC · model grok-4.3

classification: 💻 cs.LG · cs.AI · cs.CV
keywords: data assimilation, autoregressive diffusion models, stochastic optimal control, spatiotemporal PDEs, ERA5 reanalysis, chaotic systems, generative model guidance

The pith

Adding a learned controller to pretrained autoregressive diffusion models turns data assimilation into a fast feed-forward rollout for chaotic PDEs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an amortized controller trained offline to augment a pretrained autoregressive diffusion model. By previewing future rollouts, the controller learns small stepwise corrections that steer generation toward matching observations under a terminal-cost objective. This keeps the process close to the original model dynamics while enabling guided sampling. At test time the approach replaces expensive optimization loops with a single rollout that includes on-the-fly corrections, delivering large speed gains and better stability for spatiotemporal data assimilation.

Core claim

Motivated by a stochastic optimal control view of ARDM trajectories, we augment a pretrained autoregressive diffusion model with an offline-trained controller. The controller learns to inject small controls within each denoising sub-step by optimizing a terminal-cost objective over previewed rollouts, yielding a reusable policy for guided generation that stays close to the pretrained dynamics. For data assimilation in chaotic PDEs this reduces the task to a feed-forward rollout with corrections at inference time.
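
As a rough illustration of the feed-forward rollout described above, the toy sketch below runs a frozen sub-step and adds a small control after each one. Everything in it is an assumption made for illustration: `frozen_denoise_step` and `controller` are hypothetical stand-ins (a simple blend toward `0.9 * x_prev` and a proportional nudge toward observed entries), not the paper's networks, and `gain` is an invented knob for the control size.

```python
import random

def frozen_denoise_step(z, x_prev, s):
    """Hypothetical stand-in for one sub-step of a frozen, pretrained ARDM.

    Blends the noisy latent z toward a toy one-step prediction 0.9 * x_prev.
    """
    w = 1.0 / (s + 2)  # blend weight grows as s counts down toward 0
    return [(1 - w) * zi + w * 0.9 * xi for zi, xi in zip(z, x_prev)]

def controller(z, y_future, mask, gain):
    """Hypothetical stand-in for the learned controller.

    Returns an additive control that nudges observed entries (mask == 1)
    toward the previewed observation and leaves unobserved entries alone.
    """
    return [gain * m * (yi - zi) for zi, yi, m in zip(z, y_future, mask)]

def controlled_rollout(x0, observations, masks, num_substeps=4, gain=0.2, seed=0):
    """Feed-forward rollout: one pass over frames, one control per sub-step."""
    rng = random.Random(seed)
    x, trajectory = list(x0), []
    for y, mask in zip(observations, masks):
        z = [rng.gauss(0.0, 1.0) for _ in x]      # each frame starts from noise
        for s in reversed(range(num_substeps)):   # s = S-1, ..., 0
            z = frozen_denoise_step(z, x, s)      # pretrained dynamics, untouched
            u = controller(z, y, mask, gain)      # additive correction
            z = [zi + ui for zi, ui in zip(z, u)]
        x = z                                     # denoised frame conditions the next one
        trajectory.append(x)
    return trajectory

# Two-dimensional toy state; only the first entry is observed.
x0 = [1.0, -1.0]
obs = [[2.0, 0.0]] * 3
masks = [[1, 0]] * 3
controlled = controlled_rollout(x0, obs, masks, gain=0.2)
uncontrolled = controlled_rollout(x0, obs, masks, gain=0.0)
```

Setting `gain=0.0` recovers the uncontrolled rollout under the same noise draws, which makes it easy to compare the two trajectories entry by entry. No optimization loop runs at inference; the only extra work per sub-step is one controller call.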

What carries the argument

An amortized controller that learns stepwise corrective controls for ARDM denoising steps by previewing rollouts under a terminal-cost objective.

Load-bearing premise

The controller can learn effective stepwise corrections by previewing future rollouts under a terminal-cost objective while injecting only small controls that keep trajectories close to the pretrained ARDM dynamics without introducing instability.

What would settle it

Testing whether the reported order-of-magnitude speedup and accuracy gains persist when the same controller is applied to a previously unseen chaotic PDE or to a different reanalysis dataset with new observation densities.

Figures

Figures reproduced from arXiv: 2510.06637 by Farrin Marouf Sofian, Francesco Immorlano, Kushagra Pandey, Prakhar Srivastava, Stephan Mandt.

Figure 1: Overview of CADA. A pretrained autoregressive diffusion model (ARDM) generates forecasts by conditioning on previous states $x_t$ in its denoising steps. A separately trained lightweight controller injects additive controls $u^{(s)}_{t+1}$ into the denoising sub-steps $z^{(s+1)}_{t+1} \to z^{(s)}_{t+1}$, using past states and previewed future observations $y_\tau$ within the forecast window. These controls guide the frozen ARDM so …
Figure 2: Our method improves rollout stability and reconstruction consistency over long horizons. Qualitative results on 2D Kolmogorov flow (horizon 180) show snapshots at representative timesteps, including regions with and without observations. While most baselines drift around t ≈ 35, our approach (CADA) preserves sharp reconstructions. Joint-AR avoids catastrophic divergence but loses fine-scale structures at l…
Figure 3: Our method better preserves physics-aware diagnostics under sparse observations. (a) Total variation (TV) error for the 1D Kuramoto–Sivashinsky system over a 640-step rollout; lower TV error indicates more faithful preservation of spatial oscillations. (b) Dissipation rate error for 2D Kolmogorov flow over a 60-step rollout; accurate dissipation reflects correct energy cascade to small scales. Both figures…
Figure 4: ERA5 vorticity assimilation under sparse MS-4 observations (500 hPa) (ERA5 temperature in …
Figure 5: Stability. Our method yields superior long-horizon rollouts on the 1D Kuramoto–Sivashinsky PDE (horizon 640) under sparse spatiotemporal observations (green dots). Darker colors indicate lower forecast error.
Figure 6: ERA5 temperature assimilation under sparse MS-4 observations (500 hPa, North America). Each row shows forecast snapshots at selected timesteps, with the leftmost column displaying the observation pattern (dense or missing). Columns compare the ground truth (GT) against CADA and four strong baselines: Universal Amortized, Plain Amortized, Joint AR, and Joint AAO. The regime is extremely challenging: observat…
Figure 7: Kolmogorov flow assimilation under MS-4. CADA preserves coherent vortices and fine-scale filamentation across long horizons, closely matching ground truth. EnKF shows increasing phase and amplitude errors, 3D-Var and 4D-Var oversmooth or misalign small-scale structures under sparse observations, and the uncontrolled ARDM rapidly drifts. These visual trends mirror quantitative results in Tab. 6, where class…
Original abstract

Despite advances in test-time scaling and diffusion finetuning, guidance for Auto-Regressive Diffusion Models (ARDMs) remains underexplored. We introduce an amortized framework that augments a pretrained ARDM with an offline-trained controller. By previewing future rollouts, the controller learns stepwise corrections that anticipate observations under a terminal-cost objective, yielding a reusable policy for guided generation. Motivated by a stochastic optimal control view of ARDM trajectories, our method injects small controls within each denoising sub-step while staying close to the pretrained dynamics. We study this approach for data assimilation (DA) in chaotic spatiotemporal partial differential equations (PDEs), where existing methods are often computationally expensive and susceptible to forecast drift under sparse observations. At inference, DA becomes a feed-forward rollout with on-the-fly corrections, achieving an order-of-magnitude speedup over strong diffusion-based baselines. Across two canonical PDEs and a compact ECMWF Reanalysis v5 (ERA5) pilot spanning six observation regimes, our method consistently improves stability and accuracy over state-of-the-art alternatives, with similar improvements observed in a larger-scale GenCast study.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Control-Augmented Autoregressive Diffusion for Data Assimilation (CADA). A pretrained ARDM is augmented with an offline-trained amortized controller that previews future rollouts under a terminal-cost objective to learn small stepwise corrections at each denoising sub-step. Motivated by a stochastic optimal control perspective, the method converts DA into a feed-forward rollout with on-the-fly corrections. Experiments on two canonical PDEs and a compact ERA5 pilot across six observation regimes report order-of-magnitude speedups and consistent gains in stability and accuracy over diffusion-based baselines, with similar trends noted in a GenCast study.

Significance. If the central claims hold, the work offers a promising direction for efficient, amortized guidance in diffusion-based forecasting and assimilation tasks. The offline controller training and small-control injection within pretrained dynamics could reduce the computational burden of test-time guidance while addressing forecast drift in chaotic systems. Reproducible aspects such as the feed-forward inference procedure and the explicit terminal-cost formulation are strengths that facilitate follow-up work.

major comments (2)
  1. [Method (controller design and terminal-cost objective)] The central stability claim rests on the assumption that the learned controller produces only small controls that keep trajectories close to the pretrained ARDM without amplifying drift in chaotic regimes (abstract and method description). No explicit bound on control magnitude, Lyapunov-style analysis, or ablation on observation sparsity thresholds is provided; this is load-bearing for the reported stability and accuracy gains under sparse observations.
  2. [Experimental evaluation (PDE and ERA5 results)] The abstract states consistent improvements and order-of-magnitude speedups across PDEs and six ERA5 observation regimes, yet no specific quantitative metrics, error bars, baseline implementation details, or experimental controls are reported. This weakens assessment of whether post-hoc choices affect the speedup and accuracy claims.
minor comments (2)
  1. [Abstract and conclusion] The GenCast study is mentioned only in passing; a short quantitative summary or pointer to supplementary results would strengthen the broader applicability claim.
  2. [Preliminaries and method] Notation for the amortized policy and the precise form of the terminal cost could be clarified with a short equation or pseudocode block to aid reproducibility.
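
For orientation, the kind of short equation requested in minor comment 2 might look roughly like the block below. It is a generic stochastic-optimal-control template, not the paper's stated objective. Only $x_t$, $z^{(s)}_{t+1}$, $u^{(s)}_{t+1}$ and $y_\tau$ follow the Figure 1 caption; the frozen sub-step map $f_\theta$, the controller $\pi_\phi$, the observation mask $M_\tau$, the quadratic control penalty and its weight $\lambda$ are assumptions introduced here.

```latex
\begin{align}
z^{(s)}_{t+1} &= f_\theta\big(z^{(s+1)}_{t+1},\, x_t,\, s\big) + u^{(s)}_{t+1},
\qquad
u^{(s)}_{t+1} = \pi_\phi\big(z^{(s+1)}_{t+1},\, x_t,\, y_\tau,\, s\big), \\
x_{t+1} &= z^{(0)}_{t+1}, \\
\min_{\phi}\; & \mathbb{E}\Big[\, \big\| M_\tau \odot (x_\tau - y_\tau) \big\|_2^2
\;+\; \lambda \sum_{t,\,s} \big\| u^{(s)}_{t+1} \big\|_2^2 \,\Big].
\end{align}
```

Under these assumptions the first term plays the role of the terminal cost at the previewed observation time $\tau$, and the second term is one common way to keep controlled trajectories near the pretrained dynamics.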

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Method (controller design and terminal-cost objective)] The central stability claim rests on the assumption that the learned controller produces only small controls that keep trajectories close to the pretrained ARDM without amplifying drift in chaotic regimes (abstract and method description). No explicit bound on control magnitude, Lyapunov-style analysis, or ablation on observation sparsity thresholds is provided; this is load-bearing for the reported stability and accuracy gains under sparse observations.

    Authors: We agree that an explicit analysis of control magnitudes and robustness to sparsity would strengthen the stability claims. In the revised manuscript we will add a quantitative study of the L2 norms of the injected controls across all reported experiments, confirming that they remain small relative to the diffusion noise scale. We will also include an ablation varying observation density to identify performance thresholds. While a full Lyapunov analysis is challenging for the underlying stochastic diffusion process, we will expand the discussion of the terminal-cost objective to clarify how it encourages proximity to the pretrained ARDM dynamics. These additions directly address the load-bearing nature of the stability argument. revision: yes

  2. Referee: [Experimental evaluation (PDE and ERA5 results)] The abstract states consistent improvements and order-of-magnitude speedups across PDEs and six ERA5 observation regimes, yet no specific quantitative metrics, error bars, baseline implementation details, or experimental controls are reported. This weakens assessment of whether post-hoc choices affect the speedup and accuracy claims.

    Authors: The experimental section of the manuscript already contains tables with mean errors, standard deviations across multiple random seeds, and comparisons against the listed baselines, together with implementation details for the PDE solvers and ERA5 preprocessing. To improve accessibility, we will revise the abstract to cite the specific speedup factors and error reductions (with references to the relevant tables and figures). We will also add a dedicated paragraph in the experimental setup subsection that explicitly lists all post-hoc choices, hyperparameter ranges, and controls used to ensure the reported gains are not artifacts of particular selections. revision: partial
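
The control-magnitude study proposed in the first response could be computed along the following lines. This is a minimal sketch under stated assumptions: the per-sub-step controls and noise standard deviations are taken to be logged as plain lists, the helper names are hypothetical, and comparing against the RMS norm of Gaussian noise is one possible reading of "small relative to the diffusion noise scale", not a definition from the paper.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def control_to_noise_ratios(controls, sigmas):
    """Ratio of each control's L2 norm to the RMS norm of that sub-step's noise.

    controls[s]: control vector u at sub-step s (list of floats).
    sigmas[s]:   noise standard deviation at sub-step s (assumed > 0).
    For d-dimensional Gaussian noise with std sigma, the RMS norm is sigma * sqrt(d).
    """
    ratios = []
    for u, sigma in zip(controls, sigmas):
        noise_rms_norm = sigma * math.sqrt(len(u))
        ratios.append(l2_norm(u) / noise_rms_norm)
    return ratios

# Hypothetical logged controls for three sub-steps of one frame.
controls = [[0.3, 0.4], [0.03, 0.04], [0.0, 0.0]]
sigmas = [1.0, 0.5, 0.1]
ratios = control_to_noise_ratios(controls, sigmas)
print(max(ratios))
```

Ratios well below 1 at every sub-step would support the claim that the controls stay small; reporting the maximum over sub-steps and frames is a conservative summary.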

Circularity Check

0 steps flagged

No significant circularity; new controller and evaluations are independent of inputs.

full rationale

The paper introduces an offline-trained controller that augments a pretrained ARDM using preview rollouts and a terminal-cost objective to learn stepwise corrections for data assimilation. This framework is motivated by stochastic optimal control but implements a distinct amortized policy that remains close to the base dynamics while providing on-the-fly guidance. Performance improvements and speedups are demonstrated via direct evaluations on two PDEs and ERA5 across six observation regimes, without any reduction of claims to quantities fitted from the same evaluation data or to self-citations that bear the central load. The derivation chain is self-contained, with the controller training and inference rollout serving as externally verifiable additions rather than tautological restatements of pretrained components.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The approach rests on a pretrained ARDM providing suitable base dynamics and the new controller being trainable to produce useful corrections; no explicit free parameters or invented physical entities are named in the abstract.

axioms (1)
  • domain assumption: ARDM trajectories admit a stochastic optimal control interpretation that justifies injecting small controls at denoising sub-steps
    This view is invoked to motivate the controller design and the terminal-cost objective.
invented entities (1)
  • amortized controller policy (no independent evidence)
    purpose: Learns reusable stepwise corrections by previewing rollouts to anticipate observations
    New component added to the pretrained ARDM; no independent evidence of its existence outside the training procedure is provided.

pith-pipeline@v0.9.0 · 5743 in / 1469 out tokens · 43745 ms · 2026-05-18T08:58:22.588079+00:00 · methodology


C Active Observation Selector. We maintain a preview buffer containing all observations from $T$ that lie within a fixed lookahead horizon $\Lambda$ from index $t_0$. Each entry in the buffer is represented as a triplet $(y_j, M_j, \Delta_j)$, where (i) $j \in T$ is the physical time index of the observation; (ii) $y_j$ is the observed signal, lifted to full resolution when necessary; (i...
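
The preview buffer described in this excerpt can be sketched as follows. The excerpt is cut off before it defines $M_j$ and $\Delta_j$, so the sketch assumes that $M_j$ is a binary observation mask and that $\Delta_j = j - t_0$ is the time offset. The inclusive window $t_0 \le j \le t_0 + \Lambda$ and the names `PreviewEntry` and `build_preview_buffer` are likewise assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple

class PreviewEntry(NamedTuple):
    y: List[float]    # observed signal y_j
    mask: List[int]   # ASSUMED meaning of M_j: 1 where an entry is observed
    delta: int        # ASSUMED meaning of Delta_j: offset j - t0

def build_preview_buffer(
    observations: Dict[int, List[float]],
    masks: Dict[int, List[int]],
    t0: int,
    horizon: int,
) -> List[PreviewEntry]:
    """Collect observations whose time index j satisfies t0 <= j <= t0 + horizon.

    observations and masks are keyed by the physical time index j.
    Entries are returned in increasing order of j.
    """
    buffer = []
    for j in sorted(observations):
        if t0 <= j <= t0 + horizon:
            buffer.append(PreviewEntry(observations[j], masks[j], j - t0))
    return buffer

# Hypothetical sparse observations at time indices 2, 5 and 9.
obs = {2: [1.0], 5: [2.0], 9: [3.0]}
msk = {2: [1], 5: [1], 9: [0]}
buf = build_preview_buffer(obs, msk, t0=4, horizon=3)
```

With `t0=4` and `horizon=3`, only the observation at index 5 falls inside the window, so the buffer holds a single entry with offset 1.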