Control-Augmented Autoregressive Diffusion for Data Assimilation
Pith reviewed 2026-05-18 08:58 UTC · model grok-4.3
The pith
Adding a learned controller to pretrained autoregressive diffusion models turns data assimilation into a fast feed-forward rollout for chaotic PDEs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Motivated by a stochastic optimal control view of ARDM trajectories, we augment a pretrained autoregressive diffusion model with an offline-trained controller. The controller learns to inject small controls within each denoising sub-step by optimizing a terminal-cost objective over previewed rollouts, yielding a reusable policy for guided generation that stays close to the pretrained dynamics. For data assimilation in chaotic PDEs this reduces the task to a feed-forward rollout with corrections at inference time.
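The feed-forward rollout described in the core claim can be sketched on a toy scalar state. This is an illustration under loud assumptions, not the paper's implementation: `pretrained_denoise_step` and `controller` are hypothetical stand-ins (the controller here is a hand-written, clipped proportional rule, not a trained network), and `gain`, `max_norm`, and the decay factor `0.9` are invented for the example.

```python
import random


def pretrained_denoise_step(x, cond, k, num_substeps):
    """Stand-in for one frozen ARDM denoising sub-step (hypothetical).

    Blends the noisy sample x with a simple forecast of the previous frame.
    """
    forecast = 0.9 * cond             # toy dynamics: mild decay of the last frame
    weight = (k + 1) / num_substeps   # later sub-steps trust the forecast more
    return (1.0 - weight) * x + weight * forecast


def controller(x, obs, k, num_substeps, gain=0.2, max_norm=0.05):
    """Stand-in for the learned controller (hypothetical, not trained here).

    Returns a small correction toward the previewed observation, clipped so
    the controlled sample stays close to the pretrained update.
    """
    if obs is None:
        return 0.0
    u = gain * (obs - x) / num_substeps
    return max(-max_norm, min(max_norm, u))


def controlled_rollout(x0, observations, num_frames, num_substeps=4, seed=0):
    """Single forward pass: a control is added after every denoising sub-step."""
    rng = random.Random(seed)
    frames, controls = [x0], []
    for t in range(1, num_frames + 1):
        x = rng.gauss(0.0, 1.0)          # each frame starts from noise
        obs = observations.get(t)        # None when frame t is unobserved
        for k in range(num_substeps):
            x = pretrained_denoise_step(x, frames[-1], k, num_substeps)
            u = controller(x, obs, k, num_substeps)
            x += u
            controls.append(u)
        frames.append(x)
    return frames, controls
```

Calling `controlled_rollout(1.0, {2: 1.5}, 3)` nudges frame 2 toward the observation, while an empty observation dictionary reproduces the uncontrolled toy dynamics. The point of the sketch is structural: no inner optimization loop runs at inference time, only one extra controller call per sub-step.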
What carries the argument
An amortized controller that learns stepwise corrective controls for ARDM denoising steps by previewing rollouts under a terminal-cost objective.
Load-bearing premise
The controller can learn effective stepwise corrections by previewing future rollouts under a terminal-cost objective while injecting only small controls that keep trajectories close to the pretrained ARDM dynamics without introducing instability.
What would settle it
Testing whether the reported order-of-magnitude speedup and accuracy gains persist when the same controller is applied to a previously unseen chaotic PDE or to a different reanalysis dataset with new observation densities.
Original abstract
Despite advances in test-time scaling and diffusion finetuning, guidance for Auto-Regressive Diffusion Models (ARDMs) remains underexplored. We introduce an amortized framework that augments a pretrained ARDM with an offline-trained controller. By previewing future rollouts, the controller learns stepwise corrections that anticipate observations under a terminal-cost objective, yielding a reusable policy for guided generation. Motivated by a stochastic optimal control view of ARDM trajectories, our method injects small controls within each denoising sub-step while staying close to the pretrained dynamics. We study this approach for data assimilation (DA) in chaotic spatiotemporal partial differential equations (PDEs), where existing methods are often computationally expensive and susceptible to forecast drift under sparse observations. At inference, DA becomes a feed-forward rollout with on-the-fly corrections, achieving an order-of-magnitude speedup over strong diffusion-based baselines. Across two canonical PDEs and a compact ECMWF Reanalysis v5 (ERA5) pilot spanning six observation regimes, our method consistently improves stability and accuracy over state-of-the-art alternatives, with similar improvements observed in a larger-scale GenCast study.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Control-Augmented Autoregressive Diffusion Models (CA-ARDM) for data assimilation. A pretrained ARDM is augmented with an offline-trained amortized controller that previews future rollouts under a terminal-cost objective to learn small stepwise corrections at each denoising sub-step. Motivated by a stochastic optimal control perspective, the method converts DA into a feed-forward rollout with on-the-fly corrections. Experiments on two canonical PDEs and a compact ERA5 pilot across six observation regimes report order-of-magnitude speedups and consistent gains in stability and accuracy over diffusion-based baselines, with similar trends noted in a GenCast study.
Significance. If the central claims hold, the work offers a promising direction for efficient, amortized guidance in diffusion-based forecasting and assimilation tasks. The offline controller training and small-control injection within pretrained dynamics could reduce the computational burden of test-time guidance while addressing forecast drift in chaotic systems. Reproducible aspects such as the feed-forward inference procedure and the explicit terminal-cost formulation are strengths that facilitate follow-up work.
major comments (2)
- [Method (controller design and terminal-cost objective)] The central stability claim rests on the assumption that the learned controller produces only small controls that keep trajectories close to the pretrained ARDM without amplifying drift in chaotic regimes (abstract and method description). No explicit bound on control magnitude, Lyapunov-style analysis, or ablation on observation sparsity thresholds is provided; this is load-bearing for the reported stability and accuracy gains under sparse observations.
- [Experimental evaluation (PDE and ERA5 results)] The abstract states consistent improvements and order-of-magnitude speedups across PDEs and six ERA5 observation regimes, yet no specific quantitative metrics, error bars, baseline implementation details, or experimental controls are reported. This weakens assessment of whether post-hoc choices affect the speedup and accuracy claims.
minor comments (2)
- [Abstract and conclusion] The GenCast study is mentioned only in passing; a short quantitative summary or pointer to supplementary results would strengthen the broader applicability claim.
- [Preliminaries and method] Notation for the amortized policy and the precise form of the terminal cost could be clarified with a short equation or pseudocode block to aid reproducibility.
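As an illustration of the kind of short equation the last comment asks for, one generic discrete-time form is written below. It is an assumption made for this review, not the paper's notation: the symbols f, u_θ, H, σ_k, and λ are hypothetical, and the running penalty on the controls is one plausible way to encode "small controls" that the source does not confirm.

```latex
% Hypothetical notation: x_k is the state after denoising sub-step k,
% f the frozen pretrained ARDM update, u_\theta the amortized controller,
% y the previewed observations, H an observation operator, \lambda > 0 a weight.
\begin{aligned}
x_{k+1} &= f(x_k, k) + u_\theta(x_k, k, y) + \sigma_k \varepsilon_k,
\qquad \varepsilon_k \sim \mathcal{N}(0, I), \\
J(\theta) &= \mathbb{E}\Big[\, \big\| H(x_K) - y \big\|_2^2
  + \lambda \sum_{k=0}^{K-1} \big\| u_\theta(x_k, k, y) \big\|_2^2 \Big].
\end{aligned}
```

The first term is the terminal cost evaluated at the end of the previewed rollout; the second term, if present, trades observation fit against deviation from the pretrained update.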
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.
- Referee: [Method (controller design and terminal-cost objective)] The central stability claim rests on the assumption that the learned controller produces only small controls that keep trajectories close to the pretrained ARDM without amplifying drift in chaotic regimes (abstract and method description). No explicit bound on control magnitude, Lyapunov-style analysis, or ablation on observation sparsity thresholds is provided; this is load-bearing for the reported stability and accuracy gains under sparse observations.
  Authors: We agree that an explicit analysis of control magnitudes and robustness to sparsity would strengthen the stability claims. In the revised manuscript we will add a quantitative study of the L2 norms of the injected controls across all reported experiments, confirming that they remain small relative to the diffusion noise scale. We will also include an ablation varying observation density to identify performance thresholds. While a full Lyapunov analysis is challenging for the underlying stochastic diffusion process, we will expand the discussion of the terminal-cost objective to clarify how it encourages proximity to the pretrained ARDM dynamics. These additions directly address the load-bearing nature of the stability argument.
  Revision: yes
- Referee: [Experimental evaluation (PDE and ERA5 results)] The abstract states consistent improvements and order-of-magnitude speedups across PDEs and six ERA5 observation regimes, yet no specific quantitative metrics, error bars, baseline implementation details, or experimental controls are reported. This weakens assessment of whether post-hoc choices affect the speedup and accuracy claims.
  Authors: The experimental section of the manuscript already contains tables with mean errors, standard deviations across multiple random seeds, and comparisons against the listed baselines, together with implementation details for the PDE solvers and ERA5 preprocessing. To improve accessibility, we will revise the abstract to cite the specific speedup factors and error reductions (with references to the relevant tables and figures). We will also add a dedicated paragraph in the experimental setup subsection that explicitly lists all post-hoc choices, hyperparameter ranges, and controls used to ensure the reported gains are not artifacts of particular selections.
  Revision: partial
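The control-magnitude study promised in the first response could be computed along the following lines. This is a minimal sketch under assumptions: the function names are hypothetical, controls are taken to be flat lists of floats, and "small relative to the diffusion noise scale" is interpreted here as the ratio of the control's L2 norm to the root-mean-square norm of isotropic Gaussian noise at the same sub-step, which the source does not specify.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def control_to_noise_ratios(controls, noise_scales):
    """Per-sub-step ratio ||u_k||_2 / (sigma_k * sqrt(d)).

    sigma_k * sqrt(d) is the root-mean-square L2 norm of d-dimensional
    Gaussian noise with per-coordinate standard deviation sigma_k.
    """
    if len(controls) != len(noise_scales):
        raise ValueError("need one noise scale per control")
    ratios = []
    for u, sigma in zip(controls, noise_scales):
        ratios.append(l2_norm(u) / (sigma * math.sqrt(len(u))))
    return ratios
```

Ratios well below 1 across all sub-steps would support the claim that the controls are small; ratios approaching 1 near the low-noise end of the schedule would be the regime worth reporting separately.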
Circularity Check
No significant circularity; new controller and evaluations are independent of inputs.
The paper introduces an offline-trained controller that augments a pretrained ARDM using preview rollouts and a terminal-cost objective to learn stepwise corrections for data assimilation. This framework is motivated by stochastic optimal control but implements a distinct amortized policy that remains close to the base dynamics while providing on-the-fly guidance. Performance improvements and speedups are demonstrated via direct evaluations on two PDEs and ERA5 across six observation regimes, without any reduction of claims to quantities fitted from the same evaluation data or to self-citations that bear the central load. The derivation chain is self-contained, with the controller training and inference rollout serving as externally verifiable additions rather than tautological restatements of pretrained components.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: ARDM trajectories admit a stochastic optimal control interpretation that justifies injecting small controls at denoising sub-steps.
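For readers unfamiliar with the stochastic optimal control view this axiom refers to, the standard continuous-time template is shown below. It is a textbook formulation with illustrative symbols (b, σ, u, Φ), not notation taken from the paper, and whether the paper's objective includes the quadratic running cost is not confirmed by the source.

```latex
% Generic stochastic optimal control template; b, \sigma, u and \Phi are
% illustrative symbols, not the paper's notation.
\begin{aligned}
\text{reference (pretrained):}\quad & dX_t = b(X_t, t)\,dt + \sigma(t)\,dW_t, \\
\text{controlled:}\quad & dX^u_t = \big[b(X^u_t, t) + \sigma(t)\,u(X^u_t, t)\big]\,dt + \sigma(t)\,dW_t, \\
\text{objective:}\quad & \min_u \; \mathbb{E}\Big[\Phi(X^u_T)
  + \tfrac{1}{2}\int_0^T \big\|u(X^u_t, t)\big\|_2^2\,dt\Big].
\end{aligned}
```

In this template, Girsanov's theorem identifies the expected quadratic term with the KL divergence between the controlled and reference path measures, which is the usual sense in which small controls keep trajectories close to the reference dynamics.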
invented entities (1)
- amortized controller policy (no independent evidence)
C Active Observation Selector. We maintain a preview buffer containing all observations from T that lie within a fixed lookahead horizon Λ from index t0. Each entry in the buffer is represented as a triplet (y_j, M_j, Δ_j), where (i) j ∈ T is the physical time index of the observation; (ii) y_j is the observed signal, lifted to full resolution when necessary; (i...
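The preview buffer described in this excerpt can be sketched as a small data structure. Because the excerpt is cut off before M_j and Δ_j are defined, their meanings here are assumptions: `mask` is taken to mark observed grid points and `offset` to be j minus t0. The names `PreviewEntry` and `build_preview_buffer` are hypothetical, as is the choice of a closed window from t0 to t0 + Λ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PreviewEntry:
    time_index: int           # j, physical time index of the observation
    signal: Sequence[float]   # y_j, observed signal at full resolution
    mask: Sequence[bool]      # M_j, ASSUMED to mark which grid points are observed
    offset: int               # Delta_j, ASSUMED to be j - t0


def build_preview_buffer(observations, masks, t0, horizon):
    """Collect observations with t0 <= j <= t0 + horizon, ordered by time.

    observations and masks are dicts keyed by the time index j.
    """
    buffer = []
    for j in sorted(observations):
        if t0 <= j <= t0 + horizon:
            buffer.append(PreviewEntry(j, observations[j], masks[j], j - t0))
    return buffer
```

With observations at indices 1, 3, and 7, `t0 = 2`, and a horizon of 5, the buffer holds the entries for indices 3 and 7 with offsets 1 and 5; the entry at index 1 lies before the window and is skipped.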