Control-Augmented Autoregressive Diffusion for Data Assimilation

Farrin Marouf Sofian; Francesco Immorlano; Kushagra Pandey; Prakhar Srivastava; Stephan Mandt

arXiv: 2510.06637 · v3 · submitted 2025-10-08 · cs.LG · cs.AI · cs.CV

Pith reviewed 2026-05-18 08:58 UTC · model grok-4.3

classification: cs.LG · cs.AI · cs.CV

keywords: data assimilation · autoregressive diffusion models · stochastic optimal control · spatiotemporal PDEs · ERA5 reanalysis · chaotic systems · generative model guidance

The pith

Adding a learned controller to pretrained autoregressive diffusion models turns data assimilation into a fast feed-forward rollout for chaotic PDEs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an amortized controller trained offline to augment a pretrained autoregressive diffusion model. By previewing future rollouts, the controller learns small stepwise corrections that steer generation toward matching observations under a terminal-cost objective. This keeps the process close to the original model dynamics while enabling guided sampling. At test time the approach replaces expensive optimization loops with a single rollout that includes on-the-fly corrections, delivering large speed gains and better stability for spatiotemporal data assimilation.

Core claim

Motivated by a stochastic optimal control view of ARDM trajectories, we augment a pretrained autoregressive diffusion model with an offline-trained controller. The controller learns to inject small controls within each denoising sub-step by optimizing a terminal-cost objective over previewed rollouts, yielding a reusable policy for guided generation that stays close to the pretrained dynamics. For data assimilation in chaotic PDEs this reduces the task to a feed-forward rollout with corrections at inference time.
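The inference-time picture described here, a single rollout with small additive corrections inside each denoising sub-step, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: `denoise_substep` stands in for the frozen ARDM with made-up scalar dynamics, `controller` is a hand-written proportional nudge rather than a learned policy, and it previews only the observation at the target time instead of a longer window. It is not the paper's implementation.

```python
import random

def denoise_substep(z, x_prev, s):
    """Toy stand-in for one frozen ARDM sub-step z^(s+1) -> z^(s)."""
    forecast = 0.9 * x_prev        # hypothetical base dynamics, not a real PDE
    weight = 1.0 / (s + 1)         # pull grows as s -> 0; s = 0 lands on the forecast
    return z + weight * (forecast - z)

def controller(z, y_preview, gain=0.2):
    """Hypothetical policy: a small additive nudge toward a previewed observation."""
    if y_preview is None:          # no observation in view: leave the base model alone
        return 0.0
    return gain * (y_preview - z)

def rollout(x0, observations, horizon, num_substeps=4, controlled=True, seed=0):
    """Single feed-forward pass; no inner optimization loop at inference."""
    rng = random.Random(seed)
    x, trajectory = x0, []
    for t in range(horizon):
        z = rng.gauss(0.0, 1.0)                 # each step starts from noise
        y_preview = observations.get(t + 1)     # observation at the target time, if any
        for s in reversed(range(num_substeps)):
            z = denoise_substep(z, x, s)
            if controlled:
                z += controller(z, y_preview)   # control injected inside the sub-step loop
        x = z
        trajectory.append(x)
    return trajectory

def observation_cost(trajectory, observations):
    """Mean squared mismatch at observed times (trajectory[t - 1] is the state at time t)."""
    errors = [(trajectory[t - 1] - y) ** 2
              for t, y in observations.items() if 1 <= t <= len(trajectory)]
    return sum(errors) / len(errors) if errors else 0.0

observations = {t: 1.0 for t in range(2, 11, 2)}    # sparse: every second step
free = rollout(1.0, observations, horizon=10, controlled=False)
steered = rollout(1.0, observations, horizon=10, controlled=True)
print(observation_cost(free, observations), observation_cost(steered, observations))
```

In this toy the steered rollout ends up closer to the observations than the free one, while the base update itself is never modified. A learned controller would replace the fixed `gain` rule with a network trained offline against a cost of this kind.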

What carries the argument

An amortized controller that learns stepwise corrective controls for ARDM denoising steps by previewing rollouts under a terminal-cost objective.

Load-bearing premise

The controller can learn effective stepwise corrections by previewing future rollouts under a terminal-cost objective while injecting only small controls that keep trajectories close to the pretrained ARDM dynamics without introducing instability.

What would settle it

Testing whether the reported order-of-magnitude speedup and accuracy gains persist when the same controller is applied to a previously unseen chaotic PDE or to a different reanalysis dataset with new observation densities.

Figures

Figures reproduced from arXiv: 2510.06637 by Farrin Marouf Sofian, Francesco Immorlano, Kushagra Pandey, Prakhar Srivastava, Stephan Mandt.

**Figure 1.** Overview of CADA. A pretrained autoregressive diffusion model (ARDM) generates forecasts by conditioning on previous states $x_t$ in its denoising steps. A separately trained lightweight controller injects additive controls $u^{(s)}_{t+1}$ into the denoising sub-steps $z^{(s+1)}_{t+1} \to z^{(s)}_{t+1}$, using past states and previewed future observations $y_\tau$ within the forecast window. These controls guide the frozen ARDM so …

**Figure 2.** Our method improves rollout stability and reconstruction consistency over long horizons. Qualitative results on 2D Kolmogorov flow (horizon 180) show snapshots at representative timesteps, including regions with and without observations. While most baselines drift around t ≈ 35, our approach (CADA) preserves sharp reconstructions. Joint-AR avoids catastrophic divergence but loses fine-scale structures at l…

**Figure 3.** Our method better preserves physics-aware diagnostics under sparse observations. (a) Total variation (TV) error for the 1D Kuramoto–Sivashinsky system over a 640-step rollout; lower TV error indicates more faithful preservation of spatial oscillations. (b) Dissipation rate error for 2D Kolmogorov flow over a 60-step rollout; accurate dissipation reflects correct energy cascade to small scales. Both figures…

**Figure 4.** ERA5 vorticity assimilation under sparse MS-4 observations (500 hPa) (ERA5 temperature in …

**Figure 5.** Stability. Our method yields superior long-horizon rollouts on the 1D Kuramoto–Sivashinsky PDE (horizon 640) under sparse spatiotemporal observations (green dots). Darker colors indicate lower forecast error.

**Figure 6.** ERA5 temperature assimilation under sparse MS-4 observations (500 hPa, North America). Each row shows forecast snapshots at selected timesteps, with the leftmost column displaying the observation pattern (dense or missing). Columns compare the ground truth (GT) against CADA and four strong baselines: Universal Amortized, Plain Amortized, Joint AR, and Joint AAO. The regime is extremely challenging: observat…

**Figure 7.** Kolmogorov flow assimilation under MS-4. CADA preserves coherent vortices and finescale filamentation across long horizons, closely matching ground truth. EnKF shows increasing phase and amplitude errors, 3D-Var and 4D-Var oversmooth or misalign small-scale structures under sparse observations, and the uncontrolled ARDM rapidly drifts. These visual trends mirror quantitative results in Tab. 6, where class…
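For readers unfamiliar with the total variation diagnostic named in the Figure 3(a) caption, a minimal sketch follows. It uses the standard discrete total variation of a 1D field (the sum of absolute differences between neighbouring grid points) and compares it between a prediction and a reference. The function names, the non-periodic boundary, and the choice of an absolute difference as the "error" are assumptions made here; the paper's exact definition is not given in the caption.

```python
def total_variation(u):
    """Discrete total variation of a 1D field: sum of |u[i+1] - u[i]|."""
    return sum(abs(b - a) for a, b in zip(u, u[1:]))

def tv_error(pred, truth):
    """Assumed error measure: absolute gap between the two total variations."""
    return abs(total_variation(pred) - total_variation(truth))

truth = [0.0, 1.0, 0.0, 1.0, 0.0]       # oscillating reference field
smooth = [0.0, 0.5, 0.5, 0.5, 0.0]      # oversmoothed prediction
print(total_variation(truth), total_variation(smooth), tv_error(smooth, truth))
# → 4.0 1.0 3.0
```

An oversmoothed forecast loses oscillations and therefore total variation, which is why a low TV error is read as faithful preservation of spatial structure.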

Original abstract

Despite advances in test-time scaling and diffusion finetuning, guidance for Auto-Regressive Diffusion Models (ARDMs) remains underexplored. We introduce an amortized framework that augments a pretrained ARDM with an offline-trained controller. By previewing future rollouts, the controller learns stepwise corrections that anticipate observations under a terminal-cost objective, yielding a reusable policy for guided generation. Motivated by a stochastic optimal control view of ARDM trajectories, our method injects small controls within each denoising sub-step while staying close to the pretrained dynamics. We study this approach for data assimilation (DA) in chaotic spatiotemporal partial differential equations (PDEs), where existing methods are often computationally expensive and susceptible to forecast drift under sparse observations. At inference, DA becomes a feed-forward rollout with on-the-fly corrections, achieving an order-of-magnitude speedup over strong diffusion-based baselines. Across two canonical PDEs and a compact ECMWF Reanalysis v5 (ERA5) pilot spanning six observation regimes, our method consistently improves stability and accuracy over state-of-the-art alternatives, with similar improvements observed in a larger-scale GenCast study.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper adds an amortized offline controller to pretrained ARDMs for faster data assimilation in chaotic PDEs, with claimed speedups that rest on untested stability assumptions under sparse observations.

The main thing to know is that this work trains a separate controller offline to steer trajectories from a fixed ARDM during data assimilation. At inference the whole process becomes a single feed-forward rollout with small on-the-fly corrections instead of repeated optimization or heavy test-time guidance. That combination looks like a distinct move for ARDMs in this setting rather than a routine extension of existing diffusion control ideas. They motivate it from a stochastic optimal control framing and keep the injected controls small so the base dynamics stay largely intact.

The experiments report consistent gains in stability and accuracy on two standard PDEs plus a compact ERA5 pilot across six observation regimes, plus a larger GenCast check, and they highlight an order-of-magnitude inference speedup over strong diffusion baselines. That practical angle on reusable guidance without retraining the diffusion model is the clearest positive.

The soft spot is the assumption that those small controls will reliably prevent exponential drift in chaotic spatiotemporal systems when observations are sparse. The abstract gives no numbers on actual control magnitudes, no ablations at extreme sparsity, and no explicit checks against error amplification, so it is not yet clear how far the robustness claim travels. If the full paper has control-size histograms or failure-case analysis, that would tighten things up; otherwise the central performance claims rest on thinner evidence than the speedups suggest.

This is aimed at people working on generative models for forecasting or assimilation in fluids and climate. Readers who already use ARDMs or similar autoregressive diffusion setups and need faster inference would get the most out of it. The idea is grounded enough and the problem is real enough that it deserves a serious referee rather than a desk reject, even if the experiments will probably need more detail and controls in revision.

Referee Report

2 major / 2 minor

Summary. The paper introduces a control-augmented autoregressive diffusion approach (labelled CADA in the figures) for data assimilation. A pretrained ARDM is augmented with an offline-trained amortized controller that previews future rollouts under a terminal-cost objective to learn small stepwise corrections at each denoising sub-step. Motivated by a stochastic optimal control perspective, the method converts DA into a feed-forward rollout with on-the-fly corrections. Experiments on two canonical PDEs and a compact ERA5 pilot across six observation regimes report order-of-magnitude speedups and consistent gains in stability and accuracy over diffusion-based baselines, with similar trends noted in a GenCast study.

Significance. If the central claims hold, the work offers a promising direction for efficient, amortized guidance in diffusion-based forecasting and assimilation tasks. The offline controller training and small-control injection within pretrained dynamics could reduce the computational burden of test-time guidance while addressing forecast drift in chaotic systems. Reproducible aspects such as the feed-forward inference procedure and the explicit terminal-cost formulation are strengths that facilitate follow-up work.

major comments (2)

[Method (controller design and terminal-cost objective)] The central stability claim rests on the assumption that the learned controller produces only small controls that keep trajectories close to the pretrained ARDM without amplifying drift in chaotic regimes (abstract and method description). No explicit bound on control magnitude, Lyapunov-style analysis, or ablation on observation sparsity thresholds is provided; this is load-bearing for the reported stability and accuracy gains under sparse observations.
[Experimental evaluation (PDE and ERA5 results)] The abstract states consistent improvements and order-of-magnitude speedups across PDEs and six ERA5 observation regimes, yet no specific quantitative metrics, error bars, baseline implementation details, or experimental controls are reported. This weakens assessment of whether post-hoc choices affect the speedup and accuracy claims.

minor comments (2)

[Abstract and conclusion] The GenCast study is mentioned only in passing; a short quantitative summary or pointer to supplementary results would strengthen the broader applicability claim.
[Preliminaries and method] Notation for the amortized policy and the precise form of the terminal cost could be clarified with a short equation or pseudocode block to aid reproducibility.
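A schematic of the kind of statement this last comment asks for might look as follows. It is an illustration under assumptions, not the paper's formulation: the symbols $x_t$, $y_\tau$, $z^{(s)}_{t+1}$ and $u^{(s)}_{t+1}$ follow the Figure 1 caption, while the frozen sub-step update $f_\theta$ (with any sampling noise absorbed into it), the policy $\pi_\phi$, the observation operator $H_\tau$, the preview window $\mathcal{W}_t$, and the penalty weight $\lambda$ are hypothetical names introduced here. The quadratic control penalty is the generic stochastic-control device for keeping corrections small; the abstract does not say whether the paper uses this exact form.

```latex
% Controlled denoising sub-step: frozen ARDM update plus an additive control
z^{(s)}_{t+1} = f_\theta\big(z^{(s+1)}_{t+1},\, x_t,\, s\big) + u^{(s)}_{t+1},
\qquad
u^{(s)}_{t+1} = \pi_\phi\big(z^{(s+1)}_{t+1},\, x_t,\, \{y_\tau\}_{\tau \in \mathcal{W}_t},\, s\big)

% Offline objective over previewed rollouts: observation mismatch plus a small-control penalty,
% with t' ranging over the rollout steps inside the preview window
\min_{\phi}\; \mathbb{E}\Big[\, \sum_{\tau \in \mathcal{W}_t} \big\| H_\tau x_\tau - y_\tau \big\|_2^2
\;+\; \lambda \sum_{t',\, s} \big\| u^{(s)}_{t'+1} \big\|_2^2 \,\Big]
```

Only $\phi$ is optimized in this sketch; $\theta$ stays fixed, which matches the description of the ARDM as pretrained and frozen.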

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Referee: [Method (controller design and terminal-cost objective)] The central stability claim rests on the assumption that the learned controller produces only small controls that keep trajectories close to the pretrained ARDM without amplifying drift in chaotic regimes (abstract and method description). No explicit bound on control magnitude, Lyapunov-style analysis, or ablation on observation sparsity thresholds is provided; this is load-bearing for the reported stability and accuracy gains under sparse observations.

Authors: We agree that an explicit analysis of control magnitudes and robustness to sparsity would strengthen the stability claims. In the revised manuscript we will add a quantitative study of the L2 norms of the injected controls across all reported experiments, confirming that they remain small relative to the diffusion noise scale. We will also include an ablation varying observation density to identify performance thresholds. While a full Lyapunov analysis is challenging for the underlying stochastic diffusion process, we will expand the discussion of the terminal-cost objective to clarify how it encourages proximity to the pretrained ARDM dynamics. These additions directly address the load-bearing nature of the stability argument. revision: yes
Referee: [Experimental evaluation (PDE and ERA5 results)] The abstract states consistent improvements and order-of-magnitude speedups across PDEs and six ERA5 observation regimes, yet no specific quantitative metrics, error bars, baseline implementation details, or experimental controls are reported. This weakens assessment of whether post-hoc choices affect the speedup and accuracy claims.

Authors: The experimental section of the manuscript already contains tables with mean errors, standard deviations across multiple random seeds, and comparisons against the listed baselines, together with implementation details for the PDE solvers and ERA5 preprocessing. To improve accessibility, we will revise the abstract to cite the specific speedup factors and error reductions (with references to the relevant tables and figures). We will also add a dedicated paragraph in the experimental setup subsection that explicitly lists all post-hoc choices, hyperparameter ranges, and controls used to ensure the reported gains are not artifacts of particular selections. revision: partial
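The control-magnitude study promised in the first response could be computed along the following lines. This is a hedged sketch, not the authors' procedure: `relative_control_magnitudes` is a hypothetical helper, and normalizing by $\sigma\sqrt{d}$ (the root-mean-square norm of $d$-dimensional Gaussian noise with standard deviation $\sigma$) is one reasonable reading of "small relative to the diffusion noise scale".

```python
import math

def l2_norm(v):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in v))

def relative_control_magnitudes(controls, sigmas):
    """Ratio ||u|| / (sigma * sqrt(d)) for each sub-step.

    controls: list of control vectors, one per denoising sub-step (hypothetical layout)
    sigmas:   noise standard deviation at the matching sub-step
    """
    return [l2_norm(u) / (sigma * math.sqrt(len(u)))
            for u, sigma in zip(controls, sigmas)]

controls = [[0.3, 0.4], [0.03, 0.04]]   # made-up controls for two sub-steps
sigmas = [1.0, 0.5]                      # made-up noise scales
ratios = relative_control_magnitudes(controls, sigmas)
print(ratios)
```

Ratios well below 1 would support the claim that the controls are a small perturbation of the base sampler; a histogram of these values across rollouts and sparsity levels is the kind of evidence the referee requests.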

Circularity Check

0 steps flagged

No significant circularity; new controller and evaluations are independent of inputs.

The paper introduces an offline-trained controller that augments a pretrained ARDM using preview rollouts and a terminal-cost objective to learn stepwise corrections for data assimilation. This framework is motivated by stochastic optimal control but implements a distinct amortized policy that remains close to the base dynamics while providing on-the-fly guidance. Performance improvements and speedups are demonstrated via direct evaluations on two PDEs and ERA5 across six observation regimes, without any reduction of claims to quantities fitted from the same evaluation data or to self-citations that bear the central load. The derivation chain is self-contained, with the controller training and inference rollout serving as externally verifiable additions rather than tautological restatements of pretrained components.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on a pretrained ARDM providing suitable base dynamics and the new controller being trainable to produce useful corrections; no explicit free parameters or invented physical entities are named in the abstract.

axioms (1)

domain assumption: ARDM trajectories admit a stochastic optimal control interpretation that justifies injecting small controls at denoising sub-steps.
This view is invoked to motivate the controller design and the terminal-cost objective.

invented entities (1)

amortized controller policy (no independent evidence)
purpose: Learns reusable stepwise corrections by previewing rollouts to anticipate observations
New component added to the pretrained ARDM; no independent evidence of its existence outside the training procedure is provided.

