arxiv: 2604.27443 · v2 · submitted 2026-04-30 · 💻 cs.LG · cs.AI

Recognition: unknown

ABC: Any-Subset Autoregression via Non-Markovian Diffusion Bridges in Continuous Time and Space

Gabe Guo , Thanawat Sornwanee , Lutong Hao , Elon Litman , Stefano Ermon , Jose Blanchet

Authors on Pith no claims yet

Pith reviewed 2026-05-07 09:24 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords continuous-time stochastic processesdiffusion bridgesautoregressive generationscore matchingvideo generationweather forecastingnon-Markovian models

0 comments

The pith

A single continual SDE with non-Markovian diffusion bridges generates continuous-time processes conditioned on any arbitrary subset of states.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes ABC as a way to generate continuous-time continuous-space stochastic processes such as videos or weather sequences while conditioning on any chosen subset of past or future observations. Standard diffusion models start generation from random noise and apply time-insensitive noise, which fails to respect similarity between nearby states and restricts conditioning flexibility. ABC instead evolves one SDE whose time and state variables follow physical time, so generation begins from the previous close state. Dynamics are obtained by applying changes of measure on the full path space, which directly supports path-dependent conditioning on irregular or arbitrary subsets. The resulting model is learned by extending denoising score matching to depend on both path and time, and experiments indicate better results than prior methods on video and weather tasks.

Core claim

We propose ABC: Any-Subset Autoregressive Models via Non-Markovian Diffusion Bridges in Continuous Time and Space. We model the process with one continual SDE whose time variable and intermediate states track the real time and process states. This has provable advantages: the starting point for generating future states is the already-close previous state rather than uninformative noise; random noise injection scales with physical time elapsed, encouraging physically plausible dynamics with similar time-adjacent states; and path-dependent conditioning on arbitrary subsets of the state history and/or future is possible. We derive SDE dynamics via changes-of-measure on path space and learn them

What carries the argument

Non-Markovian diffusion bridges obtained via changes-of-measure on path space, which define the drift and diffusion of a single continuous SDE that tracks physical time and states to support arbitrary conditioning.

If this is right

Generation of future states begins from the already-close previous state instead of uninformative noise.
Random noise injection automatically scales with physical time elapsed between states.
Conditioning is supported on arbitrary subsets of history or future observations through path-dependent terms.
The approach yields superior empirical performance on video generation and weather forecasting relative to prior diffusion methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The path-space construction could be adapted to enforce additional physical constraints in scientific simulation domains.
Irregularly sampled sensor data from robotics or environmental monitoring would provide a direct test of the arbitrary-subset conditioning.
Long-horizon forecasting stability may improve because each step reuses the actual preceding state rather than restarting from noise.

Load-bearing premise

The SDE dynamics derived via changes-of-measure on path space can be learned effectively by the proposed path- and time-dependent extension of denoising score matching.

What would settle it

Generate sequences from the learned SDE on a simple process such as Brownian motion with known closed-form conditional distributions and verify whether samples conditioned on arbitrary subsets match the true conditional statistics; mismatch on time-adjacent similarity or future conditioning would refute the claimed advantages.

Figures

Figures reproduced from arXiv: 2604.27443 by Elon Litman, Gabe Guo, Jose Blanchet, Lutong Hao, Stefano Ermon, Thanawat Sornwanee.

**Figure 1.** Figure 1: Overview. many issues, e.g., poor structural preservation, stiff SDE integration [71]; also, the jump from noise to data is a physically implausible transition. Section 6 shows this gives incoherent generations. Lack of Time-Scaled Volatility: As a partial fix, diffusion bridges and stochastic interpolants [71, 65, 38, 34, 37, 46, 47, 9] define generative SDEs whose endpoints are the states we transition b… view at source ↗

**Figure 2.** Figure 2: Insufficiency of Autoregressive Conditional Diffusion Bridges: We empirically verify Theorem 3 on Brownian motion pinned at B(4/5) = −1, B(1) = B(0) = +1. Chained diffusion bridges Y (t), while capturing the finite-dimensional marginals, have the wrong quadratic variation. Modeling the process as one continual SDE correctly captures the dynamics. See Section K.1. settings of the conditional diffusion bridg… view at source ↗

**Figure 3.** Figure 3: Comparison of Methods: ABC outperforms conditional diffusion bridges (with equal volatility coefficient) and noise-to-data diffusion models. Generated with 250 steps, and conditioning every eight (plus final) frames, although the model can overwrite prompts (to assess adherence). Zoom in, or visit https://abc-diffusion.github.io/. architectures (e.g., SSMs [20]) that selectively compress the history. Futur… view at source ↗

**Figure 4.** Figure 4: We can consider the case where the data process is not Markov. This can happen for several view at source ↗

**Figure 5.** Figure 5: Pareto Front for Sky-Timelapse Causal Generation: ABC is Pareto-optimal across FVD and FID scores for causally-ordered generation on Sky-Timelapse. Corresponds to results in Tables 16, 15. Horizontal axis is the ordering of average rank in view at source ↗

**Figure 6.** Figure 6: SEVIR qualitative comparison. ABC Non-Causal, Conditional Diffusion Bridge, and Noise-to-Data Diffusion are evaluated under the pinned non-causal protocol, while ABC Causal is shown separately under the stricter past-only protocol. Individual examples may favor either protocol visually, but the aggregate metrics in Tables 3 and 20 favor ABC Non-Causal among the pinned methods. 39 view at source ↗

read the original abstract

Generating continuous-time, continuous-space stochastic processes (e.g., videos, weather forecasts) conditioned on partial observations (e.g., first and last frames) is a fundamental challenge. Existing approaches, (e.g., diffusion models), suffer from key limitations: (1) noise-to-data evolution fails to capture structural similarity between states close in physical time and has unstable integration in low-step regimes; (2) random noise injected is insensitive to the physical process's time elapsed, resulting in incorrect dynamics; (3) they overlook conditioning on arbitrary subsets of states (e.g., irregularly sampled timesteps, future observations). We propose ABC: Any-Subset Autoregressive Models via Non-Markovian Diffusion Bridges in Continuous Time and Space. Crucially, we model the process with one continual SDE whose time variable and intermediate states track the real time and process states. This has provable advantages: (1) the starting point for generating future states is the already-close previous state, rather than uninformative noise; (2) random noise injection scales with physical time elapsed, encouraging physically plausible dynamics with similar time-adjacent states. We derive SDE dynamics via changes-of-measure on path space, yielding another advantage: (3) path-dependent conditioning on arbitrary subsets of the state history and/or future. To learn these dynamics, we derive a path- and time-dependent extension of denoising score matching. Our experiments show ABC's superiority to competing methods on multiple domains, including video generation and weather forecasting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The non-Markovian bridge via path-space change of measure is the real move here, but whether the path-and-time-dependent score matching actually recovers the derived SDE is the part that still needs checking.

read the letter

ABC's main contribution is a single continual SDE that tracks real time and states, then uses a Girsanov-style change of measure on path space to get non-Markovian bridges. This directly gives the three listed advantages: generation starts from the previous state instead of noise, noise scales with elapsed physical time, and conditioning works on arbitrary subsets of history or future observations. The framing of the problem and the contrast with standard diffusion models is clear and useful. Extending denoising score matching to be path- and time-dependent follows logically from the construction and is a reasonable training target. The abstract also reports better results than baselines on video and weather tasks, which would follow if the advantages are realized in practice. The soft spot is the training step. Standard score matching targets marginal scores of Markov diffusions; here the target is a conditional score under the changed path measure. If the loss misses the Radon-Nikodym factors or extra drift corrections that arise from the non-Markovian bridge, the learned vector field will not match the derived SDE and the claimed benefits will not appear. The abstract does not show the explicit loss or any diagnostic that the trained model respects the theoretical dynamics, so it is hard to know how much of the reported gains come from the new construction versus implementation details. This paper is for people working on generative models for continuous spatiotemporal processes who care about irregular conditioning and time-aware dynamics. A reader who wants a principled alternative to noise-to-data diffusion will find the setup worth examining. It deserves peer review because the idea is distinct from existing work, the derivation direction is sound, and the empirical claims are concrete enough to test even if the current write-up leaves the learning details open.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes ABC: Any-Subset Autoregressive Models via Non-Markovian Diffusion Bridges in Continuous Time and Space. It models continuous-time continuous-space processes with a single continual SDE whose time and states track physical time, deriving the dynamics via Girsanov-style changes of measure on path space to enable conditioning on arbitrary subsets of history or future observations. This yields three claimed advantages: generation starts from the previous state rather than noise, noise scales with elapsed physical time, and path-dependent conditioning is supported. Training uses a derived path- and time-dependent extension of denoising score matching, with experiments claiming superiority on video generation and weather forecasting tasks.

Significance. If the path-space derivation is correct and the extended DSM loss recovers the proper conditional scores under the changed measure, the framework offers a principled route to temporally coherent, flexibly conditioned generation that addresses limitations of standard diffusion models. The change-of-measure construction is a clear strength, directly supporting the three listed advantages without ad-hoc parameter fitting. However, the significance is tempered by the need to confirm that the learning procedure implements the full Radon-Nikodym and drift corrections; absent that, the empirical gains cannot be attributed to the theoretical construction.

major comments (2)

[§4] §4 (Learning the Dynamics), Eq. (12) or equivalent: the path- and time-dependent extension of denoising score matching is presented as the training objective, yet the derivation does not explicitly include the drift correction or the Radon-Nikodym derivative term that arises from the Girsanov change of measure on path space for non-Markovian bridges. Without these terms, the optimized vector field will not correspond to the derived SDE, undermining advantages (1)–(3) and the reported experimental superiority. Please supply the full loss expression and a proof sketch showing it targets the correct conditional score for arbitrary (including future) conditioning sets.
[§3] §3 (SDE Derivation): while the change-of-measure argument on path space is conceptually sound for obtaining the non-Markovian bridge, the manuscript does not verify that the resulting SDE remains well-defined and simulable when the conditioning set includes future observations. A concrete check (e.g., existence of the Radon-Nikodym derivative under the stated regularity assumptions) is required before the “provable advantages” can be asserted.

minor comments (2)

[Figure 2] Figure 2 and associated caption: the visualization of the time-dependent noise schedule is helpful but the axis labels do not indicate whether the plotted variance corresponds to physical time or diffusion time; clarify this distinction.
[Abstract] The abstract states “provable advantages” but the proofs appear only in the main text; consider adding a short theorem statement or corollary in the abstract to make the claims self-contained for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and constructive review of our manuscript. The comments raise important points about the explicitness of the theoretical derivations, which we will address by expanding the relevant sections in the revised version. Below we respond point by point to the major comments.

read point-by-point responses

Referee: [§4] §4 (Learning the Dynamics), Eq. (12) or equivalent: the path- and time-dependent extension of denoising score matching is presented as the training objective, yet the derivation does not explicitly include the drift correction or the Radon-Nikodym derivative term that arises from the Girsanov change of measure on path space for non-Markovian bridges. Without these terms, the optimized vector field will not correspond to the derived SDE, undermining advantages (1)–(3) and the reported experimental superiority. Please supply the full loss expression and a proof sketch showing it targets the correct conditional score for arbitrary (including future) conditioning sets.

Authors: We agree that the current presentation of the loss in Section 4 would benefit from greater explicitness concerning the Girsanov-derived terms. In the revised manuscript we will supply the complete path- and time-dependent denoising score matching objective that incorporates both the drift correction and the Radon-Nikodym derivative. We will also add a concise proof sketch establishing that the resulting objective recovers the correct conditional score for the non-Markovian bridge under arbitrary conditioning sets, including those that involve future observations. This addition will make the link between the change-of-measure construction and the training procedure fully transparent. revision: yes
Referee: [§3] §3 (SDE Derivation): while the change-of-measure argument on path space is conceptually sound for obtaining the non-Markovian bridge, the manuscript does not verify that the resulting SDE remains well-defined and simulable when the conditioning set includes future observations. A concrete check (e.g., existence of the Radon-Nikodym derivative under the stated regularity assumptions) is required before the “provable advantages” can be asserted.

Authors: We acknowledge that an explicit verification of well-definedness for conditioning sets containing future observations would strengthen the exposition. Under the regularity assumptions already stated in the manuscript (Lipschitz continuity and linear growth of the drift and diffusion coefficients), standard Girsanov theorems on path space guarantee the existence of the Radon-Nikodym derivative. In the revision we will insert a short dedicated paragraph (or appendix subsection) that recalls the relevant theorem, confirms the derivative exists for future conditioning, and briefly discusses simulability of the resulting SDE. This will substantiate that the claimed advantages hold in the general case. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on independent path-space measure change and standard score-matching extension

full rationale

The paper derives the target SDE via Girsanov-style change of measure on path space, a standard result independent of the final model or experiments. It then presents a path- and time-dependent extension of denoising score matching as a derived learning objective for that SDE. No equation reduces to a fitted parameter renamed as prediction, no self-citation supplies a uniqueness theorem or ansatz that the current work merely renames, and no self-definitional loop appears where the claimed advantages are presupposed in the inputs. The three listed advantages follow directly from the conditioned SDE construction rather than from data fitting or prior self-referential results. Experiments are presented as validation, not as definitional inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review based on abstract only; full details unavailable. The approach builds on standard SDE and diffusion frameworks but introduces new modeling elements without specifying additional free parameters or ad-hoc assumptions.

axioms (1)

standard math Standard theory of stochastic differential equations and changes of measure on path space
Invoked to derive the SDE dynamics for the non-Markovian diffusion bridges.

invented entities (1)

Non-Markovian Diffusion Bridge no independent evidence
purpose: To model the process with one continual SDE that tracks real time and process states for any-subset conditioning
Core new concept introduced to address limitations of standard diffusion models.

pith-pipeline@v0.9.0 · 5592 in / 1484 out tokens · 87145 ms · 2026-05-07T09:24:08.738638+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

80 extracted references · 27 canonical work pages · 8 internal anchors

[1]

Stochastic Interpolants: A Unifying Framework for Flows and Diffusions

Michael S Albergo, Nicholas M Boffi, and Eric Vanden-Eijnden. “Stochastic interpolants: A unifying framework for flows and diffusions”. In:arXiv preprint arXiv:2303.08797(2023)

work page internal anchor Pith review arXiv 2023
[2]

Robust time series generation via Schr\

Alexandre Alouadi et al. “Robust time series generation via Schr\" odinger Bridge: a compre- hensive evaluation”. In:arXiv preprint arXiv:2503.02943(2025)

work page arXiv 2025
[3]

Reverse-time diffusion equation models

Brian DO Anderson. “Reverse-time diffusion equation models”. In:Stochastic Processes and their Applications12.3 (1982), pp. 313–326

1982
[4]

Christopher M Bishop and Nasser M Nasrabadi.Pattern recognition and machine learning. V ol. 4. 4. Springer, 2006

2006
[5]

Theodore Allen Burton.Volterra integral and differential equations. V ol. 202. Elsevier, 2005

2005
[6]

Mixture of contexts for long video generation

Shengqu Cai et al. “Mixture of contexts for long video generation”. In:arXiv preprint arXiv:2508.21058(2025)

work page arXiv 2025
[7]

Diffusion forcing: Next-token prediction meets full-sequence diffusion

Boyuan Chen et al. “Diffusion forcing: Next-token prediction meets full-sequence diffusion”. In:Advances in Neural Information Processing Systems37 (2024), pp. 24081–24125

2024
[8]

Deep momentum multi-marginal Schrödinger bridge

Tianrong Chen et al. “Deep momentum multi-marginal Schrödinger bridge”. In:Advances in Neural Information Processing Systems36 (2023), pp. 57058–57086

2023
[9]

Probabilistic forecasting with stochastic interpolants and f\

Yifan Chen et al. “Probabilistic forecasting with stochastic interpolants and f\" ollmer pro- cesses”. In:arXiv preprint arXiv:2403.13724(2024)

work page arXiv 2024
[10]

Provably convergent schrödinger bridge with applications to probabilistic time series imputation

Yu Chen et al. “Provably convergent schrödinger bridge with applications to probabilistic time series imputation”. In:International Conference on Machine Learning. PMLR. 2023, pp. 4485–4513

2023
[11]

Variational inference for SDEs driven by fractional noise

Rembert Daems et al. “Variational inference for SDEs driven by fractional noise”. In:arXiv preprint arXiv:2310.12975(2023)

work page arXiv 2023
[12]

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Tri Dao. “Flashattention-2: Faster attention with better parallelism and work partitioning”. In: arXiv preprint arXiv:2307.08691(2023)

work page internal anchor Pith review arXiv 2023
[13]

Flashattention: Fast and memory-efficient exact attention with io-awareness

Tri Dao et al. “Flashattention: Fast and memory-efficient exact attention with io-awareness”. In:Advances in neural information processing systems35 (2022), pp. 16344–16359

2022
[14]

Quelques applications de la formule de changement de variables pour les semimartingales

Catherine Doléans-Dade. “Quelques applications de la formule de changement de variables pour les semimartingales”. In:Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebi- ete16.3 (1970), pp. 181–194

1970
[15]

Joseph L Doob and JI Doob.Classical potential theory and its probabilistic counterpart. V ol. 262. Springer, 1984

1984
[16]

Soft-constrained Schrödinger Bridge: a Stochastic Control Approach

Jhanvi Garg, Xianyang Zhang, and Quan Zhou. “Soft-constrained Schrödinger Bridge: a Stochastic Control Approach”. In:International Conference on Artificial Intelligence and Statistics. PMLR. 2024, pp. 4429–4437

2024
[17]

On the content bias in fréchet video distance

Songwei Ge et al. “On the content bias in fréchet video distance”. In:Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2024, pp. 7277–7288. 10

2024
[18]

On transforming a certain class of stochastic processes by absolutely continuous substitution of measures

Igor Vladimirovich Girsanov. “On transforming a certain class of stochastic processes by absolutely continuous substitution of measures”. In:Theory of Probability & Its Applications 5.3 (1960), pp. 285–301

1960
[19]

Understanding the difficulty of training deep feedforward neural networks

Xavier Glorot and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks”. In:Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings. 2010, pp. 249–256

2010
[20]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu and Tri Dao. “Mamba: Linear-time sequence modeling with selective state spaces”. In:arXiv preprint arXiv:2312.00752(2023)

work page internal anchor Pith review arXiv 2023
[21]

Self-Speculative Decoding Accelerates Lossless Inference in Any-Order and Any-Subset Autoregressive Models

Gabe Guo and Stefano Ermon. “Self-Speculative Decoding Accelerates Lossless Inference in Any-Order and Any-Subset Autoregressive Models”. In:The Fourteenth International Conference on Learning Representations. 2026.URL: https://openreview.net/forum? id=hZnibTOke7

2026
[22]

Generative modeling for time series via Schr{\

Mohamed Hamdouche, Pierre Henry-Labordere, and Huyên Pham. “Generative modeling for time series via Schr{\" o}dinger bridge”. In:arXiv preprint arXiv:2304.05093(2023)

work page arXiv 2023
[23]

Gans trained by a two time-scale update rule converge to a local nash equilibrium

Martin Heusel et al. “Gans trained by a two time-scale update rule converge to a local nash equilibrium”. In:Advances in neural information processing systems30 (2017)

2017
[24]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. “Denoising diffusion probabilistic models”. In: Advances in neural information processing systems33 (2020), pp. 6840–6851

2020
[25]

Cascaded diffusion models for high fidelity image generation

Jonathan Ho et al. “Cascaded diffusion models for high fidelity image generation”. In:Journal of Machine Learning Research23.47 (2022), pp. 1–33

2022
[26]

Video diffusion models

Jonathan Ho et al. “Video diffusion models”. In:Advances in neural information processing systems35 (2022), pp. 8633–8646

2022
[27]

Trajectory inference with smooth schr\

Wanli Hong, Yuliang Shi, and Jonathan Niles-Weed. “Trajectory inference with smooth schr\" odinger bridges”. In:arXiv preprint arXiv:2503.00530(2025)

work page arXiv 2025
[28]

Longitudinal Flow Matching for Trajectory Modeling

Mohammad Mohaiminul Islam et al. “Longitudinal Flow Matching for Trajectory Modeling”. In:arXiv preprint arXiv:2510.03569(2025)

work page arXiv 2025
[29]

109. stochastic integral

Kiyosi Itô. “109. stochastic integral”. In:Proceedings of the Imperial Academy20.8 (1944), pp. 519–524

1944
[30]

Kiyosi Ito et al.On stochastic differential equations. V ol. 4. American Mathematical Society New York, 1951

1951
[31]

springer, 2014

Ioannis Karatzas and Steven Shreve.Brownian motion and stochastic calculus. springer, 2014

2014
[32]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. “Adam: A method for stochastic optimization”. In:arXiv preprint arXiv:1412.6980(2014)

work page internal anchor Pith review arXiv 2014
[33]

Multi-Marginal Flow Matching with Adversarially Learnt Interpolants

Oskar Kviman et al. “Multi-Marginal Flow Matching with Adversarially Learnt Interpolants”. In:arXiv preprint arXiv:2510.01159(2025)

work page arXiv 2025
[34]

Bbdm: Image-to-image translation with brownian bridge diffusion models

Bo Li et al. “Bbdm: Image-to-image translation with brownian bridge diffusion models”. In: Proceedings of the IEEE/CVF conference on computer vision and pattern Recognition. 2023, pp. 1952–1961

2023
[35]

Autoregressive image generation without vector quantization

Tianhong Li et al. “Autoregressive image generation without vector quantization”. In:Advances in Neural Information Processing Systems37 (2024), pp. 56424–56445

2024
[36]

Flow Matching for Generative Modeling

Yaron Lipman et al. “Flow matching for generative modeling”. In:arXiv preprint arXiv:2210.02747(2022)

work page internal anchor Pith review arXiv 2022
[37]

Let us build bridges: Understanding and extending diffusion generative models

Xingchao Liu et al. “Let us build bridges: Understanding and extending diffusion generative models”. In:arXiv preprint arXiv:2208.14699(2022)

work page arXiv 2022
[38]

Frame interpolation with consecutive brownian bridge diffusion

Zonglin Lyu et al. “Frame interpolation with consecutive brownian bridge diffusion”. In: Proceedings of the 32nd ACM International Conference on Multimedia. 2024, pp. 3449–3458

2024
[39]

Latte: Latent Diffusion Transformer for Video Generation

Xin Ma et al. “Latte: Latent diffusion transformer for video generation”. In:arXiv preprint arXiv:2401.03048(2024)

work page internal anchor Pith review arXiv 2024
[40]

Sur une généralisation des intégrales de MJ Radon

Otton Nikodym. “Sur une généralisation des intégrales de MJ Radon”. In:Fundamenta Mathematicae15.1 (1930), pp. 131–179

1930
[41]

Fractional Diffusion Bridge Models

Gabriel Nobis et al. “Fractional Diffusion Bridge Models”. In:arXiv preprint arXiv:2511.01795 (2025)

work page arXiv 2025
[42]

Generative fractional diffusion models

Gabriel Nobis et al. “Generative fractional diffusion models”. In:Advances in Neural Informa- tion Processing Systems37 (2024), pp. 25469–25509. 11

2024
[43]

On an identity for stochastic integrals

Aleksandr Aleksandrovich Novikov. “On an identity for stochastic integrals”. In:Teoriya Veroyatnostei i ee Primeneniya17.4 (1972), pp. 761–765

1972
[44]

Stochastic differential equations

Bernt Øksendal. “Stochastic differential equations”. In:Stochastic differential equations: an introduction with applications. Springer, 2003, pp. 38–50

2003
[45]

Scalable diffusion models with transformers

William Peebles and Saining Xie. “Scalable diffusion models with transformers”. In:Pro- ceedings of the IEEE/CVF international conference on computer vision. 2023, pp. 4195– 4205

2023
[46]

Diffusion bridge mixture transports, Schrödinger bridge problems and generative modeling

Stefano Peluchetti. “Diffusion bridge mixture transports, Schrödinger bridge problems and generative modeling”. In:Journal of Machine Learning Research24.374 (2023), pp. 1–51

2023
[47]

Peluchetti

Stefano Peluchetti. “Non-denoising forward-time diffusions”. In:arXiv preprint arXiv:2312.14589(2023)

work page arXiv 2023
[48]

Peter Potaptchik et al.Meta Flow Maps enable scalable reward alignment. 2026. arXiv: 2601.14430 [stat.ML].URL:https://arxiv.org/abs/2601.14430

work page arXiv 2026
[49]

Hölder, 1913

Johann Radon.Theorie und Anwendungen der absolut additiven Mengenfunktionen. Hölder, 1913

1913
[50]

Fokker-planck equation

Hannes Risken. “Fokker-planck equation”. In:The Fokker-Planck equation: methods of solu- tion and applications. Springer, 1989, pp. 63–95

1989
[51]

High-resolution image synthesis with latent diffusion models

Robin Rombach et al. “High-resolution image synthesis with latent diffusion models”. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022, pp. 10684–10695

2022
[52]

Training and inference on any-order autore- gressive models the right way

Andy Shih, Dorsa Sadigh, and Stefano Ermon. “Training and inference on any-order autore- gressive models the right way”. In:Advances in Neural Information Processing Systems35 (2022), pp. 2762–2775

2022
[53]

Generative modeling by estimating gradients of the data distribution

Yang Song and Stefano Ermon. “Generative modeling by estimating gradients of the data distribution”. In:Advances in neural information processing systems32 (2019)

2019
[54]

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song et al. “Score-based generative modeling through stochastic differential equations”. In:arXiv preprint arXiv:2011.13456(2020)

work page internal anchor Pith review arXiv 2011
[55]

J Michael Steele.Stochastic calculus and financial applications. V ol. 1. Springer, 2001

2001
[56]

Ar-diffusion: Asynchronous video generation with auto-regressive diffusion

Mingzhen Sun et al. “Ar-diffusion: Asynchronous video generation with auto-regressive diffusion”. In:Proceedings of the Computer Vision and Pattern Recognition Conference. 2025, pp. 7364–7373

2025
[57]

Rethinking the inception architecture for computer vision

Christian Szegedy et al. “Rethinking the inception architecture for computer vision”. In:Pro- ceedings of the IEEE conference on computer vision and pattern recognition. 2016, pp. 2818– 2826

2016
[58]

Momentum Multi-Marginal Schr\

Panagiotis Theodoropoulos et al. “Momentum Multi-Marginal Schr\" odinger Bridge Match- ing”. In:arXiv preprint arXiv:2506.10168(2025)

work page arXiv 2025
[59]

arXiv preprint arXiv:2307.03672 , year=

Alexander Tong et al. “Simulation-free schr\" odinger bridges via score and flow matching”. In:arXiv preprint arXiv:2307.03672(2023)

work page arXiv 2023
[60]

Videomae: Masked autoencoders are data-efficient learners for self- supervised video pre-training

Zhan Tong et al. “Videomae: Masked autoencoders are data-efficient learners for self- supervised video pre-training”. In:Advances in neural information processing systems35 (2022), pp. 10078–10093

2022
[61]

Towards Accurate Generative Models of Video: A New Metric & Challenges

Thomas Unterthiner et al. “Towards accurate generative models of video: A new metric & challenges”. In:arXiv preprint arXiv:1812.01717(2018)

work page internal anchor Pith review arXiv 2018
[62]

Sevir: A storm event imagery dataset for deep learning applications in radar and satellite meteorology

Mark Veillette, Siddharth Samsi, and Chris Mattioli. “Sevir: A storm event imagery dataset for deep learning applications in radar and satellite meteorology”. In:Advances in neural information processing systems33 (2020), pp. 22009–22019

2020
[63]

A connection between score matching and denoising autoencoders

Pascal Vincent. “A connection between score matching and denoising autoencoders”. In: Neural computation23.7 (2011), pp. 1661–1674

2011
[64]

Videomae v2: Scaling video masked autoencoders with dual masking

Limin Wang et al. “Videomae v2: Scaling video masked autoencoders with dual masking”. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023, pp. 14549–14560

2023
[65]

Framebridge: Improving image-to-video generation with bridge models

Yuji Wang et al. “Framebridge: Improving image-to-video generation with bridge models”. In: arXiv preprint arXiv:2410.15371(2024)

work page arXiv 2024
[66]

Christopher KI Williams and Carl Edward Rasmussen.Gaussian processes for machine learning. V ol. 2. 3. MIT press Cambridge, MA, 2006. 12

2006
[67]

Root mean square layer normalization

Biao Zhang and Rico Sennrich. “Root mean square layer normalization”. In:Advances in neural information processing systems32 (2019)

2019
[68]

Dtvnet: Dynamic time-lapse video generation via single still image

Jiangning Zhang et al. “Dtvnet: Dynamic time-lapse video generation via single still image”. In:European conference on computer vision. Springer. 2020, pp. 300–315

2020
[69]

Packing input frame context in next-frame prediction models for video generation

Lvmin Zhang et al. “Frame context packing and drift prevention in next-frame-prediction video diffusion models”. In:arXiv preprint arXiv:2504.12626(2025)

work page arXiv 2025
[70]

Pretraining Frame Preservation in Autoregressive Video Memory Com- pression

Lvmin Zhang et al. “Pretraining Frame Preservation in Autoregressive Video Memory Com- pression”. In:arXiv preprint arXiv:2512.23851(2025)

work page arXiv 2025
[71]

Denoising diffusion bridge models.arXiv preprint arXiv:2309.16948, 2023

Linqi Zhou et al. “Denoising diffusion bridge models”. In:arXiv preprint arXiv:2309.16948 (2023)

work page arXiv 2023
[72]

CelebV-HQ: A large-scale video facial attributes dataset

Hao Zhu et al. “CelebV-HQ: A large-scale video facial attributes dataset”. In:European conference on computer vision. Springer. 2022, pp. 650–667. 13 A Data-Generating SDE via Doob h-Transform In this section we prove Theorem 1 and establish the regularity conditions. Let 0 =t 0 < t 1 <· · ·< t L be fixed physical times and x0 ∈R d an initial condition. T...

2022
[73]

This issimilar to PFI’s [9] conditioning strategy, except that they omitted the initial condition

Monte Carlo estimate(single waypoint, uniformly sampled) of the intermediate path history, along with times 0 and ti∗ (that is, initial condition and most recent waypoint observation), making for a total of three conditioning points. This issimilar to PFI’s [9] conditioning strategy, except that they omitted the initial condition
[74]

This issimilar to LFM’s [28] Markovian conditioning strategy, except that they omitted the initial condition

Near-Markovconditioning, where we only use the initial condition from time 0 and the most recent waypoint at time ti∗ as conditioning. This issimilar to LFM’s [28] Markovian conditioning strategy, except that they omitted the initial condition
[75]

See Tables 5, 6, 7, 8

Prefix onlyconditioning, where we only condition on the prompt frames from the prefix that are before timet. See Tables 5, 6, 7, 8. We can draw a few conclusions from this experiment: • Ourpath-dependentconditioning provides the best results on a majority of scenarios: 15 out of 24 tests on CelebV-HQ, and on 18 out of 24 tests on Sky-Timelapse. This confi...

work page arXiv
[76]

Variable-length cross-attention.For cross-attention, we use PyTorch’s varlen_attn, which is based on FlashAttention [13, 12]

to the per-head queries and keys in both self- and cross-attention to stabilize training on long contexts (the orignal DiT did not include this, but we found the exclusion to be very unstable). Variable-length cross-attention.For cross-attention, we use PyTorch’s varlen_attn, which is based on FlashAttention [13, 12]. Brownian-bridge drift.We have an opti...
[77]

Future work can explore more complicated parameterizations

To isolate the effects of RQ2 (impact of time-dependent quadratic variation), RQ3 (impact of data-to-data bridge), we picked relatively simple choices. Future work can explore more complicated parameterizations. We always sett L = 1, ensuring that the SDE is defined on[0,1]. We set the base (non-score driven) drift to be a(t) = 0. The score network should...
[78]

We found that perturbations up to N(0,0.1 2I) minimally affected the reconstruction quality

with various Gaussian noise perturbations. We found that perturbations up to N(0,0.1 2I) minimally affected the reconstruction quality. Assuming simulation step sizes up to ∆t≤0.01 , the upper limit on the volatility introduced from Brownian motion should beσ √ ∆t≤0.1⇒σ≤1.0. To select volatility for the baseline conditional diffusion bridge, we tried a fe...

2048
[79]

Hence dY(t) =    −x0 −Y(t) 4/5−t dt+ q 5 4 dW(t), t∈[0,4/5], x0 −Y(t) 1−t dt+ √ 5dW(t), t∈[4/5,1]. (77) The drifts of X and Y coincide, but the diffusion coefficients of Y are inflated by p dτ /dt on each segment—a factor that is easy to overlook when “reusing” bridges across rescaled time windows. Empirical Results. X and Y have identical finite-dim...

2019
[80]

Ultimately, they resort to matching individual marginalsQL 1 p(xti), rather than the finite-dimensional marginals p(xt1 ,

actually does consider non-Markovian stochastic processes in their initial general formulation, but they unfortunately do not take this idea further. Ultimately, they resort to matching individual marginalsQL 1 p(xti), rather than the finite-dimensional marginals p(xt1 , . . . , xtL). Furthermore, their work only considerstwo-endpointbridges,i.e., they on...