Generative Modeling by Value-Driven Transport

Adrian Müller; Gergely Neu; Pablo Moreno-Muñoz

arxiv: 2605.22507 · v1 · pith:T7J6U6BWnew · submitted 2026-05-21 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-22 07:55 UTC · model grok-4.3

keywords: generative modeling · value-driven transport · measure transport · stochastic control · primal-dual algorithm · straight paths · optimal transport · diffusion models

The pith

Generative modeling can be recast as optimal control for measure transport, yielding straight-path policies from value functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a framework for generative modeling that formulates measure transport as a discrete-time stochastic control problem. Adapting classic results from control theory, the authors pose the problem as a linear program whose dual variables correspond to the optimal value function, which directly encodes the optimal control policy. A simulation-free primal-dual algorithm then computes approximate value functions and the associated value-driven transport (VDT) policies. According to the paper, well-trained VDT policies produce straight transport paths that support fast and robust simulation, and they admit the same enhancements as diffusion and flow models, such as conditional generation and classifier-free guidance. The reported experiments indicate competitive performance and potential for scalability.

Core claim

By adapting results from control theory, the measure transport problem is posed as a linear program whose dual variables correspond to the optimal value function of the control problem, which directly encodes the optimal control policy. An efficient simulation-free primal-dual algorithm computes approximately optimal value functions and the resulting value-driven transport policies that approximate the true optimal policy for generative modeling.

What carries the argument

The primal-dual algorithm approximating the optimal value function of the stochastic control formulation of measure transport, which directly defines the value-driven transport policy.
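
To make the link between value function and policy concrete, here is a minimal sketch built on the recursion and policy formula quoted in the Lean-theorem section of this page: V_h(x) = min_y [ (H+1)/2 ‖x−y‖² + V_{h+1}(y) ] and π_h(x; V) = x − ∇V_h(x)/(H+1). The quadratic terminal value V_{H+1}(y) = (a/2)‖y−m‖² is an assumption chosen only because the recursion then has a closed form; it is not taken from the paper, and the function names are hypothetical. Under that assumption every V_h stays quadratic, and the rollout moves in equal increments along a single line.

```python
# Illustrative sketch only: the quadratic terminal value is an assumption,
# not the paper's setup. All names here are hypothetical.

def quadratic_curvatures(a_terminal, H):
    """Backward recursion for V_h(x) = (a_h / 2) * ||x - m||^2.

    Minimizing (H+1)/2 * ||x - y||^2 + (a_{h+1}/2) * ||y - m||^2 over y
    gives a_h = (H+1) * a_{h+1} / (H+1 + a_{h+1}).
    """
    a = [0.0] * (H + 2)
    a[H + 1] = a_terminal
    for h in range(H, -1, -1):
        a[h] = (H + 1) * a[h + 1] / (H + 1 + a[h + 1])
    return a


def vdt_rollout(x0, m, a, H):
    """Apply pi_h(x; V) = x - grad V_h(x) / (H+1) for h = 0, ..., H."""
    path = [list(x0)]
    for h in range(H + 1):
        x = path[-1]
        grad = [a[h] * (xi - mi) for xi, mi in zip(x, m)]  # gradient of V_h at x
        path.append([xi - gi / (H + 1) for xi, gi in zip(x, grad)])
    return path


H = 4
m = [1.0, -2.0]    # minimizer of the assumed terminal value
x0 = [4.0, 2.0]    # starting point
a = quadratic_curvatures(a_terminal=3.0, H=H)
path = vdt_rollout(x0, m, a, H)

# Per-step displacements; in this example they are all identical,
# so the path is a straight line traversed at constant speed.
steps = [[q_i - p_i for p_i, q_i in zip(p, q)] for p, q in zip(path, path[1:])]
```

In this toy case the straightness is exact because the value functions are computed exactly; with learned, approximate value functions the increments would generally differ, which is the concern raised under "Load-bearing premise" below.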

If this is right

Transport occurs along straight paths, enabling quick and robust simulation of the generative process.
VDT policies can incorporate conditional generation, classifier-free guidance, and unpaired data-to-data translation.
The simulation-free training supports scalability to larger problems.
Performance remains competitive with flows, diffusions, and Schrödinger bridges in experiments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This control formulation may reduce training instability by avoiding the need for simulation steps during optimization.
Straight paths could lower sampling variance compared to the curved trajectories common in diffusion models.
The approach might extend naturally to other measure transport tasks outside generative modeling, such as domain adaptation.

Load-bearing premise

That policies from the approximated value functions are sufficiently close to the true optimal control policy to produce straight paths and practical robustness.

What would settle it

Measuring the average deviation from straight lines in trajectories sampled from a trained VDT policy; large curvature or non-linear paths would indicate the approximation fails to deliver the claimed transport properties.
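
One way such a measurement could be set up is sketched below. The two metrics (the ratio of path length to chord length, and the largest distance from the chord joining the endpoints) are illustrative choices made for this example, not metrics reported in the paper, and the function names are hypothetical. Trajectories are assumed to be sequences of equal-length coordinate tuples.

```python
import math


def path_length(path):
    """Sum of Euclidean distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


def straightness_ratio(path):
    """Path length divided by endpoint distance; 1.0 means perfectly straight."""
    chord = math.dist(path[0], path[-1])
    if chord == 0.0:
        raise ValueError("start and end points coincide; ratio is undefined")
    return path_length(path) / chord


def max_chord_deviation(path):
    """Largest distance from any path point to the line through the endpoints."""
    start, end = path[0], path[-1]
    d = [e - s for s, e in zip(start, end)]
    dd = sum(c * c for c in d)
    if dd == 0.0:
        raise ValueError("start and end points coincide; chord is undefined")
    worst = 0.0
    for p in path:
        r = [pi - si for pi, si in zip(p, start)]
        t = sum(ri * di for ri, di in zip(r, d)) / dd  # projection coefficient
        foot = [si + t * di for si, di in zip(start, d)]
        worst = max(worst, math.dist(p, foot))
    return worst


# Hand-made trajectories for illustration (not sampled from any model).
straight = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
bent = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]

ratio_straight = straightness_ratio(straight)
ratio_bent = straightness_ratio(bent)
dev_straight = max_chord_deviation(straight)
dev_bent = max_chord_deviation(bent)
```

Averaging either quantity over many sampled trajectories would give a single number to compare across models; a ratio well above 1 or a large chord deviation would point to curved paths.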

Figures

Figures reproduced from arXiv: 2605.22507 by Adrian Müller, Gergely Neu, Pablo Moreno-Muñoz.

**Figure 2.** Value functions and value-driven transport policies in a two-dimensional example, plotted …

**Figure 3.** Few-step generation with a learned VDT model.

**Figure 4.** Experiments on MNIST: conditional generation and data translation.

**Figure 5.** Downscaled images and their deblurred counterparts produced by VDT, …

**Figure 6.** Downscaled images and their deblurred counterparts produced by VDT, …

**Figure 7.** Downscaled images and their deblurred counterparts produced by VDT, …

**Figure 8.** Forward sampling from a VDT policy from EMNIST to MNIST.

**Figure 9.** Reverse sampling from a VDT policy from MNIST to EMNIST.

**Figure 10.** MNIST digits generated by CFG with various guidance scales: …

Original abstract

We propose a new framework for generative modeling based on a discrete-time stochastic control formulation of measure transport. Adapting classic results from control theory, we formulate our problem as a linear program whose dual variables correspond to the optimal value function of the control problem, which directly encodes the optimal control policy. Exploiting this LP formulation, we develop an efficient simulation-free primal-dual algorithm for computing approximately optimal value functions and the associated value-driven transport (VDT) policies which approximate the true optimal policy. We show that well-trained VDT policies enjoy numerous favorable properties in comparison with other state-of-the-art methods based on flows, diffusions, or Schrödinger bridges: they lead to straight transport paths which can be simulated quickly and robustly, and can be enhanced in all the same ways as diffusion and flow-based models (e.g., conditional generation, classifier-free guidance, unpaired data-to-data translation are all easy to incorporate). We evaluate our methodology in a range of experiments, with results that indicate strong performance and good potential for scalability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a generative modeling framework called Value-Driven Transport (VDT) that reformulates measure transport as a discrete-time stochastic control problem. It casts this as a linear program whose dual variables correspond to the optimal value function, which encodes the optimal control policy. A simulation-free primal-dual algorithm is proposed to compute approximate value functions and the resulting VDT policies. The central claims are that well-trained VDT policies produce straight transport paths that can be simulated quickly and robustly, and that these policies support the same enhancements as diffusion and flow models (conditional generation, classifier-free guidance, unpaired translation). Experiments are reported to indicate competitive performance and scalability potential.

Significance. If the approximation guarantees and empirical claims hold, the work would provide a useful alternative to flow-, diffusion-, and Schrödinger-bridge-based generative models by enabling straight-line paths with reduced simulation cost and improved robustness. The LP-dual construction and simulation-free training are clear strengths that adapt classic stochastic control results to this setting. The ease of incorporating conditional and guidance mechanisms is a practical advantage. These elements could influence future work on efficient transport-based generation if the policy approximation quality is rigorously established.

major comments (2)

[§4] §4 (primal-dual algorithm): no explicit convergence rates, discretization error bounds, or approximation guarantees are provided for how closely the learned value functions and induced policies approach the true optimal control policy. The central claim that VDT policies yield straight transport paths and simulation robustness rests on this approximation being sufficiently accurate; without quantitative bounds on step size, iteration count, or function-class capacity, it is possible for deviations to produce curved paths or require corrective simulation, undermining the stated advantages over flows and diffusions.
[§5] §5 (experiments): the reported results compare VDT to baselines but do not include ablation studies or quantitative metrics (e.g., path straightness measured by integrated curvature or simulation variance) that directly test whether the learned policies achieve the claimed straight paths and robustness. This weakens the link between the algorithmic construction and the favorable properties asserted in the abstract.

minor comments (2)

[§3] Notation for the discrete-time control problem and the LP dual could be clarified with an explicit statement of the continuous-time limit and how the policy is recovered from the value function.
[§1] The abstract and introduction would benefit from a brief comparison table or paragraph situating VDT relative to recent optimal-transport and Schrödinger-bridge generative models.
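
On the first minor comment, one standard route from value function to policy can be sketched as follows. It starts from the recursion quoted in the Lean-theorem section of this page and assumes that $V^\star_{h+1}$ is differentiable and that the minimizer $y^\star(x)$ is unique; these regularity assumptions are added here for illustration and are not stated on this page.

```latex
\begin{align*}
V^\star_h(x) &= \min_y \Big[ \tfrac{H+1}{2}\,\|x-y\|^2 + V^\star_{h+1}(y) \Big],
\qquad
y^\star(x) = \arg\min_y \Big[ \tfrac{H+1}{2}\,\|x-y\|^2 + V^\star_{h+1}(y) \Big], \\
\nabla_x V^\star_h(x) &= (H+1)\,\big(x - y^\star(x)\big)
\qquad \text{(envelope theorem: differentiate the objective in $x$ at $y = y^\star(x)$)}, \\
y^\star(x) &= x - \tfrac{1}{H+1}\,\nabla_x V^\star_h(x) = \pi_h(x; V^\star).
\end{align*}
```

Read this way, the policy formula is the minimizer of the one-step problem rewritten in terms of the gradient of the current value function, so no inner optimization is needed at sampling time.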

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive and insightful comments on our manuscript. We provide point-by-point responses to the major comments below.

Referee: [§4] §4 (primal-dual algorithm): no explicit convergence rates, discretization error bounds, or approximation guarantees are provided for how closely the learned value functions and induced policies approach the true optimal control policy. The central claim that VDT policies yield straight transport paths and simulation robustness rests on this approximation being sufficiently accurate; without quantitative bounds on step size, iteration count, or function-class capacity, it is possible for deviations to produce curved paths or require corrective simulation, undermining the stated advantages over flows and diffusions.

Authors: The manuscript builds on the exact equivalence between the linear program and the optimal control problem, which guarantees straight transport paths for the true optimal value function. For the approximate primal-dual algorithm with neural network parameterization, we do not provide explicit convergence rates or error bounds in the current version. This is a valid observation, and we will revise the paper to include a discussion section addressing approximation quality, potential sources of error, and their implications for path straightness, drawing on related literature in approximate dynamic programming and stochastic control. However, establishing rigorous quantitative bounds for this specific setting would constitute a significant extension of the theoretical analysis.
revision: partial

Referee: [§5] §5 (experiments): the reported results compare VDT to baselines but do not include ablation studies or quantitative metrics (e.g., path straightness measured by integrated curvature or simulation variance) that directly test whether the learned policies achieve the claimed straight paths and robustness. This weakens the link between the algorithmic construction and the favorable properties asserted in the abstract.

Authors: We agree that incorporating quantitative metrics for path straightness and simulation robustness, as well as ablation studies, would provide stronger empirical support for the claimed advantages. The current experiments emphasize generative quality and comparisons to baselines, with qualitative evidence of straight paths. In the revised manuscript, we will add these quantitative evaluations and ablations to directly validate the straight-path and robustness properties.
revision: yes

standing simulated objections not resolved

Deriving explicit convergence rates, discretization error bounds, or approximation guarantees for the primal-dual algorithm with neural network function approximation.

Circularity Check

0 steps flagged

No significant circularity; derivation adapts external control theory

The paper formulates measure transport as a linear program whose dual encodes the optimal value function from stochastic control theory, then introduces a primal-dual algorithm to approximate the associated policies. The claimed straight transport paths and simulation robustness are presented as consequences of approximating the optimal control policy in this LP setting, drawing on classic external results rather than any fitted parameter renamed as a prediction or any self-referential definition. No load-bearing step reduces by construction to the paper's own inputs or prior self-citations; the central claims rest on the LP-dual construction and the new algorithm, which remain independent of the target generative properties.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Review based on abstract only; limited visibility into specific assumptions or parameters.

axioms (1)

standard math: Classic results from control theory on stochastic control problems and their linear programming formulations hold and can be adapted to measure transport.
The paper states it adapts these results to formulate the generative modeling problem as an LP.

invented entities (1)

Value-driven transport (VDT) policies · no independent evidence
purpose: Approximate optimal control policies obtained from the dual of the transport LP.
Introduced as the output of the primal-dual algorithm that directly encodes the optimal policy via the value function.

pith-pipeline@v0.9.0 · 5720 in / 1302 out tokens · 36509 ms · 2026-05-22T07:55:00.607397+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: the optimal value function $V^\star$ encodes the optimal policy via $V^\star_h(x) = \min_y \left[ \frac{H+1}{2}\|x-y\|^2 + V^\star_{h+1}(y) \right]$; the VDT policy $\pi_h(x;V) = x - \frac{1}{H+1}\nabla_x V_h(x)$ produces straight paths.

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery / Peano structure · unclear

Paper passage: discrete-time dynamic OT as LP with flow constraints; duality yields value functions.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Albergo and Eric Vanden-Eijnden

Michael S. Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. In The Eleventh International Conference on Learning Representations, 2023

work page 2023

[2] [2]

Albergo, Nicholas M

Michael S. Albergo, Nicholas M. Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. Journal of Machine Learning Research, 26(209): 1–80, 2025

work page 2025

[3] [3]

Gradient flows: in metric spaces and in the space of probability measures

Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient flows: in metric spaces and in the space of probability measures. Springer, 2005

work page 2005

[4] [4]

Logistic Q-learning

Joan Bas-Serrano, Sebastian Curi, Andreas Krause, and Gergely Neu. Logistic Q-learning. In AI & Statistics, pages 3610–3618, 2021

work page 2021

[5] [5]

Richard E. Bellman. Dynamic Programming. Princeton University Press, Princeton, New Jersey, 1957

work page 1957

[6] [6]

A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem

Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numerische Mathematik, 84(3):375–393, 2000

work page 2000

[7] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, MA, 3rd edition, 2007.

[8] Luiz F. Chamon, Mohammad R. Karimi, and Anna Korba. Constrained sampling with primal-dual Langevin Monte Carlo. Advances in Neural Information Processing Systems, 37:29285–29323, 2024.

[9] Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31, 2018.

[10] Gregory Cohen, Saeed Afshar, Jonathan Tapson, and Andre Van Schaik. EMNIST: Extending MNIST to handwritten letters. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 2921–2926. IEEE, 2017.

[11] Valentin De Bortoli, James Thornton, Jeremy Heng, and Arnaud Doucet. Diffusion Schrödinger bridge with applications to score-based generative modeling. Advances in Neural Information Processing Systems, 34:17695–17709, 2021.

[12] Valentin De Bortoli, Iryna Korshunova, Andriy Mnih, and Arnaud Doucet. Schrödinger bridge flow for unpaired data translation. Advances in Neural Information Processing Systems, 37:103384–103441, 2024.

[13] Daniela P. de Farias and Benjamin Van Roy. The linear programming approach to approximate dynamic programming. Operations Research, 51(6):850–865, 2003.

[14] Guy de Ghellinck. Les problèmes de décisions séquentielles. Cahiers du Centre d’Études de Recherche Opérationnelle, 2:161–179, 1960.

[15] Eric V. Denardo. On linear programming in a Markov decision problem. Management Science, 16(5):281–288, 1970.

[16] Francois d’Epenoux. A probabilistic production and inventory problem. Management Science, 10(1):98–108, 1963.

[17] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34:8780–8794, 2021.

[18] Nikita Gushchin, Sergei Kholkin, Evgeny Burnaev, and Alexander Korotin. Light and optimal Schrödinger bridge matching. In Forty-first International Conference on Machine Learning (ICML), 2024.

[19] Nikita Gushchin, Daniil Selikhanovych, Sergei Kholkin, Evgeny Burnaev, and Alexander Korotin. Adversarial Schrödinger bridge matching. Advances in Neural Information Processing Systems, 37:89612–89651, 2024.

[20] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.

[21] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.

[22] Ronald A. Howard. Dynamic Programming and Markov Processes. The MIT Press, Cambridge, MA, 1960.

[23] Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17, 1998.

[24] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35:26565–26577, 2022.

[25] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.

[26] Chieh-Hsin Lai, Yang Song, Dongjun Kim, Yuki Mitsufuji, and Stefano Ermon. The principles of diffusion models. arXiv preprint arXiv:2510.21890, 2025.

[27] Yann LeCun and Corinna Cortes. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/

[28] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023.

[29] Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky T. Q. Chen, David Lopez-Paz, Heli Ben-Hamu, and Itai Gat. Flow matching guide and code. arXiv preprint arXiv:2412.06264, 2024.

[30] Guan-Horng Liu, Arash Vahdat, De-An Huang, Evangelos A. Theodorou, Weili Nie, and Anima Anandkumar. I2SB: Image-to-image Schrödinger bridge. In Proceedings of the 40th International Conference on Machine Learning, pages 22042–22062, 2023.

[31] Qiang Liu. Rectified flow: A marginal preserving approach to optimal transport. arXiv preprint arXiv:2209.14577, 2022.

[32] Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. In The Eleventh International Conference on Learning Representations (ICLR), 2023.

[33] Fan Lu, Prashant G. Mehta, Sean P. Meyn, and Gergely Neu. Convex Q-learning. In 2021 American Control Conference (ACC), pages 4749–4756. IEEE, 2021.

[34] Alan S. Manne. Linear programming and sequential decisions. Management Science, 6(3):259–267, 1960.

[35] Robert J. McCann. A convexity principle for interacting gases. Advances in Mathematics, 128(1):153–179, 1997.

[36] Kirill Neklyudov, Rob Brekelmans, Daniel Severo, and Alireza Makhzani. Action matching: Learning stochastic dynamics from samples. In International Conference on Machine Learning, pages 25858–25889, 2023.

[37] Gergely Neu and Nneka Okolo. Offline RL via feature-occupancy gradient ascent. In International Conference on Artificial Intelligence and Statistics, pages 3637–3645, 2025.

[38] Gergely Neu, Anders Jonsson, and Vicenç Gómez. A unified view of entropy-regularized Markov decision processes. arXiv preprint arXiv:1705.07798, 2017.

[39] Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171, 2021.

[40] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[41] Gabriel Peyré. Optimal and diffusion transports in machine learning. arXiv preprint arXiv:2512.06797, 2025.

[42] Gabriel Peyré. Optimal transport for machine learners. arXiv preprint arXiv:2505.06589, 2025.

[43] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, April 1994.

[44] Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In International Conference on Machine Learning, pages 1530–1538, 2015.

[45] Adil Salim, Anna Korba, and Giulia Luise. The Wasserstein proximal gradient algorithm. Advances in Neural Information Processing Systems, 33:12356–12366, 2020.

[46] Filippo Santambrogio. Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling. Progress in Nonlinear Differential Equations and Their Applications. Birkhäuser Cham, 2015.

[47] Filippo Santambrogio. L1 and L∞ theory. In Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling, pages 87–119. Springer, 2015.

[48] Erwin Schrödinger. Über die Umkehrung der Naturgesetze. Sitzungsberichte der Preußischen Akademie der Wissenschaften. Physikalisch-mathematische Klasse, pages 144–153, 1931.

[49] Paul J. Schweitzer and Abraham Seidman. Generalized polynomial approximations in Markovian decision processes. J. of Math. Anal. and Appl., 110:568–582, 1985.

[50] Alexander Shapiro. On duality theory of conic linear problems. Nonconvex Optimization and its Applications, 57:135–155, 2001.

[51] Yuyang Shi, Valentin De Bortoli, Andrew Campbell, and Arnaud Doucet. Diffusion Schrödinger bridge matching. Advances in Neural Information Processing Systems, 36:62183–62223, 2023.

[52] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265, 2015.

[53] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021.

[54] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.

[55] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction (second edition). Online draft, 2018.

[56] Matthew Thorpe. Introduction to optimal transport.

[57] Alexander Tong, Kilian Fatras, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, and Yoshua Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport. Transactions on Machine Learning Research, 2024.

[58] Alexander Tong, Nikolay Malkin, Kilian Fatras, Lazar Atanackovic, Yanlei Zhang, Guillaume Huguet, Guy Wolf, and Yoshua Bengio. Simulation-free Schrödinger bridges via score and flow matching. In International Conference on Artificial Intelligence and Statistics, pages 1279–1287, 2024.

[59] Cédric Villani. Topics in optimal transportation, volume 58. American Mathematical Soc., 2003.

[60] Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17(3):261–272, 2020.

[61] Max Welling and Yee Whye Teh. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688, 2011.

[62] Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys, 56(4):1–39, 2023.
