
arxiv: 2605.22507 · v1 · pith:T7J6U6BWnew · submitted 2026-05-21 · 💻 cs.LG · stat.ML

Generative Modeling by Value-Driven Transport

Pith reviewed 2026-05-22 07:55 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords generative modeling · value-driven transport · measure transport · stochastic control · primal-dual algorithm · straight paths · optimal transport · diffusion models

The pith

Generative modeling can be recast as optimal control for measure transport, yielding straight-path policies from value functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a new framework for generative modeling by formulating measure transport as a discrete-time stochastic control problem. Adapting classic control theory results, the authors pose the problem as a linear program whose dual variables correspond to the optimal value function, which directly encodes the optimal control policy. A simulation-free primal-dual algorithm then computes approximate value functions and the associated value-driven transport (VDT) policies. Well-trained VDT policies produce straight transport paths that support fast and robust simulation while allowing the same enhancements as diffusion and flow models, such as conditional generation and classifier-free guidance. Experiments indicate competitive performance with potential for scalability.

Core claim

By adapting results from control theory, the measure transport problem is posed as a linear program whose dual variables correspond to the optimal value function of the control problem, which directly encodes the optimal control policy. An efficient simulation-free primal-dual algorithm computes approximately optimal value functions and the resulting value-driven transport policies that approximate the true optimal policy for generative modeling.

What carries the argument

A primal-dual algorithm that approximates the optimal value function of the stochastic control formulation of measure transport. That value function directly defines the value-driven transport policy.
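How a value function can encode a policy is easiest to see on a toy problem. The sketch below is not the paper's primal-dual algorithm: it uses plain backward dynamic programming on a one-dimensional integer grid with a single target point, and it assumes a step cost of (n_steps / 2) * (x' - x)^2 purely for illustration. The names `backward_values` and `greedy_policy` are hypothetical.

```python
import math


def backward_values(grid, target, n_steps):
    """Backward dynamic programming for a point-to-point toy transport problem."""
    scale = n_steps / 2.0  # assumed step cost: (n_steps / 2) * (y - x)^2

    def cost(x, y):
        return scale * (y - x) ** 2

    # Terminal condition: only the target state is admissible at the end.
    values = [{x: (0.0 if x == target else math.inf) for x in grid}]
    for _ in range(n_steps):
        nxt = values[0]
        values.insert(0, {x: min(cost(x, y) + nxt[y] for y in grid) for x in grid})
    return values, cost


def greedy_policy(values, cost, grid, h, x):
    """Policy read off the value function: argmin of step cost plus next value."""
    return min(grid, key=lambda y: cost(x, y) + values[h + 1][y])


grid = list(range(7))
values, cost = backward_values(grid, target=6, n_steps=3)

path = [0]
for h in range(3):
    path.append(greedy_policy(values, cost, grid, h, path[-1]))

# Bellman inequalities V_h(x) <= cost(x, y) + V_{h+1}(y): the kind of
# constraint used in classic LP formulations of dynamic programming.
feasible = all(
    values[h][x] <= cost(x, y) + values[h + 1][y]
    for h in range(3) for x in grid for y in grid
)

print(path)          # [0, 2, 4, 6]
print(values[0][0])  # 18.0
print(feasible)      # True
```

On this toy problem the greedy rollout takes equal steps, the discrete analogue of a straight path, and the value at the start equals half the squared distance to the target.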

If this is right

  • Transport occurs along straight paths, enabling quick and robust simulation of the generative process.
  • VDT policies can incorporate conditional generation, classifier-free guidance, and unpaired data-to-data translation.
  • The simulation-free training supports scalability to larger problems.
  • Performance remains competitive with flows, diffusions, and Schrödinger bridges in experiments.
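The link between straight paths and cheap simulation in the first bullet can be illustrated with a generic integrator. The sketch below is an assumption-laden stand-in: it uses explicit Euler steps on two hand-written continuous-time velocity fields (`straight` and `curved`, both hypothetical), not a trained VDT policy, which is defined in discrete time.

```python
import math


def euler(velocity, x0, n_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with explicit Euler."""
    x, dt = list(x0), 1.0 / n_steps
    for k in range(n_steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


start, end = (1.0, 0.0), (0.0, 1.0)


def straight(x, t):
    # Constant velocity: the trajectory is the segment from start to end.
    return [e - s for s, e in zip(start, end)]


def curved(x, t):
    # Quarter-circle rotation that also carries start to end at t = 1.
    return [-math.pi / 2 * x[1], math.pi / 2 * x[0]]


def error(velocity, n_steps):
    """Distance between the simulated endpoint and the intended endpoint."""
    return math.dist(euler(velocity, start, n_steps), end)


for n in (1, 4, 100):
    print(n, error(straight, n), error(curved, n))
```

With the constant-velocity field a single Euler step already lands on the endpoint, while the curved field needs many steps before the discretization error becomes small.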

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This control formulation may reduce training instability by avoiding the need for simulation steps during optimization.
  • Straight paths could lower sampling variance compared to the curved trajectories common in diffusion models.
  • The approach might extend naturally to other measure transport tasks outside generative modeling, such as domain adaptation.

Load-bearing premise

That policies derived from the approximate value functions are close enough to the true optimal control policy to produce straight paths and practical robustness.

What would settle it

Measuring the average deviation from straight lines in trajectories sampled from a trained VDT policy; large curvature or non-linear paths would indicate the approximation fails to deliver the claimed transport properties.
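One way to make this check concrete is sketched below. It is only an illustration under stated assumptions: a trajectory is a list of points, and deviation is measured as the perpendicular distance from the line through the two endpoints, divided by the distance between them. The function name `chord_deviation` is hypothetical, and the sample trajectories are hand-written rather than sampled from a trained policy.

```python
import math


def chord_deviation(path):
    """Return (mean, max) distance of path points from the endpoint chord,
    relative to the chord length."""
    a, b = path[0], path[-1]
    chord = [bi - ai for ai, bi in zip(a, b)]
    length = math.hypot(*chord)
    if length == 0.0:
        raise ValueError("path endpoints coincide; chord is undefined")
    deviations = []
    for p in path:
        rel = [pi - ai for ai, pi in zip(a, p)]
        # Orthogonal projection of the point onto the chord direction.
        t = sum(r * c for r, c in zip(rel, chord)) / length ** 2
        foot = [t * c for c in chord]
        deviations.append(math.dist(rel, foot) / length)
    return sum(deviations) / len(deviations), max(deviations)


straight_path = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
bent_path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]

print(chord_deviation(straight_path))
print(chord_deviation(bent_path))
```

Averaging the returned values over many sampled trajectories would give a single straightness score; a value near zero is what the straight-path claim predicts.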

Figures

Figures reproduced from arXiv: 2605.22507 by Adrian Müller, Gergely Neu, Pablo Moreno-Muñoz.

Figure 1: Continuous- and discrete-time transport plans. Continuous-time plans are defined as smooth …
Figure 2: Value functions and value-driven transport policies in a two-dimensional example, plotted …
Figure 3: Few-step generation with a learned VDT model.
Figure 4: Experiments on MNIST: conditional generation and data translation.
Figure 5: Downscaled images and their deblurred counterparts produced by VDT, …
Figure 6: Downscaled images and their deblurred counterparts produced by VDT, …
Figure 7: Downscaled images and their deblurred counterparts produced by VDT, …
Figure 8: Forward sampling from a VDT policy from EMNIST to MNIST.
Figure 9: Reverse sampling from a VDT policy from MNIST to EMNIST.
Figure 10: MNIST digits generated by CFG with various guidance scales: …

Original abstract

We propose a new framework for generative modeling based on a discrete-time stochastic control formulation of measure transport. Adapting classic results from control theory, we formulate our problem as a linear program whose dual variables correspond to the optimal value function of the control problem, which directly encodes the optimal control policy. Exploiting this LP formulation, we develop an efficient simulation-free primal-dual algorithm for computing approximately optimal value functions and the associated value-driven transport (VDT) policies which approximate the true optimal policy. We show that well-trained VDT policies enjoy numerous favorable properties in comparison with other state-of-the-art methods based on flows, diffusions, or Schrödinger bridges: they lead to straight transport paths which can be simulated quickly and robustly, and can be enhanced in all the same ways as diffusion and flow-based models (e.g., conditional generation, classifier-free guidance, unpaired data-to-data translation are all easy to incorporate). We evaluate our methodology in a range of experiments, with results that indicate strong performance and good potential for scalability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a generative modeling framework called Value-Driven Transport (VDT) that reformulates measure transport as a discrete-time stochastic control problem. It casts this as a linear program whose dual variables correspond to the optimal value function, which encodes the optimal control policy. A simulation-free primal-dual algorithm is proposed to compute approximate value functions and the resulting VDT policies. The central claims are that well-trained VDT policies produce straight transport paths that can be simulated quickly and robustly, and that these policies support the same enhancements as diffusion and flow models (conditional generation, classifier-free guidance, unpaired translation). Experiments are reported to indicate competitive performance and scalability potential.

Significance. If the approximation guarantees and empirical claims hold, the work would provide a useful alternative to flow-, diffusion-, and Schrödinger-bridge-based generative models by enabling straight-line paths with reduced simulation cost and improved robustness. The LP-dual construction and simulation-free training are clear strengths that adapt classic stochastic control results to this setting. The ease of incorporating conditional and guidance mechanisms is a practical advantage. These elements could influence future work on efficient transport-based generation if the policy approximation quality is rigorously established.

major comments (2)
  1. [§4] §4 (primal-dual algorithm): no explicit convergence rates, discretization error bounds, or approximation guarantees are provided for how closely the learned value functions and induced policies approach the true optimal control policy. The central claim that VDT policies yield straight transport paths and simulation robustness rests on this approximation being sufficiently accurate; without quantitative bounds on step size, iteration count, or function-class capacity, it is possible for deviations to produce curved paths or require corrective simulation, undermining the stated advantages over flows and diffusions.
  2. [§5] §5 (experiments): the reported results compare VDT to baselines but do not include ablation studies or quantitative metrics (e.g., path straightness measured by integrated curvature or simulation variance) that directly test whether the learned policies achieve the claimed straight paths and robustness. This weakens the link between the algorithmic construction and the favorable properties asserted in the abstract.
minor comments (2)
  1. [§3] Notation for the discrete-time control problem and the LP dual could be clarified with an explicit statement of the continuous-time limit and how the policy is recovered from the value function.
  2. [§1] The abstract and introduction would benefit from a brief comparison table or paragraph situating VDT relative to recent optimal-transport and Schrödinger-bridge generative models.
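
The integrated-curvature metric suggested in major comment 2 has a simple discrete analogue for sampled trajectories: the total turning angle of the polyline. The sketch below is one possible reading of that suggestion, not a metric taken from the paper; the function name `total_turning_angle` is hypothetical.

```python
import math


def total_turning_angle(path):
    """Sum of the angles (in radians) between consecutive segments of a polyline."""
    segments = [[q - p for p, q in zip(a, b)] for a, b in zip(path, path[1:])]
    # Drop zero-length segments so that repeated points do not divide by zero.
    segments = [s for s in segments if math.hypot(*s) > 0.0]
    total = 0.0
    for u, v in zip(segments, segments[1:]):
        cos_angle = sum(ui * vi for ui, vi in zip(u, v)) / (
            math.hypot(*u) * math.hypot(*v)
        )
        # Clamp to [-1, 1] to guard against floating-point round-off.
        total += math.acos(max(-1.0, min(1.0, cos_angle)))
    return total


print(total_turning_angle([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]))
print(total_turning_angle([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]))
```

A perfectly straight trajectory scores zero regardless of how many steps it takes, so the metric can be compared across samplers with different step counts.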

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive and insightful comments on our manuscript. We provide point-by-point responses to the major comments below.

  1. Referee: [§4] §4 (primal-dual algorithm): no explicit convergence rates, discretization error bounds, or approximation guarantees are provided for how closely the learned value functions and induced policies approach the true optimal control policy. The central claim that VDT policies yield straight transport paths and simulation robustness rests on this approximation being sufficiently accurate; without quantitative bounds on step size, iteration count, or function-class capacity, it is possible for deviations to produce curved paths or require corrective simulation, undermining the stated advantages over flows and diffusions.

    Authors: The manuscript builds on the exact equivalence between the linear program and the optimal control problem, which guarantees straight transport paths for the true optimal value function. For the approximate primal-dual algorithm with neural network parameterization, we do not provide explicit convergence rates or error bounds in the current version. This is a valid observation, and we will revise the paper to include a discussion section addressing approximation quality, potential sources of error, and their implications for path straightness, drawing on related literature in approximate dynamic programming and stochastic control. However, establishing rigorous quantitative bounds for this specific setting would constitute a significant extension of the theoretical analysis. revision: partial

  2. Referee: [§5] §5 (experiments): the reported results compare VDT to baselines but do not include ablation studies or quantitative metrics (e.g., path straightness measured by integrated curvature or simulation variance) that directly test whether the learned policies achieve the claimed straight paths and robustness. This weakens the link between the algorithmic construction and the favorable properties asserted in the abstract.

    Authors: We agree that incorporating quantitative metrics for path straightness and simulation robustness, as well as ablation studies, would provide stronger empirical support for the claimed advantages. The current experiments emphasize generative quality and comparisons to baselines, with qualitative evidence of straight paths. In the revised manuscript, we will add these quantitative evaluations and ablations to directly validate the straight-path and robustness properties. revision: yes

standing simulated objections not resolved
  • Deriving explicit convergence rates, discretization error bounds, or approximation guarantees for the primal-dual algorithm with neural network function approximation.

Circularity Check

0 steps flagged

No significant circularity; derivation adapts external control theory

full rationale

The paper formulates measure transport as a linear program whose dual encodes the optimal value function from stochastic control theory, then introduces a primal-dual algorithm to approximate the associated policies. The claimed straight transport paths and simulation robustness are presented as consequences of approximating the optimal control policy in this LP setting, drawing on classic external results rather than any fitted parameter renamed as a prediction or any self-referential definition. No load-bearing step reduces by construction to the paper's own inputs or prior self-citations; the central claims rest on the LP-dual construction and the new algorithm, which remain independent of the target generative properties.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Review based on abstract only; limited visibility into specific assumptions or parameters.

axioms (1)
  • [standard math] Classic results from control theory on stochastic control problems and their linear programming formulations hold and can be adapted to measure transport.
    The paper states it adapts these results to formulate the generative modeling problem as an LP.
invented entities (1)
  • Value-driven transport (VDT) policies [no independent evidence]
    purpose: Approximate optimal control policies obtained from the dual of the transport LP.
    Introduced as the output of the primal-dual algorithm that directly encodes the optimal policy via the value function.

pith-pipeline@v0.9.0 · 5720 in / 1302 out tokens · 36509 ms · 2026-05-22T07:55:00.607397+00:00 · methodology


