Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans

Christian Wald; Gabriele Steidl

arxiv: 2501.16839 · v7 · submitted 2025-01-28 · 💻 cs.LG · math.PR


Pith reviewed 2026-05-23 04:35 UTC · model grok-4.3


keywords: flow matching · velocity fields · Wasserstein geometry · transport plans · Markov kernels · stochastic processes · generative models · inverse problems


The pith

Velocity fields in flow matching can be characterized and learned from transport plans, Markov kernels, or stochastic processes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This review paper examines mathematical ways to obtain the velocity fields that drive flow matching models from a latent distribution to a target one. It focuses on absolutely continuous curves in Wasserstein geometry and shows that the fields arise from couplings between the distributions, from Markov kernels, and from stochastic processes. The latter two frameworks contain the coupling method but allow more general constructions. Readers would care because the characterizations supply concrete training objectives and extend the technique to Bayesian inverse problems through conditional Wasserstein distances.

Core claim

The paper shows how the velocity fields can be characterized and learned via i) transport plans (couplings) between latent and target distributions, ii) Markov kernels and iii) stochastic processes, where the latter two include the coupling approach, but are in general broader. It further demonstrates that flow matching can solve Bayesian inverse problems when conditional Wasserstein distances are defined, and contrasts the approach with continuous normalizing flows and score matching.
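As a hedged sketch of route i): assuming the linear interpolation that is common in flow matching, a coupling $\alpha \in \Gamma(\mu_0, \mu_1)$ induces a curve and a velocity field as written below. The symbols $e_t$, $X_0$, $X_1$ are notation chosen for this illustration and are not taken from this page.

```latex
\begin{aligned}
e_t(x, y) &:= (1 - t)\,x + t\,y, \qquad \mu_t := (e_t)_\sharp\, \alpha, \quad t \in [0, 1], \\
v_t(z) &= \mathbb{E}_{(X_0, X_1) \sim \alpha}\bigl[\, X_1 - X_0 \;\big|\; e_t(X_0, X_1) = z \,\bigr].
\end{aligned}
```

The velocity at a point $z$ is the average displacement $X_1 - X_0$ over all pairs whose interpolation passes through $z$ at time $t$, which is why only samples from $\alpha$ are needed to set up a regression target.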

What carries the argument

Velocity fields of absolutely continuous curves in the Wasserstein geometry, obtained from transport plans, Markov kernels, or stochastic processes.

If this is right

A single velocity field admits equivalent representations as the expectation under a coupling, under a Markov kernel, or under a stochastic process.
Markov kernels and stochastic processes contain the coupling construction and are in general broader, so they supply a larger class of admissible velocity fields.
Flow matching directly yields solvers for Bayesian inverse problems once conditional Wasserstein distances are introduced.
Continuous normalizing flows and score matching constitute alternative routes to the same velocity-field learning problem.
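For discrete measures, the relation between couplings and Markov kernels behind the first two points can be sketched in a few lines. This is a toy illustration with made-up numbers, assuming finite state spaces so that a kernel is a row-stochastic matrix; the helper names are hypothetical.

```python
def couple(mu, K):
    """Build the plan alpha[i][j] = mu[i] * K[i][j] from a marginal mu
    and a row-stochastic kernel K (each row is a probability vector)."""
    return [[m * k for k in row] for m, row in zip(mu, K)]


def marginals(alpha):
    """Row sums (first marginal) and column sums (second marginal) of a plan."""
    first = [sum(row) for row in alpha]
    second = [sum(col) for col in zip(*alpha)]
    return first, second


def disintegrate(alpha):
    """Recover the kernel alpha_x = alpha[x][.] / mu[x].
    Rows with mu[x] == 0 are undefined (a null set) and returned as None."""
    first, _ = marginals(alpha)
    return [[a / m for a in row] if m > 0 else None
            for row, m in zip(alpha, first)]


mu = [0.5, 0.5]                 # illustrative first marginal
K = [[0.25, 0.75],              # illustrative kernel K(x, .)
     [1.0, 0.0]]
alpha = couple(mu, K)
mu_back, nu = marginals(alpha)
print(alpha, nu, disintegrate(alpha))
```

Going from `(mu, K)` to `alpha` and back through `disintegrate` returns the same kernel, which is the discrete counterpart of identifying a disintegration with a Markov kernel.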

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The three characterizations could be combined inside one training loop to improve numerical stability when data are limited.
The perspective may clarify why flow matching scales better than some competing generative methods on high-dimensional data.
Generalizing the same velocity-field constructions beyond Euclidean Wasserstein space could cover manifold-valued or discrete distributions.

Load-bearing premise

The velocity fields correspond to absolutely continuous curves in the Wasserstein geometry.
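For orientation, the premise can be written out. The block below is the standard characterization from Ambrosio, Gigli, and Savaré (reference [4]), given as a sketch; this page quotes the paper's Theorem 3.3 and (CE) only in abbreviated form, so the paper's exact formulation may differ.

```latex
% A narrowly continuous curve $\mu_t \in \mathcal{P}_2(\mathbb{R}^d)$, $t \in [0,1]$,
% is absolutely continuous if and only if there is a Borel vector field $v_t$ with
\int_0^1 \|v_t\|_{L^2(\mu_t)} \, dt < \infty
\quad \text{and} \quad
\partial_t \mu_t + \nabla \cdot (v_t \, \mu_t) = 0 \quad \text{(CE)}
% where (CE) holds in the sense of distributions, i.e. for all
% $\varphi \in C_c^\infty((0,1) \times \mathbb{R}^d)$:
\int_0^1 \int_{\mathbb{R}^d} \bigl( \partial_t \varphi + \langle v_t, \nabla_x \varphi \rangle \bigr) \, d\mu_t \, dt = 0 .
```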

What would settle it

A concrete transport plan, Markov kernel, or stochastic process whose induced velocity field, when integrated in the ODE, fails to transport samples from the latent distribution to the target distribution.
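The check described above can be sketched on a case where everything is known in closed form. The example below is an illustrative assumption, not taken from the paper: a one-dimensional latent N(0, 1), a target N(m, s²), the independent coupling, and linear interpolation. For this setup the velocity field is an explicit Gaussian conditional expectation, and the exact flow map is x0 ↦ m + s·x0, so a numerical ODE solve can be compared against it.

```python
def velocity(t, x, m=2.0, s=0.5):
    """Closed-form v_t(x) = E[X1 - X0 | X_t = x] for independent
    X0 ~ N(0, 1), X1 ~ N(m, s^2) and X_t = (1 - t) X0 + t X1."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2   # Var(X_t)
    cov = t * s ** 2 - (1.0 - t)            # Cov(X1 - X0, X_t)
    return m + cov / var_t * (x - t * m)


def integrate(x0, steps=2000, m=2.0, s=0.5):
    """Explicit Euler scheme for dx/dt = v_t(x) on [0, 1]."""
    h = 1.0 / steps
    x = x0
    for k in range(steps):
        x += h * velocity(k * h, x, m, s)
    return x


# The exact flow map x0 -> m + s * x0 pushes N(0, 1) forward to N(m, s^2).
starts = [-1.0, 0.0, 1.0]
ends = [integrate(x0) for x0 in starts]
errors = [abs(e - (2.0 + 0.5 * x0)) for e, x0 in zip(ends, starts)]
print(ends, max(errors))
```

A counterexample in the sense of the paragraph above would show up here as an endpoint error that does not shrink as `steps` grows.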

Figures

Figures reproduced from arXiv: 2501.16839 by Christian Wald, Gabriele Steidl.

**Figure 2.** Similarly, we define for a measure α ∈ P(ℝ^d × ℝ^d) with marginal π²_♯ α = ν the disintegration of α with respect to π² as α = α_y ×_y ν. The notation of disintegration is directly related to Markov kernels. A Markov kernel is a map K : ℝ^d × B(ℝ^d) → ℝ such that i) K(x, ·) is a probability measure on ℝ^d for every x ∈ ℝ^d, and ii) K(·, B) is a Borel measurable map for every B ∈ B(ℝ^d).

**Figure 2.** Disintegration of the measure α ∈ P(ℝ × ℝ) (left). Measures α_{−0.3} ∈ P(ℝ) (middle, green) and α_{0.2} ∈ P(ℝ) (right, red). Hence, given a probability measure µ ∈ P(ℝ^d), we can define a new measure α := α_x ×_x µ ∈ P(ℝ^d × ℝ^d) by ∫_{ℝ^d×ℝ^d} f(x, y) dα(x, y) := ∫_{ℝ^d} ∫_{ℝ^d} f(x, y) dK(x, ·)(y) dµ(x) for all measurable, bounded functions f. Identifying α_x(B) with K(x, B), we see that conversely, {α_x}_x is the dis…

**Figure 3.** Plan/coupling of two discrete measures µ and ν (left) and Markov kernel/disintegration (right) with row and column sums. 2.4 Couplings and Wasserstein Distance. Let P_2(ℝ^d) := {µ ∈ P(ℝ^d) : ∫_{ℝ^d} ‖x‖² dµ < ∞} be the probability measures with finite second moments. For µ, ν ∈ P_2(ℝ^d), we define the set of plans or couplings with marginals µ and ν by Γ(µ, ν) := {α ∈ P(ℝ^d × ℝ^d) : π¹_♯ α = µ, π²_♯ α = …

**Figure 4.** Curve induced by α = (Id, T)_♯ µ_0 from µ_0 = δ_{x0}, resp. µ_0 = ½(δ_{x0} + δ_{x1}), to µ_1 = ½(δ_{y0} + δ_{y1}). In (c), at the crossing time s of the path, there does not exist a map T_s that induces an element in Γ_o(µ_s, µ_1). Red arrows: vector fields computed via (23). First of all, curves induced by plans are narrowly continuous, as the following lemma shows. Lemma 4.3. Let µ_0, µ_1 ∈ P_2(ℝ^d) and α ∈ Γ(µ_0, µ_1). The…

**Figure 5.** Illustration to Example 4.9. Both vector fields generate the same curves

**Figure 6.** Curve and vector field associated to α = µ_0 × µ_1, where µ_0 = ½δ_{x0} + ½δ_{x1} and µ_1 = ⅓δ_{y0} + ⅔δ_{y1}. Vectors are scaled by 0.2 for better visibility. We have already seen that vector fields v_t associated to optimal plans are minimal ones, meaning that v_t ∈ T_{µ_t}. This is in general not true for an independent coupling α = µ_0 × µ_1 with an arbitrary µ_0, see …

**Figure 7.** Trajectories of points from a vector field

**Figure 8.** Single trajectory for a flow matching model trained on cat images. For …

**Figure 9.** Consider η = ½δ_0 + ½δ_1, µ_n = ½δ_{0,n} + ½δ_{1,0} and ν_n = ½δ_{0,0} + ½δ_{1,n}. Then W_2(µ_n, ν_n) = 1 is the length of → and → indicates the optimal transport map. Furthermore W_{2,η}(µ_n, ν_n) = n is the length of → and → indicates the optimal transport for W_{2,η}. Note that ii) means, for any f ∈ C_b(ℝ^d), that ∫_{(ℝ^m×ℝ^d)²} f dα = ∫_{ℝ^m} ∫_{ℝ^m×ℝ^d×ℝ^d} f(w_1, x_1, w_2, x_2) dδ_{w_1}(w_2) dα_{w_1}(x_1, x_2) dη(w_1) = ∫_{ℝ^m} ∫_{ℝ^d×…

**Figure 10.** Consider η = ½δ_0 + ½δ_1, µ_0 = ½δ_{0,5} + ½δ_{1,0} and µ_1 = ½δ_{0,0} + ½δ_{1,5}. (a) Geodesic with respect to W_2, green: µ_{1/2}. (b) Geodesic with respect to W_{2,η}, green: µ_{1/2}. 9.2 Almost Conditional Couplings. One drawback of the space P_η(ℝ^m × ℝ^d) is that we can in general not approximate µ ∈ P_η(ℝ^m × ℝ^d) by an empirical measure, if η is not empirical. In other words, an empirical approximation µ…
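The two-atom example in the captions of Figures 9 and 10 is small enough to verify by brute force. The sketch below assumes uniform weights on equally many atoms, so that an optimal plan can be searched among permutations; the function name and the `allowed` filter are illustrative choices, with the filter standing in for the constraint that a conditional plan must not move mass between different values of the condition w.

```python
from itertools import permutations
import math


def w2_uniform(src, dst, allowed=lambda p, q: True):
    """W2 between uniform measures on equally many atoms, by brute force
    over permutations; `allowed` restricts which atom pairs may be matched."""
    n = len(src)
    best = math.inf
    for perm in permutations(range(n)):
        pairs = [(src[i], dst[j]) for i, j in enumerate(perm)]
        if all(allowed(p, q) for p, q in pairs):
            cost = sum(math.dist(p, q) ** 2 for p, q in pairs) / n
            best = min(best, cost)
    return math.sqrt(best)


n = 5
mu = [(0, n), (1, 0)]   # atoms written as (w, x): condition w, state x
nu = [(0, 0), (1, n)]

w2 = w2_uniform(mu, nu)
# Conditional version: mass may only move within a fixed condition w.
w2_eta = w2_uniform(mu, nu, allowed=lambda p, q: p[0] == q[0])
print(w2, w2_eta)   # -> 1.0 5.0
```

The unconstrained plan swaps the conditions at cost 1, while the conditional plan has to move each atom by n inside its own fiber, matching W_2 = 1 and W_{2,η} = n from the caption of Figure 9.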

**Figure 11.** (Bayesian) Flow matching on Cifar10. α ∈ Γ(P_Y × N(0, I_5), P_{Y,X}) by sampling x_i ∼ P_X, computing y_i := f(x_i) + ξ_i ∼ P_Y for ξ_i ∼ N(0, 0.1 Id) and z_i ∼ N(0, Id). We then use (y_i, z_i, y_i, x_i) in order to approximate α and to compute CFM(θ). Batching in i and t is done as in Algorithm 1. Training minibatch OT Bayesian flow matching with respect to W_{2,100}. Here we use Algorithm 2 for µ_0 = P_Y × N(0, Id) and µ_1…
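The sampling recipe in this caption can be sketched as follows. This is a minimal illustration under stated assumptions: the forward operator `forward` is a hypothetical placeholder (the paper's f is not given on this page), 0.1 is read as a noise variance, and the linear interpolation in `interpolate` is the common flow matching choice rather than something this page confirms.

```python
import random


def forward(x):
    """Hypothetical forward operator f, used only for illustration."""
    return [2.0 * xi for xi in x]


def sample_tuple(sample_prior, dim_z, noise_var=0.1, rng=random):
    """Draw one training tuple (y, z, y, x) following the caption."""
    x = sample_prior(rng)                                  # x_i ~ P_X
    std = noise_var ** 0.5                                 # assumes 0.1 is a variance
    y = [fx + rng.gauss(0.0, std) for fx in forward(x)]    # y_i = f(x_i) + xi_i
    z = [rng.gauss(0.0, 1.0) for _ in range(dim_z)]        # z_i ~ N(0, Id)
    return y, z, y, x


def interpolate(t, tup):
    """Interpolate between (y, z) and (y, x); the observation y stays fixed."""
    y, z, _, x = tup
    state = [(1 - t) * zi + t * xi for zi, xi in zip(z, x)]
    return y, state


rng = random.Random(0)
prior = lambda r: [r.gauss(0.0, 1.0) for _ in range(3)]   # illustrative P_X
tup = sample_tuple(prior, dim_z=3, rng=rng)
print(interpolate(0.5, tup))
```

Because the observation appears unchanged in both endpoints, the interpolated curve only moves the latent part from z towards x, which is the behaviour a conditional coupling is meant to enforce.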

**Figure 12.** Bayesian flow matching for an inverse problem.

**Figure 13.** Trajectories of points from a vector field

Original abstract

Among generative neural models, flow matching techniques stand out for their simple applicability and good scaling properties. Here, velocity fields of curves connecting a simple latent and a target distribution are learned. Then the corresponding ordinary differential equation can be used to sample from a target distribution, starting in samples from the latent one. This paper reviews from a mathematical point of view different techniques to learn the velocity fields of absolutely continuous curves in the Wasserstein geometry. We show how the velocity fields can be characterized and learned via i) transport plans (couplings) between latent and target distributions, ii) Markov kernels and iii) stochastic processes, where the latter two include the coupling approach, but are in general broader. Besides this main goal, we show how flow matching can be used for solving Bayesian inverse problems, where the definition of conditional Wasserstein distances plays a central role. Finally, we briefly address continuous normalizing flows and score matching techniques, which approach the learning of velocity fields of curves from other directions.
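The learning step described in the abstract is usually set up as a regression problem. The following is a minimal sketch of such a loss, assuming linear interpolation between paired samples and a squared error; the function name `cfm_loss` and its signature are hypothetical and not taken from the paper.

```python
def cfm_loss(v, pairs, times):
    """Monte Carlo estimate of E |v(t, x_t) - (x1 - x0)|^2,
    where x_t = (1 - t) x0 + t x1.

    v     : callable (t, x) -> list of floats, the candidate velocity field
    pairs : list of (x0, x1) samples from a coupling of latent and target
    times : list of t in [0, 1], one per pair
    """
    total = 0.0
    for (x0, x1), t in zip(pairs, times):
        xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
        target = [b - a for a, b in zip(x0, x1)]
        pred = v(t, xt)
        total += sum((p - q) ** 2 for p, q in zip(pred, target))
    return total / len(pairs)


# Toy check: latent delta_0 and target delta_3 in 1D, where the exact field is v = 3.
pairs = [([0.0], [3.0])] * 4
times = [0.0, 0.25, 0.5, 1.0]
exact = cfm_loss(lambda t, x: [3.0], pairs, times)
wrong = cfm_loss(lambda t, x: [0.0], pairs, times)
print(exact, wrong)   # -> 0.0 9.0
```

In practice `v` would be a neural network and the loss would be minimized by stochastic gradient descent; the toy check only confirms that the correct constant field attains zero loss.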

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This review organizes flow matching around transport plans, kernels, and processes with a side note on inverse problems, but adds no new results.


The main thing to know is that this is a review paper. It lays out three routes to characterize and learn velocity fields for absolutely continuous curves in Wasserstein space: transport plans between latent and target distributions, Markov kernels, and stochastic processes. The last two are described as containing the coupling approach while being broader in general. It also shows how flow matching applies to Bayesian inverse problems through conditional Wasserstein distances and briefly contrasts the approach with continuous normalizing flows and score matching.

Referee Report

0 major / 2 minor

Summary. The paper reviews mathematical techniques for characterizing and learning velocity fields of absolutely continuous curves in the Wasserstein geometry for flow matching generative models. It presents three routes—transport plans (couplings) between latent and target distributions, Markov kernels, and stochastic processes (with the latter two containing the coupling approach but being broader)—and applies the framework to Bayesian inverse problems via conditional Wasserstein distances, while briefly contrasting with continuous normalizing flows and score matching.

Significance. As a review synthesizing characterizations of velocity fields, the manuscript provides a unified perspective that could clarify relationships among flow matching variants and support applications in generative modeling and inverse problems. Presenting Markov kernels and stochastic processes as containing the coupling approach while being broader in general is a useful organizing principle, provided the derivations hold under standard Wasserstein assumptions.

minor comments (2)

[Abstract] The abstract and introduction could more explicitly state the target audience (e.g., machine learning practitioners versus measure theorists) to help readers gauge the level of technical detail.
[Introduction] Notation for the velocity field v_t and the curve μ_t is introduced without a dedicated preliminary section; a short notation table would improve readability for the characterizations in Sections 3–5.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, including its significance as a unifying review of flow matching characterizations and its recommendation to accept. No major comments were raised that require specific responses or revisions.

Circularity Check

0 steps flagged

No significant circularity: review of established characterizations


This is a review paper that surveys existing mathematical characterizations of velocity fields for absolutely continuous curves in Wasserstein space via transport plans, Markov kernels, and stochastic processes. No new derivations, parameter fits, or uniqueness theorems are introduced that could reduce to self-definition or self-citation chains. All load-bearing steps rely on standard Wasserstein geometry results external to the paper, with the central claim being a comparative organization of known approaches rather than a self-referential prediction or ansatz.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a review and does not introduce new free parameters, axioms, or invented entities in the provided abstract.

pith-pipeline@v0.9.0 · 5693 in / 1048 out tokens · 23698 ms · 2026-05-23T04:35:11.260875+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Each entry gives the Lean source file, the theorem name, and a tag for the relation between the paper passage and the cited Recognition theorem.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: velocity fields of absolutely continuous curves in the Wasserstein geometry... via i) transport plans (couplings)... ii) Markov kernels... iii) stochastic processes

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: Theorem 3.3... absolutely continuous if and only if there exists a Borel measurable vector field v... (CE)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 4 internal anchors

[1] M. S. Albergo, N. M. Boffi, and E. Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. arXiv preprint arXiv:2303.08797, 2023.
[2] M. S. Albergo and E. Vanden-Eijnden. Building normalizing flows with stochastic interpolants. In The Eleventh International Conference on Learning Representations, 2023.
[3] L. Ambrosio, E. Brué, and D. Semola. Lectures on Optimal Transport. UNITEXT. Springer International Publishing, 2021.
[4] L. Ambrosio, N. Gigli, and G. Savaré. Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media, 2005.
[5] L. Ardizzone, J. Kruse, C. Rother, and U. Köthe. Analyzing inverse problems with invertible neural networks. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019.
[6] R. Barboni, G. Peyré, and F.-X. Vialard. Understanding the training of infinitely deep and wide resnets with conditional optimal transport. arXiv preprint arXiv:2403.12887, 2024.
[7] Q. Bertrand, R. Emonet, A. Gagneux, S. Martin, and M. Massias. A visual dive into conditional flow matching. https://dl.heeere.com/conditional-flow-matching/blog/conditional-flow-matching/
[8] V. I. Bogachev and M. A. S. Ruas. Measure Theory, volume 1. Springer, 2007.
[9] J. Chemseddine, P. Hagemann, C. Wald, and G. Steidl. Conditional Wasserstein distances with applications in Bayesian OT flow matching. arXiv preprint arXiv:2403.18705, 2024.
[10] R. Chen, J. Behrmann, D. K. Duvenaud, and J.-H. Jacobsen. Residual flows for invertible generative modeling. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
[11] R. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31, 2018.
[12] R. T. Q. Chen. torchdiffeq, 2018.
[13] G. Daras, A. G. Dimakis, and C. Daskalakis. Consistent diffusion meets tweedie: Training exact ambient diffusion models with noisy data. arXiv preprint arXiv:2404.10177, 2024.
[14] L. Dinh, J. Sohl-Dickstein, and S. Bengio. Density estimation using real NVP. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
[15] N. Gigli. On the geometry of the space of probability measures endowed with the quadratic Optimal Transport distance. PhD Thesis, 2008. cvgmt preprint.
[16] A. González-Sanz and S. Sheng. Linearization of Monge-Ampère equations and data science applications. arXiv preprint arXiv:2408.06534, 2024.
[17] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[18] P. Hagemann, J. Hertrich, and G. Steidl. Generalized normalizing flows via Markov chains. In Non-local Data Interactions: Foundations and Applications. Cambridge University Press, 2022.
[19] P. Hagemann, J. Hertrich, and G. Steidl. Stochastic normalizing flows for inverse problems: A Markov chains viewpoint. SIAM/ASA Journal on Uncertainty Quantification, 10(3):1162–1190, 2022.
[20] P. Hagemann and S. Neumayer. Stabilizing invertible neural networks using mixture models. Inverse Problems, 37(8), 2021.
[21] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[22] P. Holderrieth, M. Havasi, J. Yim, N. Shaul, I. Gat, T. Jaakkola, B. Karrer, R. T. Q. Chen, and Y. Lipman. Generator matching: Generative modeling with arbitrary markov processes. In ICLR, 2025.
[23] B. Hosseini, A. W. Hsu, and A. Taghvaei. Conditional optimal transport on function spaces. arXiv preprint arXiv:2311.05672, 2024.
[24] T. Jahn, J. Chemseddine, P. Hagemann, C. Wald, and G. Steidl. Trajectory generator matching for time series. arXiv preprint arXiv:2505.23215, 2025.
[25] O. Kallenberg. Foundations of Modern Probability, volume 2. Springer, 1997.
[26] G. Kerrigan, G. Migliorini, and P. Smyth. Dynamic conditional optimal transport through simulation-free flows. arXiv preprint arXiv:2404.04240, 2024.
[27] D. P. Kingma and M. Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
[28] B. R. Kloeckner. Extensions with shrinking fibers. Ergodic Theory and Dynamical Systems, 41(6):1795–1834, 2021.
[29] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. 2009.
[30] Y. Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023.
[31] Q. Liu. Rectified flow: A marginal preserving approach to optimal transport. arXiv preprint arXiv:2209.14577, 2022.
[32] X. Liu, C. Gong, and Q. Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. In The Eleventh International Conference on Learning Representations, 2023.
[33] S. Martin, A. Gagneux, P. Hagemann, and G. Steidl. PnP-flow: Plug-and-play image restoration with flow matching. ICLR, 2025.
[34] J. Peszek and D. Poyato. Heterogeneous gradient flows in the topology of fibered optimal transport. Calculus of Variations and Partial Differential Equations, 62(9):258, 2023.
[35] G. Peyré, M. Cuturi, et al. Computational optimal transport: With applications to data science. Foundations and Trends® in Machine Learning, 11(5-6):355–607, 2019.
[36] G. Plonka, D. Potts, G. Steidl, and M. Tasche. Numerical Fourier Analysis. Applied and Numerical Harmonic Analysis. Birkhäuser, second edition, 2023.
[37] M. Poli, S. Massaroli, A. Yamashita, H. Asama, J. Park, and S. Ermon. Torchdyn: Implicit models and neural numerical methods in pytorch.
[38] F. Santambrogio. Optimal Transport for applied mathematicians. Birkhäuser, 2015.
[39] I. Schurov. Adjoint state method, backpropagation and neural odes. https://ilya.schurov.com/post/adjoint-method/
[40] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In F. Bach and D. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 2256–2265, Lille, France, 07–09 Jul 2015. PMLR.
[41] Y. Song, C. Durkan, I. Murray, and S. Ermon. Maximum likelihood training of score-based diffusion models. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in Neural Information Processing Systems, 2021.
[42] Y. Song and S. Ermon. Generative modeling by estimating gradients of the data distribution. arXiv 1907.05600, 2019.
[43] G. Teschl. Ordinary Differential Equations and Dynamical Systems, volume 140. American Mathematical Society, 2024.
[44] A. Tong, N. Malkin, G. Huguet, Y. Zhang, J. Rector-Brooks, K. Fatras, G. Wolf, and Y. Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport. In ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023.
[45] P. Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011.
[46] H. Wu, J. Köhler, and F. Noé. Stochastic normalizing flows. In H. Larochelle, M. A. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems 2020, 2020.
[47] Y. Zhang, P. Yu, Y. Zhu, Y. Chang, F. Gao, Y. N. Wu, and O. Leong. Flow priors for linear inverse problems via iterative corrupted trajectory matching. arXiv preprint arXiv:2405.18816, 2024.
