
arXiv: 2501.16839 · v7 · submitted 2025-01-28 · 💻 cs.LG · math.PR

Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans

Pith reviewed 2026-05-23 04:35 UTC · model grok-4.3

classification 💻 cs.LG math.PR
keywords flow matching · velocity fields · Wasserstein geometry · transport plans · Markov kernels · stochastic processes · generative models · inverse problems

The pith

Velocity fields in flow matching can be characterized and learned from transport plans, Markov kernels, or stochastic processes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This review paper examines mathematical ways to obtain the velocity fields that drive flow matching models from a latent distribution to a target one. It focuses on absolutely continuous curves in Wasserstein geometry and shows that the fields arise from couplings between the distributions, from Markov kernels, and from stochastic processes. The latter two frameworks contain the coupling method but allow more general constructions. Readers would care because the characterizations supply concrete training objectives and extend the technique to Bayesian inverse problems through conditional Wasserstein distances.

Core claim

The paper shows how the velocity fields can be characterized and learned via i) transport plans (couplings) between latent and target distributions, ii) Markov kernels and iii) stochastic processes, where the latter two include the coupling approach, but are in general broader. It further demonstrates that flow matching can solve Bayesian inverse problems when conditional Wasserstein distances are defined, and contrasts the approach with continuous normalizing flows and score matching.
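In symbols, the coupling route in item i) can be sketched as follows. The notation here (interpolation map e_t, parametrized field v_t^θ) is assumed for illustration and follows the usual conditional flow matching setup; it is not quoted from the paper.

```latex
\begin{align*}
e_t(x_0, x_1) &:= (1-t)\,x_0 + t\,x_1, \qquad
  \mu_t := (e_t)_\sharp \alpha, \quad \alpha \in \Gamma(\mu_0, \mu_1), \\
v_t(x) &= \mathbb{E}_{(X_0, X_1) \sim \alpha}
  \big[\, X_1 - X_0 \;\big|\; e_t(X_0, X_1) = x \,\big], \\
\partial_t \mu_t + \nabla \cdot (\mu_t v_t) &= 0
  \quad \text{(in the distributional sense)}, \\
\mathrm{CFM}(\theta) &= \mathbb{E}_{t \sim \mathcal{U}[0,1]}\,
  \mathbb{E}_{(x_0, x_1) \sim \alpha}
  \big\| v_t^\theta\big(e_t(x_0, x_1)\big) - (x_1 - x_0) \big\|^2 .
\end{align*}
```

The last line is a regression problem: its minimizer over all measurable fields is the conditional expectation in the second line, which is why samples from the plan α suffice for training.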

What carries the argument

Velocity fields of absolutely continuous curves in the Wasserstein geometry, obtained from transport plans, Markov kernels, or stochastic processes.
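For discrete measures, the link between a transport plan and a Markov kernel is plain matrix arithmetic. The sketch below is illustrative only: the function names `disintegrate` and `compose` and the 2 x 2 example are assumptions made here, not code from the paper. A plan is stored as a matrix whose row sums give the first marginal, and normalizing each row yields the kernel (the disintegration with respect to the first marginal).

```python
import numpy as np


def disintegrate(alpha):
    """Split a discrete coupling matrix into its first marginal and a Markov kernel.

    alpha[i, j] is the mass the plan puts on the pair (x_i, y_j).
    Returns (mu, K) with mu[i] = sum_j alpha[i, j] and K[i, j] = alpha[i, j] / mu[i],
    so each row K[i, :] is a probability vector (the conditional law given x_i).
    Assumes every x_i carries positive mass.
    """
    alpha = np.asarray(alpha, dtype=float)
    mu = alpha.sum(axis=1)
    if np.any(mu <= 0):
        raise ValueError("every row of the coupling must have positive mass")
    return mu, alpha / mu[:, None]


def compose(mu, K):
    """Rebuild the plan alpha[i, j] = mu[i] * K[i, j] from a marginal and a kernel."""
    return np.asarray(mu, dtype=float)[:, None] * np.asarray(K, dtype=float)


# Hypothetical 2 x 2 example: the independent coupling of
# mu = (1/2, 1/2) and nu = (1/3, 2/3).
alpha = np.outer([0.5, 0.5], [1 / 3, 2 / 3])
mu, K = disintegrate(alpha)
nu = mu @ K  # second marginal, obtained by pushing mu through the kernel
```

Going the other way, any row-stochastic matrix `K` together with a marginal `mu` defines a plan via `compose(mu, K)`, which is the discrete counterpart of building a coupling from a Markov kernel.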

If this is right

  • A single velocity field admits equivalent representations as the expectation under a coupling, under a Markov kernel, or under a stochastic process.
  • Markov kernels and stochastic processes contain the coupling construction and in general supply larger classes of admissible velocity fields.
  • Flow matching directly yields solvers for Bayesian inverse problems once conditional Wasserstein distances are introduced.
  • Continuous normalizing flows and score matching constitute alternative routes to the same velocity-field learning problem.
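The "expectation under a coupling" in the first bullet can be made concrete in a toy case. The sketch below assumes one-dimensional Gaussians µ_0 = N(0, 1) and µ_1 = N(m, s²), the independent coupling, and the linear interpolation x_t = (1 − t) x_0 + t x_1; none of these choices is taken from the paper, and the function names are hypothetical. In this setting E[x_1 − x_0 | x_t = x] is affine in x, so a least-squares fit of x_1 − x_0 on x_t at a fixed time (a flow matching regression restricted to affine fields) should recover it up to sampling error.

```python
import numpy as np


def closed_form_velocity_coeffs(t, m, s):
    """Slope and intercept of v_t(x) = E[x1 - x0 | x_t = x] for independent
    x0 ~ N(0, 1), x1 ~ N(m, s^2) and x_t = (1 - t) x0 + t x1."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2   # Var(x_t)
    cov = t * s**2 - (1.0 - t)              # Cov(x1 - x0, x_t)
    slope = cov / var_t
    intercept = m - slope * t * m           # uses E[x1 - x0] = m and E[x_t] = t m
    return slope, intercept


def fitted_velocity_coeffs(t, m, s, n_samples=200_000, seed=0):
    """Least-squares fit of the regression target x1 - x0 on x_t,
    using samples drawn from the independent coupling."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n_samples)
    x1 = m + s * rng.standard_normal(n_samples)
    x_t = (1.0 - t) * x0 + t * x1
    slope, intercept = np.polyfit(x_t, x1 - x0, deg=1)
    return slope, intercept


exact = closed_form_velocity_coeffs(t=0.3, m=2.0, s=0.5)
fitted = fitted_velocity_coeffs(t=0.3, m=2.0, s=0.5)
```

Comparing `exact` and `fitted` shows the regression and the conditional expectation agreeing within Monte Carlo error; for non-Gaussian data the affine model would have to be replaced by a richer function class.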

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The three characterizations could be combined inside one training loop to improve numerical stability when data are limited.
  • The perspective may clarify why flow matching scales better than some competing generative methods on high-dimensional data.
  • Generalizing the same velocity-field constructions beyond Euclidean Wasserstein space could cover manifold-valued or discrete distributions.

Load-bearing premise

The velocity fields correspond to absolutely continuous curves in the Wasserstein geometry.

What would settle it

A concrete transport plan, Markov kernel, or stochastic process whose induced velocity field, when integrated in the ODE, fails to transport samples from the latent distribution to the target distribution.
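A check of this kind can be scripted. The sketch below is a toy instance under assumptions not taken from the paper: µ_0 = N(0, 1) and µ_1 = N(m, s²) in one dimension, the independent coupling, linear interpolation, and explicit Euler steps. It integrates the induced velocity field and compares the endpoints with the exact flow map x ↦ m + s·x of this field, which pushes N(0, 1) to N(m, s²).

```python
import numpy as np


def velocity(t, x, m, s):
    """v_t(x) = E[x1 - x0 | (1 - t) x0 + t x1 = x] for independent
    x0 ~ N(0, 1) and x1 ~ N(m, s^2) (closed form for Gaussians)."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2
    return m + (t * s**2 - (1.0 - t)) / var_t * (x - t * m)


def transport(x0, m, s, n_steps=2000):
    """Integrate dx/dt = v_t(x) from t = 0 to t = 1 with explicit Euler."""
    x = np.array(x0, dtype=float)
    h = 1.0 / n_steps
    for k in range(n_steps):
        x = x + h * velocity(k * h, x, m, s)
    return x


rng = np.random.default_rng(0)
m, s = 2.0, 0.5
x0 = rng.standard_normal(20_000)   # samples from the latent distribution
x1 = transport(x0, m, s)           # samples after following the ODE
# Deviation from the exact flow map of this particular velocity field.
max_flow_error = np.max(np.abs(x1 - (m + s * x0)))
```

In this example `max_flow_error` only reflects the Euler discretization. A construction for which the endpoint statistics stayed far from the target as `n_steps` grows would be the kind of counterexample described above.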

Figures

Figures reproduced from arXiv: 2501.16839 by Christian Wald, Gabriele Steidl.

Figure 1: Illustration of a curve from the standard Gaussian distribution to a Gaussian …
Figure 2: Disintegration of the measure α ∈ P(ℝ × ℝ) (left). Measures α_{−0.3} ∈ P(ℝ) (middle, green) and α_{0.2} ∈ P(ℝ) (right, red).
Figure 3: Plan/coupling of two discrete measures µ and ν (left) and Markov kernel/disintegration (right) with row and column sums.
Figure 4: Curve induced by α = (Id, T)_♯ µ_0 from µ_0 = δ_{x_0}, resp. µ_0 = (1/2)(δ_{x_0} + δ_{x_1}), to µ_1 = (1/2)(δ_{y_0} + δ_{y_1}). In (c), at the crossing time s of the path, there does not exist a map T_s that induces an element in Γ_o(µ_s, µ_1). Red arrows: vector fields computed via (23).
Figure 5: Illustration to Example 4.9. Both vector fields generate the same curves …
Figure 6: Curve and vector field associated to α = µ_0 × µ_1, where µ_0 = (1/2) δ_{x_0} + (1/2) δ_{x_1} and µ_1 = (1/3) δ_{y_0} + (2/3) δ_{y_1}. Vectors are scaled by 0.2 for better visibility.
Figure 7: Trajectories of points from a vector field …
Figure 8: Single trajectory for a flow matching model trained on cat images. …
Figure 9: Consider η = (1/2) δ_0 + (1/2) δ_1, µ_n = (1/2) δ_{0,n} + (1/2) δ_{1,0} and ν_n = (1/2) δ_{0,0} + (1/2) δ_{1,n}. Then W_2(µ_n, ν_n) = 1 is the length of ⟶ and ⟶ indicates the optimal transport map. Furthermore W_{2,η}(µ_n, ν_n) = n is the length of ⟶ and ⟶ indicates the optimal transport for W_{2,η}.
Figure 10: Consider η = (1/2) δ_0 + (1/2) δ_1, µ_0 = (1/2) δ_{0,5} + (1/2) δ_{1,0} and µ_1 = (1/2) δ_{0,0} + (1/2) δ_{1,5}. (a) Geodesic with respect to W_2, green: µ_{1/2}. (b) Geodesic with respect to W_{2,η}, green: µ_{1/2}.
Figure 11: (Bayesian) Flow matching on Cifar10.
Figure 12: Bayesian flow matching for an inverse problem.
Figure 13: Trajectories of points from a vector field …
Original abstract

Among generative neural models, flow matching techniques stand out for their simple applicability and good scaling properties. Here, velocity fields of curves connecting a simple latent and a target distribution are learned. Then the corresponding ordinary differential equation can be used to sample from a target distribution, starting in samples from the latent one. This paper reviews from a mathematical point of view different techniques to learn the velocity fields of absolutely continuous curves in the Wasserstein geometry. We show how the velocity fields can be characterized and learned via i) transport plans (couplings) between latent and target distributions, ii) Markov kernels and iii) stochastic processes, where the latter two include the coupling approach, but are in general broader. Besides this main goal, we show how flow matching can be used for solving Bayesian inverse problems, where the definition of conditional Wasserstein distances plays a central role. Finally, we briefly address continuous normalizing flows and score matching techniques, which approach the learning of velocity fields of curves from other directions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper reviews mathematical techniques for characterizing and learning velocity fields of absolutely continuous curves in the Wasserstein geometry for flow matching generative models. It presents three routes—transport plans (couplings) between latent and target distributions, Markov kernels, and stochastic processes (with the latter two containing the coupling approach but being broader)—and applies the framework to Bayesian inverse problems via conditional Wasserstein distances, while briefly contrasting with continuous normalizing flows and score matching.

Significance. As a review synthesizing characterizations of velocity fields, the manuscript provides a unified perspective that could clarify relationships among flow matching variants and support applications in generative modeling and inverse problems. The explicit treatment of Markov kernels and stochastic processes as frameworks that contain the coupling approach and are in general broader is a useful organizing principle if the derivations hold under standard Wasserstein assumptions.

minor comments (2)
  1. [Abstract] The abstract and introduction could more explicitly state the target audience (e.g., machine learning practitioners versus measure theorists) to help readers gauge the level of technical detail.
  2. [Introduction] Notation for the velocity field v_t and the curve μ_t is introduced without a dedicated preliminary section; a short notation table would improve readability for the characterizations in Sections 3–5.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, including its significance as a unifying review of flow matching characterizations and its recommendation to accept. No major comments were raised that require specific responses or revisions.

Circularity Check

0 steps flagged

No significant circularity: review of established characterizations

full rationale

This is a review paper that surveys existing mathematical characterizations of velocity fields for absolutely continuous curves in Wasserstein space via transport plans, Markov kernels, and stochastic processes. No new derivations, parameter fits, or uniqueness theorems are introduced that could reduce to self-definition or self-citation chains. All load-bearing steps rely on standard Wasserstein geometry results external to the paper, with the central claim being a comparative organization of known approaches rather than a self-referential prediction or ansatz.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a review and does not introduce new free parameters, axioms, or invented entities in the provided abstract.

pith-pipeline@v0.9.0 · 5693 in / 1048 out tokens · 23698 ms · 2026-05-23T04:35:11.260875+00:00 · methodology


