Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans
Pith reviewed 2026-05-23 04:35 UTC · model grok-4.3
The pith
Velocity fields in flow matching can be characterized and learned from transport plans, Markov kernels, or stochastic processes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper shows how the velocity fields can be characterized and learned via i) transport plans (couplings) between latent and target distributions, ii) Markov kernels and iii) stochastic processes, where the latter two include the coupling approach, but are in general broader. It further demonstrates that flow matching can solve Bayesian inverse problems when conditional Wasserstein distances are defined, and contrasts the approach with continuous normalizing flows and score matching.
What carries the argument
Velocity fields of absolutely continuous curves in the Wasserstein geometry, obtained from transport plans, Markov kernels, or stochastic processes.
If this is right
- A single velocity field admits equivalent representations as the expectation under a coupling, under a Markov kernel, or under a stochastic process.
- Markov kernels and stochastic processes each contain the coupling approach and in general supply broader classes of admissible velocity fields than couplings alone.
- Flow matching directly yields solvers for Bayesian inverse problems once conditional Wasserstein distances are introduced.
- Continuous normalizing flows and score matching constitute alternative routes to the same velocity-field learning problem.
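To make the coupling representation in the first bullet concrete, here is a minimal numerical sketch, not taken from the paper. It assumes a one-dimensional standard Gaussian latent, a Gaussian target N(m, s^2), the independent coupling, and the linear interpolation X_t = (1 - t) X_0 + t X_1. Under these assumptions the velocity v_t(x) = E[X_1 - X_0 | X_t = x] is affine in x, so an ordinary least-squares fit of X_1 - X_0 on X_t should recover it. The function names `closed_form_coefficients` and `fit_velocity_at_time` are hypothetical.

```python
import numpy as np


def closed_form_coefficients(t, m, s):
    """Intercept and slope of v_t(x) = E[X_1 - X_0 | X_t = x].

    Assumes X_0 ~ N(0, 1) and X_1 ~ N(m, s^2) are independent and
    X_t = (1 - t) X_0 + t X_1 (illustrative setting, not the paper's).
    """
    var_t = (1.0 - t) ** 2 + (t * s) ** 2      # Var(X_t)
    slope = (t * s ** 2 - (1.0 - t)) / var_t   # Cov(X_1 - X_0, X_t) / Var(X_t)
    intercept = m - slope * t * m              # v_t(x) = m + slope * (x - t * m)
    return intercept, slope


def fit_velocity_at_time(t, m, s, n_samples=200_000, seed=0):
    """Least-squares regression of X_1 - X_0 on (1, X_t) at a fixed time t."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n_samples)            # latent samples
    x1 = m + s * rng.standard_normal(n_samples)    # target samples, drawn independently
    xt = (1.0 - t) * x0 + t * x1                   # point on the interpolating path
    target = x1 - x0                               # conditional velocity along the path
    features = np.stack([np.ones_like(xt), xt], axis=1)
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return float(coef[0]), float(coef[1])


if __name__ == "__main__":
    print("fitted     :", fit_velocity_at_time(0.3, 2.0, 0.5))
    print("closed form:", closed_form_coefficients(0.3, 2.0, 0.5))
```

The regression is the simplest instance of the simulation-free training idea: no ODE is solved during fitting, only samples from the coupling are needed. A neural network would replace the affine model when the conditional expectation is not available in closed form.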
Where Pith is reading between the lines
- The three characterizations could be combined inside one training loop to improve numerical stability when data are limited.
- The perspective may clarify why flow matching scales better than some competing generative methods on high-dimensional data.
- Generalizing the same velocity-field constructions beyond Euclidean Wasserstein space could cover manifold-valued or discrete distributions.
Load-bearing premise
The velocity fields correspond to absolutely continuous curves in the Wasserstein geometry.
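For orientation, the premise can be written out in the standard form found in the optimal transport literature. The block below is a sketch of that usual statement, using the symbols v_t and μ_t from the referee report; the exact hypotheses of the paper's own theorem are not reproduced on this page, so the integrability condition and the regularity remark are assumptions about how it is phrased there.

```latex
% Continuity equation (CE), understood in the sense of distributions
% on (0,1) x R^d, for a narrowly continuous curve \mu_t of probability
% measures with finite second moments:
\partial_t \mu_t + \nabla \cdot (v_t \, \mu_t) = 0,
\qquad
\int_0^1 \| v_t \|_{L^2(\mu_t;\,\mathbb{R}^d)} \, \mathrm{d}t < \infty .

% If v_t is regular enough for the ODE to have unique solutions,
% samples are transported by the associated flow:
\frac{\mathrm{d}}{\mathrm{d}t} \phi_t(x) = v_t\bigl(\phi_t(x)\bigr),
\qquad \phi_0(x) = x,
\qquad \mu_t = (\phi_t)_{\#} \mu_0 .
```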
What would settle it
A concrete transport plan, Markov kernel, or stochastic process whose induced velocity field, when integrated in the ODE, fails to transport samples from the latent distribution to the target distribution.
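A check of this kind can be sketched in a few lines. The example below is illustrative only and assumes a one-dimensional standard Gaussian latent, a Gaussian target N(m, s^2), the independent coupling, and linear interpolation, for which the induced velocity field is known in closed form. It integrates the ODE with explicit Euler steps and reports the empirical mean and standard deviation of the transported samples. The names `velocity`, `euler_flow`, and `transport_check` are hypothetical, and the step count is an arbitrary choice.

```python
import numpy as np


def velocity(x, t, m, s):
    """Closed-form v_t(x) = E[X_1 - X_0 | X_t = x].

    Assumes X_0 ~ N(0, 1), X_1 ~ N(m, s^2) independent, and
    X_t = (1 - t) X_0 + t X_1 (illustrative setting, not the paper's).
    """
    var_t = (1.0 - t) ** 2 + (t * s) ** 2
    slope = (t * s ** 2 - (1.0 - t)) / var_t
    return m + slope * (x - t * m)


def euler_flow(x0, m, s, n_steps=1000):
    """Integrate dx/dt = v_t(x) from t = 0 to t = 1 with explicit Euler steps."""
    x = np.array(x0, dtype=float)
    h = 1.0 / n_steps
    for k in range(n_steps):
        x = x + h * velocity(x, k * h, m, s)
    return x


def transport_check(m=2.0, s=0.5, n_samples=20_000, seed=0):
    """Push latent samples through the flow; return empirical mean and std."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n_samples)
    x1 = euler_flow(x0, m, s)
    return float(x1.mean()), float(x1.std())


if __name__ == "__main__":
    mean, std = transport_check()
    print(f"target mean 2.0, std 0.5; transported mean {mean:.3f}, std {std:.3f}")
```

In this Gaussian setting the exact flow map is x ↦ m + s·x, so the Euler result can also be compared pointwise against it. A counterexample of the kind described above would show up as a persistent mismatch that does not shrink as the step size is refined.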
Original abstract
Among generative neural models, flow matching techniques stand out for their simple applicability and good scaling properties. Here, velocity fields of curves connecting a simple latent and a target distribution are learned. Then the corresponding ordinary differential equation can be used to sample from a target distribution, starting in samples from the latent one. This paper reviews from a mathematical point of view different techniques to learn the velocity fields of absolutely continuous curves in the Wasserstein geometry. We show how the velocity fields can be characterized and learned via i) transport plans (couplings) between latent and target distributions, ii) Markov kernels and iii) stochastic processes, where the latter two include the coupling approach, but are in general broader. Besides this main goal, we show how flow matching can be used for solving Bayesian inverse problems, where the definition of conditional Wasserstein distances plays a central role. Finally, we briefly address continuous normalizing flows and score matching techniques, which approach the learning of velocity fields of curves from other directions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reviews mathematical techniques for characterizing and learning velocity fields of absolutely continuous curves in the Wasserstein geometry for flow matching generative models. It presents three routes—transport plans (couplings) between latent and target distributions, Markov kernels, and stochastic processes (with the latter two containing the coupling approach but being broader)—and applies the framework to Bayesian inverse problems via conditional Wasserstein distances, while briefly contrasting with continuous normalizing flows and score matching.
Significance. As a review synthesizing characterizations of velocity fields, the manuscript provides a unified perspective that could clarify relationships among flow matching variants and support applications in generative modeling and inverse problems. The explicit treatment of Markov kernels and stochastic processes as containing the coupling approach, while being in general broader, is a useful organizing principle if the derivations hold under standard Wasserstein assumptions.
Minor comments (2)
- [Abstract] The abstract and introduction could more explicitly state the target audience (e.g., machine learning practitioners versus measure theorists) to help readers gauge the level of technical detail.
- [Introduction] Notation for the velocity field v_t and the curve μ_t is introduced without a dedicated preliminary section; a short notation table would improve readability for the characterizations in Sections 3–5.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the manuscript, in particular of its significance as a unifying review of flow matching characterizations, and for the recommendation to accept. The report raises no major comments, so none require a specific response or revision.
Circularity Check
No significant circularity: review of established characterizations
This is a review paper that surveys existing mathematical characterizations of velocity fields for absolutely continuous curves in Wasserstein space via transport plans, Markov kernels, and stochastic processes. No new derivations, parameter fits, or uniqueness theorems are introduced that could reduce to self-definition or self-citation chains. All load-bearing steps rely on standard Wasserstein geometry results external to the paper, with the central claim being a comparative organization of known approaches rather than a self-referential prediction or ansatz.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear (relation between the paper passage and the cited Recognition theorem). Paper passage: "velocity fields of absolutely continuous curves in the Wasserstein geometry... via i) transport plans (couplings)... ii) Markov kernels... iii) stochastic processes"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear (relation between the paper passage and the cited Recognition theorem). Paper passage: "Theorem 3.3... absolutely continuous if and only if there exists a Borel measurable vector field v... (CE)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.