pith. machine review for the scientific record.

arxiv: 2605.12174 · v1 · submitted 2026-05-12 · 💻 cs.LG · math.PR

Recognition: 2 theorem links

Expected Batch Optimal Transport Plans and Consequences for Flow Matching


Pith reviewed 2026-05-13 07:13 UTC · model grok-4.3

classification 💻 cs.LG math.PR
keywords optimal transport · flow matching · minibatch · semidiscrete · velocity field · generative models · convergence rates · batch consistency

The pith

Averaging optimal transport plans over random minibatches produces a population coupling that converges to the true OT plan and induces unique flows in flow matching.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formalizes the coupling obtained by averaging optimal transport plans computed on independent random minibatches of fixed size k as the expected batch OT plan. It proves that this averaged plan converges to the exact OT plan as the batch size grows, with explicit rates on the transport cost bias in the semidiscrete setting where the source is continuous and the target is discrete. This population-level coupling yields a velocity field regular enough to guarantee a unique flow from source to target, which straightens paths and lowers numerical integration cost in flow matching. The authors quantify the batch-size versus integration-error tradeoff both in a simple two-atom model and through synthetic and image experiments.

Core claim

The expected batch OT plan, formed by averaging empirical OT plans over independent minibatches of size k, is consistent with the true OT plan in the large-batch limit. In the semidiscrete regime, both the bias in transport cost and the plan itself converge at explicit rates. This averaged coupling induces a velocity field in flow matching that is sufficiently regular to define a unique flow from the continuous source distribution to the discrete target distribution.

What carries the argument

The expected batch OT plan π̄_k, defined as the average of empirical optimal transport plans computed independently on random minibatches of size k.
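The averaging construction can be sketched numerically. The toy script below is an illustration, not the authors' code: it uses a hypothetical two-atom target, SciPy's assignment solver (which returns the exact OT plan for uniform weights on equal-size minibatches), and an arbitrary observation window [0.2, 0.8]. As k grows, positive source points should be matched to the +1 atom with probability approaching one, the posterior-concentration behavior the paper describes.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
atoms = np.array([-1.0, 1.0])  # discrete target: two symmetric atoms

def posterior_plus(k, n_batches=2000):
    """Monte-Carlo sketch of the expected batch OT plan in 1D.

    Each round draws a minibatch of k source points from N(0, 1) and k
    target points uniformly from the atoms, solves the k-by-k assignment
    problem (exact OT for uniform weights and squared cost), and records
    which atom each source point was matched to. Averaging over rounds
    approximates the conditional law of the averaged plan at a source point.
    """
    xs, hits = [], []
    for _ in range(n_batches):
        x = rng.standard_normal(k)
        y = atoms[rng.integers(0, 2, size=k)]
        cost = (x[:, None] - y[None, :]) ** 2
        row, col = linear_sum_assignment(cost)
        xs.append(x[row])
        hits.append(y[col] > 0)
    x, h = np.concatenate(xs), np.concatenate(hits)
    sel = (x > 0.2) & (x < 0.8)  # moderately positive source points
    return h[sel].mean()         # P(matched to +1 | x in the window)

for k in (1, 4, 64):
    print(k, posterior_plus(k))
```

For k = 1 the single target sample is independent of x, so the estimate sits near 0.5; for large k the averaged plan concentrates on the +1 atom.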

If this is right

  • The population coupling from averaged minibatch OT produces a velocity field regular enough to guarantee a unique flow in flow matching.
  • Explicit convergence rates hold for both transport-cost bias and the plan itself to the true OT plan in the semidiscrete setting.
  • Batch size and numerical integration accuracy trade off in a quantifiable way, as verified in the two-atom model and in image experiments.
  • Repeated minibatch OT can serve as a practical surrogate that inherits the straightening benefits of full OT while remaining computationally tractable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners could choose moderate batch sizes that balance convergence speed with per-step cost without losing the uniqueness of the resulting flow.
  • The same averaging construction might stabilize other path-straightening methods that rely on approximate couplings between continuous and discrete measures.
  • One could test whether the derived rates predict the minimal batch size needed to keep integration error below a target threshold on new datasets.
  • If the target later becomes continuous, the uniqueness guarantee may fail, suggesting a need for additional regularization.
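The batch-size test in the third bullet can be made concrete with a back-of-envelope rule. The additive error split and the constants below are illustrative assumptions, not values derived in the paper; only the 1/√k and 1/n scalings come from the reported experiments.

```python
import math

def min_batch_size(target_err, n_steps, c_bias=1.0, c_euler=1.0):
    """Smallest OT batch size k keeping total error under target_err,
    assuming coupling bias ~ c_bias / sqrt(k) and Euler integration
    error ~ c_euler / n_steps add up. The constants c_bias and c_euler
    are hypothetical placeholders one would fit per dataset."""
    budget = target_err - c_euler / n_steps  # error left for the coupling
    if budget <= 0:
        raise ValueError("Euler error alone exceeds the target")
    return math.ceil((c_bias / budget) ** 2)

print(min_batch_size(0.2, n_steps=50))
```

Under these placeholder constants, tightening the error target or cutting the number of Euler steps forces k up quadratically, which is the asymmetry the paper's Figure 6 scaling suggests.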

Load-bearing premise

The target distribution is discrete while the source is continuous, and both measures possess enough regularity for the induced velocity field to be well-defined and unique.

What would settle it

A direct computation in the two-atom model showing that the averaged minibatch plan fails to converge to the true OT plan or that the induced flow becomes non-unique when batch size k is increased.

Figures

Figures reproduced from arXiv: 2605.12174 by Julie Delon, Kimia Nadjahi, Samuel Boïté.

Figure 1: Transport plans between N(0, 1) and a mixture of two symmetric Gaussians. As k grows, π̄_k approaches the OT plan π⋆ (shown as a line) (a–c); (d) shows the entropically regularized plan. [PITH_FULL_IMAGE:figures/full_fig_p005_1.png]
Figure 2: Numerical rates in Gaussian-to-discrete experiments. The expected batch OT cost bias decays… [PITH_FULL_IMAGE:figures/full_fig_p006_2.png]
Figure 3: The terminal map ϕ^{π̄_k} induced by the flow partitions ℝ² into cells that approach the semidiscrete OT Laguerre partition. The case k = 1 corresponds to the independent coupling. In this example, µ = N(0, I₂) and ν is uniformly supported on 7 fixed atoms, shown as colored points. The velocity field points from x to the posterior mean m^π_t(x), which always lies in the convex hull of the target atoms…
Figure 4: Posterior concentration increases with OT batch size. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png]
Figure 5: Level sets of the Euler integration error. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png]
Figure 6: In a Gaussian-to-discrete setting with 100 atoms in dimension 20, the numerical integration error is consistent with a 1/n scaling in the number of Euler steps n and a 1/√k scaling in the OT batch size k. Error bars show ±1 standard error over sampled initial conditions. We observe a similar qualitative asymmetry in higher-dimensional semidiscrete examples with larger finite support… [PITH_FULL_IMAGE:figures/full_fig_p010_6.png]
Figure 7: Image experiment. The expected batch OT cost decreases steadily up to k = 8192 (left). Increasing k improves FID mainly in the low-NFE regime, while the effect weakens or reverses at large NFE (right). Error bands show ±1 empirical standard deviation over training seeds; for each dataset and each k, FID is reported over six training seeds.
Figure 8: Couplings π_ε and π⋆_ε in Π(µ_ε, ν). The values of S_ε and S⋆_ε do not matter on the µ_ε-negligible sets ℝ × {0} and {0} × ℝ. [PITH_FULL_IMAGE:figures/full_fig_p031_8.png]
Figure 9: The velocity field u^{π₁}_t described in Theorem B.3 exhibits jump discontinuities.
Figure 10: Numerical verification of the two asymptotic regimes in Theorem… [PITH_FULL_IMAGE:figures/full_fig_p048_10.png]
Figure 11: Training curves on CIFAR-10 for a representative subset of OT batch sizes; training loss… [PITH_FULL_IMAGE:figures/full_fig_p053_11.png]
Figure 12: Generated CIFAR-10 samples for k ∈ {1, 2, 4, …, 8192} at NFE = 10. [PITH_FULL_IMAGE:figures/full_fig_p054_12.png]
Figure 13: Generated CIFAR-10 samples for k ∈ {1, 2, 4, …, 8192} at NFE = 100. [PITH_FULL_IMAGE:figures/full_fig_p055_13.png]
Figure 14: Generated SVHN samples for k ∈ {1, 2, 4, …, 8192} at NFE = 5. [PITH_FULL_IMAGE:figures/full_fig_p056_14.png]
Figure 15: Generated SVHN samples for k ∈ {1, 2, 4, …, 8192} at NFE = 100. [PITH_FULL_IMAGE:figures/full_fig_p057_15.png]
read the original abstract

Solving optimal transport (OT) on random minibatches is a common surrogate for exact OT in large-scale learning. In flow matching (FM), this surrogate is used to obtain OT-like couplings that can straighten probability paths and reduce numerical integration cost. Yet, the population-level coupling induced by repeated minibatch OT remains only partially understood. We formalize this coupling as the expected batch OT plan $\overline{\pi}_{k}$, obtained by averaging empirical OT plans over independent minibatches of size $k$. We then establish its large-batch consistency and, in the semidiscrete case relevant to generative modeling, derive rates for both the transport-cost bias and the convergence of $\overline{\pi}_{k}$ to the OT plan. For FM, this yields a population coupling whose induced velocity field is regular enough to define a unique flow from the source to the discrete target. We finally quantify how OT batch size interacts with numerical integration in a tractable two-atom model and in synthetic and image experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper formalizes the expected batch OT plan π̄_k as the average of empirical OT plans computed on independent minibatches of size k. It establishes large-batch consistency of this plan, derives explicit rates for transport-cost bias and convergence to the true OT plan in the semidiscrete (continuous source, discrete target) setting, shows that the resulting population coupling induces a sufficiently regular velocity field to guarantee a unique flow in flow matching, and quantifies the interaction between OT batch size and numerical integration error via a two-atom model plus synthetic and image experiments.

Significance. If the derivations are correct, the work supplies a missing population-level analysis of minibatch OT surrogates that are already used in flow-matching pipelines. The explicit bias and convergence rates, together with the uniqueness result for the induced flow, would give practitioners a principled way to choose batch sizes that balance straightening of probability paths against integration cost. The two-atom model provides a clean, falsifiable testbed for the theory.

major comments (1)
  1. [Abstract and §4] Abstract and §4 (semidiscrete analysis): the claim that the induced velocity field is 'regular enough to define a unique flow' is established only under the semidiscrete assumption with discrete target. Standard flow-matching targets (e.g., image distributions) are continuous-continuous; the Lipschitz or uniqueness properties used in the proof do not automatically carry over, so the relevance statement for generative modeling requires either an extension or an explicit scope limitation.
minor comments (3)
  1. Notation: the symbol π̄_k is introduced in the abstract but its precise definition (expectation over which measure?) should be restated at the beginning of the main theoretical section for readers who skip the abstract.
  2. Experiments: the two-atom model is described as 'tractable,' yet the precise closed-form expressions for the bias and integration error are not displayed; adding them would make the numerical verification easier to follow.
  3. References: the paper cites standard OT and flow-matching works, but should include a brief pointer to recent analyses of minibatch OT bias (e.g., in the context of Wasserstein GANs) to situate the new rates.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive report. We agree that the scope of the uniqueness result should be stated more explicitly and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (semidiscrete analysis): the claim that the induced velocity field is 'regular enough to define a unique flow' is established only under the semidiscrete assumption with discrete target. Standard flow-matching targets (e.g., image distributions) are continuous-continuous; the Lipschitz or uniqueness properties used in the proof do not automatically carry over, so the relevance statement for generative modeling requires either an extension or an explicit scope limitation.

    Authors: We agree that the Lipschitz regularity and uniqueness of the flow are established only under the semidiscrete (continuous source, discrete target) assumption. While the manuscript already frames the semidiscrete setting as relevant to generative modeling—because practical FM pipelines typically operate on finite samples from the target distribution—we acknowledge that this does not automatically extend to fully continuous-continuous targets without further analysis. We will revise the abstract and §4 to (i) explicitly limit the uniqueness claim to the semidiscrete case and (ii) add a sentence noting that extension to continuous-continuous targets remains future work. This change will be made without altering the technical results. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper defines the expected batch OT plan as the average of empirical plans over independent minibatches and proceeds to prove large-batch consistency plus explicit bias and convergence rates in the semidiscrete setting. These steps rest on standard optimal-transport arguments and measure-theoretic analysis rather than any self-referential definition, fitted parameter renamed as a prediction, or load-bearing self-citation. The subsequent claim that the induced velocity field is sufficiently regular to yield a unique flow follows directly from the derived regularity properties and does not reduce to an ansatz or prior result supplied only by the same authors. The derivation chain is therefore self-contained against external OT theory.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard existence and uniqueness results from optimal transport theory together with the semidiscrete modeling assumption common in generative modeling; no free parameters or new postulated entities are introduced.

axioms (2)
  • standard math Existence and uniqueness of optimal transport plans between probability measures under standard regularity conditions
    Invoked to define the population coupling and the induced velocity field.
  • domain assumption Semidiscrete setting (continuous source measure, discrete target measure) is appropriate for the generative modeling application
    Used to derive the specific convergence rates and flow uniqueness.

pith-pipeline@v0.9.0 · 5470 in / 1524 out tokens · 65574 ms · 2026-05-13T07:13:26.719349+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 1 internal anchor

  1. [1]

    Albergo, M. S., Goldstein, M., Boffi, N. M., Ranganath, R., and Vanden-Eijnden, E. (2024). Stochastic interpolants with data-dependent couplings. In International Conference on Machine Learning , pages 921--937. PMLR

  2. [2]

Ambrosio, L., Gigli, N., and Savaré, G. (2005). Gradient flows: in metric spaces and in the space of probability measures. Springer

  3. [3]

    Bansil, M. and Kitagawa, J. (2022). Quantitative stability in the geometry of semi-discrete optimal transport. International Mathematics Research Notices , 2022(10):7354--7389

  4. [4]

    Bertrand, Q., Gagneux, A., Massias, M., and Emonet, R. (2025). On the closed-form of flow matching: Generalization does not arise from target stochasticity. In NeurIPS 2025

  5. [5]

    Bobkov, S. and Ledoux, M. (2019). One-dimensional empirical measures, order statistics, and Kantorovich transport distances , volume 261. American Mathematical Society

  6. [6]

    Boucheron, S. and Massart, P. (2011). A high-dimensional Wilks phenomenon. Probability theory and related fields , 150(3):405--433

  7. [7]

    Chen, R. T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. (2018). Neural ordinary differential equations. Advances in Neural Information Processing Systems

  8. [8]

Del Barrio, E., González Sanz, A., and Loubes, J.-M. (2024). Central limit theorems for semi-discrete Wasserstein distances. Bernoulli, 30(1):554--580

  9. [9]

    Dodson, N., Gao, X., Wang, Q., Wang, Y., and Wan, Z. (2026). Two calm ends and the wild middle: A geometric picture of memorization in diffusion models. arXiv preprint arXiv:2602.17846

  10. [10]

    Fatras, K., Sejourne, T., Flamary, R., and Courty, N. (2021a). Unbalanced minibatch optimal transport; applications to domain adaptation. In Meila, M. and Zhang, T., editors, Proceedings of the 38th International Conference on Machine Learning , volume 139 of Proceedings of Machine Learning Research , pages 3186--3197. PMLR

  11. [11]

    Fatras, K., Zine, Y., Flamary, R., Gribonval, R., and Courty, N. (2020). Learning with minibatch Wasserstein : asymptotic and gradient properties. In Chiappa, S. and Calandra, R., editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , volume 108 of Proceedings of Machine Learning Research , pages 2131...

  12. [12]

    Fatras, K., Zine, Y., Majewski, S., Flamary, R., Gribonval, R., and Courty, N. (2021b). Minibatch optimal transport distances; analysis and applications. arXiv preprint arXiv:2101.01792

  13. [13]

Flamary, R., Courty, N., Gramfort, A., Alaya, M. Z., Boisbunon, A., Chambon, S., Chapel, L., Corenflos, A., Fatras, K., Fournier, N., Gautheron, L., Gayraud, N. T., Janati, H., Rakotomamonjy, A., Redko, I., Rolet, A., Schutz, A., Seguy, V., Sutherland, D. J., Tavenard, R., Tong, A., and Vayer, T. (2021). POT: Python Optimal Transport. Journal of Machine...

  14. [14]

    Fournier, N. and Guillin, A. (2015). On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields , 162(3):707--738

  15. [15]

Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, J. F., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., a...

  16. [16]

    Hertrich, J., Chambolle, A., and Delon, J. (2025). On the relation between rectified flows and optimal transport. arXiv preprint arXiv:2505.19712

  17. [17]

    Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems , 30

  18. [18]

    Hundrieser, S., Staudt, T., and Munk, A. (2024). Empirical optimal transport between different measures adapts to lower complexity. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques , 60(2):824--846

  19. [19]

Kitagawa, J., Mérigot, Q., and Thibert, B. (2016). Convergence of a Newton algorithm for semi-discrete optimal transport. Journal of the European Mathematical Society

  20. [20]

    Klatt, M., Munk, A., and Zemel, Y. (2022). Limit laws for empirical optimal solutions in random linear programs. Annals of Operations Research , 315(1):251--278

  21. [21]

    Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto, Toronto, ON, Canada

  22. [22]

    Lipman, Y., Chen, R. T., Ben-Hamu, H., Nickel, M., and Le, M. (2023). Flow matching for generative modeling. In 11th International Conference on Learning Representations, ICLR 2023

  23. [23]

    Liu, X., Gong, C., and Liu, Q. (2022). Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003

  24. [24]

    Mousavi-Hosseini, A., Zhang, S. Y., Klein, M., and Cuturi, M. (2025). Flow matching with semidiscrete couplings. arXiv preprint arXiv:2509.25519

  25. [25]

    Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A. Y., et al. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning , page 4. Granada

  26. [26]

    Parmar, G., Zhang, R., and Zhu, J.-Y. (2022). On aliased resizing and surprising subtleties in GAN evaluation. In CVPR

  27. [27]

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). PyTorch: an imperative style, high-performance deep learning library. In Proceedings ...

  28. [28]

    and Cuturi, M

Peyré, G. and Cuturi, M. (2019). Computational optimal transport: With applications to data science. Now Foundations and Trends

  29. [29]

Pierret, E., Tosel, V., Delon, J., and Newson, A. (2026). Flow Matching for Applied Mathematicians. HAL preprint hal-05538982

  30. [30]

    Poli, M., Massaroli, S., Yamashita, A., Asama, H., Park, J., and Ermon, S. (2021). TorchDyn : Implicit models and neural numerical methods in PyTorch . https://github.com/DiffEqML/torchdyn

  31. [31]

    Pooladian, A.-A., Ben-Hamu, H., Domingo-Enrich, C., Amos, B., Lipman, Y., and Chen, R. T. (2023a). Multisample flow matching: Straightening flows with minibatch couplings. ICML 2023

  32. [32]

    Pooladian, A.-A., Divol, V., and Niles-Weed, J. (2023b). Minimax estimation of discontinuous optimal transport maps: The semi-discrete case. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J., editors, Proceedings of the 40th International Conference on Machine Learning , volume 202 of Proceedings of Machine Learning Resea...

  33. [33]

    Santambrogio, F. (2015). Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling . Springer International Publishing

  34. [34]

    Staudt, T. and Hundrieser, S. (2025). Convergence of empirical optimal transport in unbounded settings. Bernoulli , 31(3):1929--1954

  35. [35]

    Tong, A., Fatras, K., Malkin, N., Huguet, G., Zhang, Y., Rector-Brooks, J., Wolf, G., and Bengio, Y. (2024). Improving and generalizing flow-based generative models with minibatch optimal transport. Transactions on Machine Learning Research

  36. [36]

    van der Vaart, A. and Wellner, J. (1996). Weak Convergence and Empirical Processes. With Applications to Statistics . New York: Springer

  37. [37]

Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C. J., Polat, İ., Feng, Y., Moore, E. W., VanderPlas, J., Laxalde, D., Perktold,...

  38. [38]

    Wan, Z., Wang, Q., Mishne, G., and Wang, Y. (2025). Elucidating flow matching ODE dynamics via data geometry and denoisers. In Forty-second International Conference on Machine Learning

  39. [39]

    Zhang, S., Mousavi-Hosseini, A., Klein, M., and Cuturi, M. (2025). On fitting flow models with large Sinkhorn couplings. arXiv preprint arXiv:2506.05526