pith. machine review for the scientific record.

arxiv: 2605.12951 · v1 · submitted 2026-05-13 · 📊 stat.ML · cs.LG

Recognition: 2 Lean theorem links

Coreset-Induced Conditional Velocity Flow Matching

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 18:52 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords flow matching · coreset · Wasserstein distance · generative model · conditional velocity · rectified flow · Gaussian mixture surrogate

The pith

A coreset-derived Gaussian mixture surrogate replaces isotropic noise in conditional velocity flow matching, and its transport cost equals the target-surrogate Wasserstein gap.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that hierarchical rectified flow matching can start from a data-compressed surrogate instead of pure Gaussian noise. The surrogate is built by compressing the target velocity distribution into weighted atoms via entropic Sinkhorn and lifting them to a closed-form Gaussian mixture. A lightweight correction flow then refines only the residual gap rather than learning the full noise-to-target map. Under an explicit compression assumption, the surrogate transport cost equals the Wasserstein distance between target and surrogate, while the standard noise source carries a dimension-dependent lower bound. The conditional second-moment excess of the training target stays small whenever the surrogate matches the true conditional velocity law in mean and covariance.
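
As a concrete, deliberately hedged sketch of that pipeline: the snippet below compresses samples into weighted atoms with POT's entropic Sinkhorn solver and lifts them to a Gaussian mixture. The function names, the alternating Lloyd-style update, the hard-assignment weights, and the shared isotropic covariance are our simplifications, not the paper's exact construction.

```python
# Hedged sketch of the coreset -> GMM lifting; build_coreset / lift_to_gmm
# are illustrative names, and the update rules are simplified stand-ins.
import numpy as np
import ot  # POT: Python Optimal Transport

def build_coreset(X, m, reg=0.05, iters=25, seed=0):
    """Compress samples X (n, d) into m weighted atoms via entropic OT."""
    rng = np.random.default_rng(seed)
    n = len(X)
    atoms = X[rng.choice(n, m, replace=False)].copy()
    a = np.full(n, 1.0 / n)            # uniform mass on the data
    w = np.full(m, 1.0 / m)            # atom weights
    for _ in range(iters):
        M = ot.dist(X, atoms)                       # squared-Euclidean costs
        P = ot.sinkhorn(a, w, M, reg * M.mean())    # entropic coupling (n, m)
        atoms = (P.T @ X) / P.sum(axis=0)[:, None]  # barycentric update
        w = np.bincount(M.argmin(axis=1), minlength=m).astype(float)
        w = np.maximum(w / n, 1e-12)
        w /= w.sum()                                # mass captured per atom
    return atoms, w

def lift_to_gmm(X, atoms, w):
    """Lift atoms to a GMM: means at the atoms, one shared isotropic
    covariance fit to the residuals (our assumption; the paper's rule
    for the covariances may differ)."""
    d2 = ot.dist(X, atoms).min(axis=1)              # squared residual distances
    sigma2 = d2.mean() / X.shape[1]
    covs = np.repeat(sigma2 * np.eye(X.shape[1])[None], len(atoms), axis=0)
    return w, atoms, covs
```

Sampling the lifted mixture needs no learned network, which is what makes the surrogate "closed-form" in the abstract's sense.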

Core claim

Under the explicit compression assumption, the surrogate transport cost equals the target-surrogate Wasserstein gap, whereas the isotropic-noise analogue is bounded below by a term that scales with dimension. Separately, the conditional second moment of the direct surrogate-source training target carries a source-dependent excess that vanishes when the surrogate conditional law matches the true law in mean and covariance.
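
In symbols, a hedged rendering: notation follows the paper's extracted appendix A.1 (X0 ∼ ρ0 = N(0, I_d), X1 ∼ ρ1, V = X1 − X0, true conditional velocity law π(v|x,t), surrogate π̃(v|x,t)), while the transport-cost functional T and the constant c_d are our shorthand, not the paper's.

```latex
% Claimed dichotomy, under the explicit compression assumption (C):
T(\tilde\pi, \pi) \;=\; W_2(\pi, \tilde\pi) \quad \text{under } (\mathcal{C}),
\qquad
T\big(\mathcal{N}(0, I_d), \pi\big) \;\ge\; c_d,
\quad \text{with } c_d \text{ growing with the dimension } d.

% Second-moment excess of the surrogate-source training target:
\mathbb{E}_{\tilde\pi}\!\big[\|V\|^2 \,\big|\, x, t\big]
  - \mathbb{E}_{\pi}\!\big[\|V\|^2 \,\big|\, x, t\big]
  \;\longrightarrow\; 0
\quad \text{as } (\mu_{\tilde\pi}, \Sigma_{\tilde\pi})
  \to (\mu_{\pi}, \Sigma_{\pi}).
```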

What carries the argument

The coreset-induced conditional velocity law: a closed-form Gaussian mixture obtained by lifting an entropic Sinkhorn coreset of weighted atoms from the target velocity distribution.
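
Spelled out, the abstract implies the surrogate has the standard mixture form below; a hedged transcription, where letting the means and covariances depend on (x, t) is our guess at how the conditional law enters.

```latex
\tilde\pi(v \mid x, t) \;=\; \sum_{k=1}^{m} w_k\,
  \mathcal{N}\!\big(v;\ \mu_k(x,t),\ \Sigma_k(x,t)\big),
\qquad w_k \ge 0, \qquad \sum_{k=1}^{m} w_k = 1.
```

Sampling is ancestral: draw a component index k ∼ Cat(w_1, …, w_m), then v ∼ N(μ_k, Σ_k); this is why no learned neural sampler is needed.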

If this is right

  • The inner flow learns only a residual correction instead of a full noise-to-data map, enabling competitive few-step sampling on MNIST, CIFAR-10, ImageNet-32 and CelebA-HQ.
  • The training target’s conditional second-moment excess remains small once the surrogate matches the true conditional velocity law in first and second moments.
  • The noise-source lower bound disappears once the source is replaced by the data-informed Gaussian mixture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same coreset construction could be inserted into other conditional flow or diffusion pipelines that currently start from isotropic noise.
  • If the compression assumption holds in higher-dimensional or non-image modalities, the method would reduce the number of function evaluations needed for high-quality samples.
  • The explicit equality between surrogate transport cost and Wasserstein gap supplies a new diagnostic for choosing coreset size.

Load-bearing premise

The coreset-derived Gaussian mixture approximates the target velocity distribution closely enough that the remaining residual can be corrected by a lightweight flow.
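
A minimal sketch of what "refines the residual" means operationally, assuming a PyTorch velocity network v_net(x, t) and flattened (n, d) samples; the paper's hierarchical inner/outer structure is collapsed to a single flat rectified flow here, so treat this as the idea rather than the method.

```python
# Hedged sketch: flow-matching training where the source x0 is drawn from
# the closed-form GMM surrogate instead of isotropic Gaussian noise.
import torch

def sample_gmm(weights, means, covs, n):
    """Ancestral sampling from the GMM surrogate.
    weights: (m,), means: (m, d), covs: (m, d, d)."""
    ks = torch.multinomial(weights, n, replacement=True)   # component ids
    L = torch.linalg.cholesky(covs)                        # (m, d, d)
    eps = torch.randn(n, means.shape[1])
    return means[ks] + torch.einsum('nij,nj->ni', L[ks], eps)

def correction_flow_loss(v_net, x1, weights, means, covs):
    """Rectified-flow style loss: the net regresses only the residual
    x1 - x0 along the linear interpolant, with x0 from the surrogate."""
    x0 = sample_gmm(weights, means, covs, len(x1))         # surrogate source
    t = torch.rand(len(x1), 1)
    xt = (1 - t) * x0 + t * x1                             # linear interpolant
    return ((v_net(xt, t) - (x1 - x0)) ** 2).mean()
```

The only change from standard rectified-flow training is the first line of correction_flow_loss: x0 comes from the GMM surrogate rather than torch.randn, so the regression target x1 − x0 is already small wherever the surrogate is close.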

What would settle it

Measure the Wasserstein gap between target and surrogate on a dataset where the Sinkhorn coreset is deliberately made coarser; if generation quality collapses to standard flow-matching levels, the equality claim fails.
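
The proposed falsification test is cheap to run in sample space. A hedged sketch with POT, using an exact EMD solve on modest pool sizes; the helper names from the earlier sketch are ours.

```python
# Hedged sketch of the coarsening diagnostic: shrink the coreset and watch
# the empirical W2 gap between target samples and surrogate samples.
import numpy as np
import ot  # POT: Python Optimal Transport

def w2_gap(X_target, X_surrogate):
    """Empirical 2-Wasserstein distance between two sample clouds."""
    a = np.full(len(X_target), 1.0 / len(X_target))
    b = np.full(len(X_surrogate), 1.0 / len(X_surrogate))
    M = ot.dist(X_target, X_surrogate)   # squared-Euclidean cost matrix
    return np.sqrt(ot.emd2(a, b, M))     # exact OT cost, then sqrt for W2

# e.g. for m in (512, 128, 32, 8): build a coreset of size m, sample the
# lifted GMM, and report w2_gap alongside generation quality (e.g. FID);
# the gap should grow as m shrinks if the equality claim is doing real work.
```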

Figures

Figures reproduced from arXiv: 2605.12951 by Jianxi Su, Xiao Wang, Zihua She.

Figure 1. CCVFM pipeline: an entropic-Sinkhorn coreset is lifted to a GMM surrogate …
Figure 2. Uncurated samples used for the reported FID pools …
Figure 3. Toy benchmarks across five synthetic targets. Each panel compares (columns, left-to-right) …
Figure 4. Conditional-velocity advantage on ring-6. Left: the true conditional velocity law …
Figure 5. Empirical illustration of the surrogate-gap decomposition on the ring-6 target …
Figure 6. MNIST Stage III samples at L ∈ {10, 20, 50} correction steps. Quality sharpens from L=10 to L=20 and continues improving at L=50, mirroring the FID column. CIFAR: L=10 (FID 13.18), L=20 (FID 9.28), L=50 (FID 7.78).
Figure 7. CIFAR-10 Stage III samples across NFE budgets …
Figure 8. ImageNet-32 Stage III samples across NFE budgets.
Figure 9. 10 × 10 uncurated CelebA-HQ 256 samples from the CCVFM-L L=50 generator (FID28k = 4.17). The grid spans identities, lighting, expression, head pose, hair colour, and presence of glasses/accessories without any per-sample selection; the diversity confirms the surrogate π̃ propagates the full CelebA-HQ identity manifold and not just a few high-density modes.
Figure 10. 2 × 5 uncurated panel of CelebA-HQ samples at larger per-image resolution. Reproduces the panel used as …
Figure 11. Stage II→III progression on CelebA-HQ. Panel (a): one-NFE samples drawn directly from the closed-form GMM surrogate π̃ in DC-AE latent space and decoded; the global colour and pose distributions are correct but per-pixel detail is blurry. Panel (b): the same seeds after the correction flow integrates the residual for L=50 inner steps; faces sharpen and eyes/teeth gain high-frequency structure …
Figure 12. Memorization probe (CelebA-HQ). For each of several generated CCVFM samples (left column) …
Figure 13. Goodness-of-fit diagnostics in Inception feature space (MNIST) …
Figure 14. Cumulative-distribution visualizations of the …
Figure 15. Symmetry diagnostic: Ge→Te vs. Te→Ge and Ge→Tr vs. Tr→Ge distance distributions, Inception feature space. Symmetry of both pairs (KS < 0.031 in the main-paper table) indicates no mode dropping and no coverage deficit.
Figure 16. Pixel-space CDF visualization, complementary to Figure 14.
Figure 17. Pixel-space version of the three-panel diagnostic (histogram + memorization scatter + P/R bars) …
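
Figures 13-17 rest on one primitive that the extracted appendix spells out: 1-nearest-neighbour distances between feature pools, with φ the InceptionV3 pool feature in R^2048 (the same features that enter FID), brute-force exact search at a pool size of 10,000, and pixel-space variants using φ(x) = vec(x) on 784-dimensional pixel vectors. Ordered pairs of distance distributions are then compared via Kolmogorov-Smirnov and 1-Wasserstein statistics alongside means and medians. A minimal sketch, assuming precomputed feature arrays:

```python
# Minimal sketch of the 1-NN distance primitive behind Figures 13-17.
# A, B: (n, 2048) InceptionV3 pool features, or (n, 784) raw pixel vectors
# for the pixel-space variants; exact brute force is feasible at n = 10,000.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import ks_2samp, wasserstein_distance

def nn_distances(A, B):
    """d_{A->B}(a) = min_{b in B} ||phi(a) - phi(b)||_2 for every a in A."""
    return cdist(A, B).min(axis=1)

def compare_pairs(dAB, dA2B2):
    """Summary statistics reported for two ordered pairs of pools."""
    return {
        "KS": ks_2samp(dAB, dA2B2).statistic,
        "W1": wasserstein_distance(dAB, dA2B2),
        "means": (dAB.mean(), dA2B2.mean()),
        "medians": (np.median(dAB), np.median(dA2B2)),
    }

# Symmetry diagnostic (Figure 15): compare_pairs(nn_distances(Ge, Te),
# nn_distances(Te, Ge)); a small KS statistic indicates no mode dropping
# and no coverage deficit.
```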
original abstract

We propose Coreset-Induced Conditional Velocity Flow Matching (CCVFM), a generative model that augments hierarchical rectified flow with a data-informed source distribution. Hierarchical flow matching models the full conditional velocity law in velocity space, but its inner flow is asked to transport isotropic Gaussian noise to a multimodal target velocity distribution from scratch. Our key observation is that this inner source can be replaced by a closed-form surrogate built from a coreset of the target. CCVFM first compresses the target into weighted atoms using an entropic Sinkhorn coreset and lifts them to a Gaussian mixture. The induced conditional velocity law is then a closed-form Gaussian mixture that can be sampled without a learned neural sampler. A lightweight correction flow, trained from this exact surrogate source, then refines the remaining surrogate-to-target residual rather than learning an entire noise-to-data map. We prove that the surrogate transport cost equals the target-surrogate Wasserstein gap under an explicit compression assumption, whereas the noise-source analogue has a dimension-scale lower bound. We further characterize the conditional second moment of the direct surrogate-source training target and show that its source-dependent excess is small when the surrogate conditional law is close to the true conditional velocity law in mean and covariance. Empirically, on MNIST, CIFAR-10, ImageNet-32, and CelebA-HQ, the proposed method reaches competitive few-step generation under matched architectures.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, a circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Coreset-Induced Conditional Velocity Flow Matching (CCVFM), which augments hierarchical rectified flow matching by replacing the isotropic Gaussian noise source with a closed-form Gaussian mixture surrogate derived from an entropic Sinkhorn coreset of the target velocity distribution. It claims three results: a proof that the surrogate transport cost equals the target-surrogate Wasserstein gap under an explicit compression assumption, whereas the noise-source analogue has a dimension-scale lower bound; a characterization of the conditional second moment of the direct surrogate-source training target, whose source-dependent excess is small when the surrogate is close in mean and covariance; and competitive few-step generation on MNIST, CIFAR-10, ImageNet-32, and CelebA-HQ under matched architectures.

Significance. If the proofs hold and the compression assumption is tight in practice, the approach could provide a principled reduction in the complexity of learning full noise-to-data maps in flow-based models by leveraging a data-informed surrogate source, with the explicit transport-cost equality and second-moment analysis offering analytical advantages. The closed-form surrogate and competitive empirical results on standard datasets are potential strengths for reproducibility, though overall significance hinges on verifying the assumption beyond the stated equality.

major comments (2)
  1. [Theory section] Theory section (proofs of transport cost equality and second-moment characterization): The equality between surrogate transport cost and target-surrogate Wasserstein gap is derived precisely under the explicit compression assumption, but no quantitative bound is supplied on coreset size or residual size needed to keep the excess negligible for multimodal high-dimensional velocity distributions; this is load-bearing for the central claim that the correction flow remains lightweight.
  2. [Experiments section] Experiments section: Competitive results are reported on four datasets, but the evaluation lacks error-bar details, ablation studies on coreset size, and specification of the Sinkhorn regularization parameter; without these, it is difficult to isolate the contribution of the coreset-induced surrogate versus the correction network.
minor comments (2)
  1. [Method] The lifting step from weighted coreset atoms to the Gaussian mixture surrogate would benefit from an explicit formula or pseudocode in the main text to clarify sampling without a learned neural network.
  2. [Notation] Notation for the conditional velocity law and the 'direct surrogate-source training target' could be unified across the abstract and theory to avoid minor ambiguity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's detailed feedback on our manuscript. We have carefully considered the major comments regarding the theory and experiments sections. Below, we provide point-by-point responses and indicate the revisions we plan to make in the revised manuscript.

point-by-point responses
  1. Referee: [Theory section] Theory section (proofs of transport cost equality and second-moment characterization): The equality between surrogate transport cost and target-surrogate Wasserstein gap is derived precisely under the explicit compression assumption, but no quantitative bound is supplied on coreset size or residual size needed to keep the excess negligible for multimodal high-dimensional velocity distributions; this is load-bearing for the central claim that the correction flow remains lightweight.

    Authors: We thank the referee for highlighting this aspect. The proofs are indeed derived under the explicit compression assumption, which we state clearly in the manuscript. While we do not provide quantitative bounds on the coreset size in the current version, the assumption allows us to equate the costs exactly when it holds. In practice, we select the coreset size to achieve a small residual as measured by the Wasserstein gap in our experiments. We agree that adding a discussion on how the coreset size affects the excess and empirical guidelines for choosing it would strengthen the paper. We will revise the theory section to include such a discussion and note the dependence on the assumption more prominently. revision: partial

  2. Referee: [Experiments section] Experiments section: Competitive results are reported on four datasets, but the evaluation lacks error-bar details, ablation studies on coreset size, and specification of the Sinkhorn regularization parameter; without these, it is difficult to isolate the contribution of the coreset-induced surrogate versus the correction network.

    Authors: We acknowledge these omissions in the experimental evaluation. In the revised manuscript, we will include error bars computed from multiple independent runs for the reported metrics. We will also add ablation studies varying the coreset size to demonstrate its impact on generation quality and training efficiency. Additionally, we will specify the Sinkhorn regularization parameter used in all experiments. These additions should help clarify the contribution of the surrogate source. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations rest on explicit assumptions and independent identities

full rationale

The paper derives the surrogate transport cost equaling the target-surrogate Wasserstein gap under a stated explicit compression assumption, and characterizes the conditional second-moment excess via mean/covariance closeness. These steps are mathematical identities conditioned on the assumption rather than self-definitional reductions, fitted parameters renamed as predictions, or load-bearing self-citations. The coreset is constructed in a data-driven way via Sinkhorn, but the claimed equalities and characterizations follow from the assumption without circular redefinition of the target. No uniqueness theorems or ansatzes are smuggled in via self-citation in the provided derivation chain.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The central claim rests on one domain assumption about coreset compression quality and two free parameters that control the surrogate construction; no new physical entities are postulated.

free parameters (2)
  • coreset size
    Number of weighted atoms chosen to compress the target velocity distribution; controls fidelity of the Gaussian mixture surrogate.
  • Sinkhorn regularization parameter
    Entropic regularization strength in the Sinkhorn algorithm used to compute coreset weights.
axioms (1)
  • domain assumption The target conditional velocity distribution admits a useful approximation by a finite Gaussian mixture lifted from an entropic Sinkhorn coreset.
    Invoked to guarantee that the surrogate transport cost equals the Wasserstein gap and that the residual correction remains lightweight.
invented entities (1)
  • Coreset-induced Gaussian mixture surrogate source (no independent evidence)
    purpose: Data-informed replacement for isotropic Gaussian noise in the inner hierarchical flow.
    Constructed from the coreset; no independent falsifiable evidence supplied beyond the compression assumption.

pith-pipeline@v0.9.0 · 5549 in / 1559 out tokens · 65637 ms · 2026-05-14T18:52:26.806431+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 2 internal anchors

  1. [1]

    Stochastic interpolants: A unifying framework for flows and diffusions

    Michael Albergo, Nicholas M Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. Journal of Machine Learning Research, 26(209):1–80, 2025

  2. [2]

    Reconstructing training data with informed adversaries

    Borja Balle, Giovanni Cherubin, and Jamie Hayes. Reconstructing training data with informed adversaries. In IEEE Symposium on Security and Privacy (S&P), 2022

  3. [3]

    Pros and cons of GAN evaluation measures

    Ali Borji. Pros and cons of GAN evaluation measures. Computer Vision and Image Understanding, 2019

  4. [4]

    Extracting training data from diffusion models

    Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramer, Borja Balle, Daphne Ippolito, and Eric Wallace. Extracting training data from diffusion models. In 32nd USENIX Security Symposium (USENIX Security 23), 2023

  5. [5]

    A downsampled variant of ImageNet as an alternative to the CIFAR datasets

    Patryk Chrabaszcz, Ilya Loshchilov, and Frank Hutter. A downsampled variant of ImageNet as an alternative to the CIFAR datasets. arXiv preprint arXiv:1707.08819, 2017

  6. [6]

    Wasserstein measure coresets

    Sebastian Claici, Aude Genevay, and Justin Solomon. Wasserstein measure coresets. arXiv preprint arXiv:1805.07412, 2018

  7. [7]

    Sinkhorn distances: Lightspeed computation of optimal transport

    Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, 2013

  8. [8]

    Density estimation using Real NVP

    Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. In Proceedings of the International Conference on Learning Representations (ICLR), 2017

  9. [9]

    Generative adversarial nets

    Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2014

  10. [10]

    Foundations of Quantization for Probability Distributions

    Siegfried Graf and Harald Luschgy. Foundations of Quantization for Probability Distributions, volume 1730 of Lecture Notes in Mathematics. Springer, 2000

  11. [11]

    Variational rectified flow matching

    Pengsheng Guo and Alexander G. Schwing. Variational rectified flow matching. In International Conference on Machine Learning, 2025

  12. [12]

    GANs trained by a two time-scale update rule converge to a local Nash equilibrium

    Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, 2017

  13. [13]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, 2020

  14. [14]

    Elucidating the design space of diffusion-based generative models

    Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems, 2022

  15. [15]

    Auto-encoding variational Bayes

    Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013

  16. [16]

    Improved precision and recall metric for assessing generative models

    Tuomas Kynkäänniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Improved precision and recall metric for assessing generative models. In Advances in Neural Information Processing Systems, 2019

  17. [17]

    Flow matching for generative modeling

    Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. In International Conference on Learning Representations, 2023

  18. [18]

    Flow straight and fast: Learning to generate and transfer data with rectified flow

    Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. In International Conference on Learning Representations, 2023

  19. [19]

    A non-parametric test to detect data-copying in generative models

    Casey Meehan, Kamalika Chaudhuri, and Sanjoy Dasgupta. A non-parametric test to detect data-copying in generative models. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 2020

  20. [20]

    Reliable fidelity and diversity metrics for generative models

    Muhammad Ferjad Naeem, Seong Joon Oh, Youngjung Uh, Yunjey Choi, and Jaejun Yoo. Reliable fidelity and diversity metrics for generative models. In International Conference on Machine Learning, 2020

  21. [21]

    Multisample flow matching: Straightening flows with minibatch couplings

    Aram-Alexandre Pooladian, Heli Ben-Hamu, Carles Domingo-Enrich, Ricky T. Q. Chen, and Yaron Lipman. Multisample flow matching: Straightening flows with minibatch couplings. arXiv preprint arXiv:2304.14772, 2023

  22. [22]

    Variational inference with normalizing flows

    Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015

  23. [23]

    Assessing generative models via precision and recall

    Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, and Sylvain Gelly. Assessing generative models via precision and recall. In Advances in Neural Information Processing Systems, 2018

  24. [24]

    Progressive distillation for fast sampling of diffusion models

    Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations, 2022

  25. [25]

    Score-based generative modeling through stochastic differential equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021

  26. [26]

    Consistency models

    Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In International Conference on Machine Learning, 2023

  27. [27]

    Improving and generalizing flow-based generative models with minibatch optimal transport

    Alexander Tong, Nikolay Malkin, Kilian Fatras, Lazar Atanackovic, Yan Zhang, Guillaume Huguet, Guy Wolf, Yoshua Bengio, and Aaron Courville. Improving and generalizing flow-based generative models with minibatch optimal transport. Transactions on Machine Learning Research, 2024

  28. [28]

    Optimal transport: Old and new

    Cédric Villani. Optimal transport: Old and new. Grundlehren der mathematischen Wissenschaften, 338, 2009

  29. [29]

    Wasserstein coreset via Sinkhorn loss

    Haoyun Yin, Yixuan Qiu, and Xiao Wang. Wasserstein coreset via Sinkhorn loss. Transactions on Machine Learning Research, 2025. URL https://openreview.net/forum?id=DrMCDS88IL

  30. [30]

    Towards hierarchical rectified flow

    Yichi Zhang, Yici Yan, Alex Schwing, and Zhizhen Zhao. Towards hierarchical rectified flow. In International Conference on Learning Representations, 2025

    summary statistics: sample mean, median, the empirical distribution, and for any two ordered pairs (A, B)and(A ′, B′)we report the Kolmogorov–Smirnov statistic and the1-Wasserstein distance between {dA→B(a) :a∈A}and{d A′→B′(a′) :a ′ ∈A ′}. Pixel-space versions use ϕ(x) = vec(x)with L2 norm directly on the784-dimensional pixel vector (no preprocessing). 32...