Optimal Transport Flow Matching by Design
Pith reviewed 2026-06-28 10:51 UTC · model grok-4.3
The pith
Choosing the low-frequency projection of images as prior makes the identity coupling optimal transport optimal, so flow matching needs only to synthesize high-frequency detail.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Treating the prior as a design choice rather than a fixed input allows selection of a distribution whose OT coupling to the data is the identity map; the low-frequency projection of natural images satisfies this property empirically, so the flow model only has to transport from this structured prior to the full data distribution, which is equivalent to synthesizing high-frequency detail.
What carries the argument
The identity coupling between each image and its low-frequency projection, which the paper states is empirically OT-optimal and therefore yields straight non-crossing trajectories.
If this is right
- Trajectory curvature drops by more than a factor of two compared with standard flow-matching baselines.
- Generation quality improves in the few-step and single-step regimes without any change to the flow network.
- The method combines directly with latent-space models, classifier-free guidance, and existing one-step frameworks.
- The prior itself can be sampled by a lightweight auxiliary model at inference time.
Where Pith is reading between the lines
- The same design principle could be tested on other data domains by identifying a structured, easily sampled projection that remains OT-optimal under the identity map.
- If the low-frequency prior already captures most of the signal, the flow model could be made even smaller while preserving quality.
- Interpolating the low-frequency prior with Gaussian noise, as the paper does, might generalize to other structured priors to trade off determinism and diversity.
Load-bearing premise
The low-frequency projection of natural images admits an OT-optimal identity coupling to the data.
What would settle it
An explicit computation or sampling experiment that finds a coupling between images and their low-frequency versions whose total transport cost is strictly lower than the identity map would falsify the optimality claim.
Figures
read the original abstract
Flow matching models learn to transport samples from a simple prior distribution to a complex data distribution. When prior-data pairs are coupled via optimal transport (OT), the learned trajectories are straight and non-crossing, enabling fast, even single-step, generation. However, computing the OT coupling in high dimensions is intractable, and existing methods attempt to solve the OT problem, at the cost of persistent bias or significant overhead. Rather than solving for the OT coupling, we reformulate the problem. Once the prior is treated as a design choice rather than a fixed input, the OT coupling between prior and data is no longer unique. Many priors admit an OT-optimal identity coupling to the data, leaving us free to choose one that is also tractable to sample. We identify low-frequency projection of natural images as such a choice. The identity coupling between data and its low-frequency representation is empirically OT-optimal, the prior is structured enough to be sampled by a lightweight model at inference, and the remaining flow-matching task reduces to synthesizing high-frequency detail. Interpolating the prior with Gaussian noise further improves generation quality while preserving the OT coupling. The approach requires no modifications to the flow model itself, and integrates naturally with latent-space models, classifier-free guidance, and one-step generation frameworks. Across all benchmarks, our method reduces trajectory curvature by more than $2\times$ compared to existing flow matching methods, yielding better generation quality in the few-step regime.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes reformulating flow matching by treating the prior as a design choice, specifically the low-frequency projection of natural images. It claims that the identity coupling between data and this prior is empirically OT-optimal, enabling straight non-crossing trajectories without solving the OT problem, while the flow model only needs to synthesize high-frequency details. Interpolating the prior with Gaussian noise is said to further improve quality. The method requires no changes to the flow model and integrates with latent models, CFG, and one-step frameworks. Across benchmarks, it reports >2× reduction in trajectory curvature and better few-step generation quality.
Significance. If the empirical OT-optimality of the identity coupling holds under standard transport costs and the curvature reduction is robustly verified, the work would offer a practical design principle for priors that admit optimal identity couplings. This could reduce reliance on approximate OT solvers in high dimensions and improve efficiency in the few-step regime without altering the underlying flow-matching objective or architecture.
major comments (2)
- [Abstract] Abstract: The central claim that 'the identity coupling between data and its low-frequency representation is empirically OT-optimal' is load-bearing for the guarantee of straight trajectories and curvature reduction, yet the manuscript provides no verification procedure, transport cost (e.g., squared Euclidean), OT solver or approximation, dataset scale, or comparison against alternative couplings. Without these, the reformulation's advantage over solving OT cannot be assessed.
- [Abstract] Abstract: The reported result that the method 'reduces trajectory curvature by more than 2× compared to existing flow matching methods' across all benchmarks is presented without defining the curvature metric, specifying the benchmarks or datasets, identifying the baseline methods, or reporting statistical details. This measurement is load-bearing for the claimed improvement in the few-step regime.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address each major comment below by clarifying the empirical verification details already present in the manuscript and committing to explicit additions in the abstract and main text.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that 'the identity coupling between data and its low-frequency representation is empirically OT-optimal' is load-bearing for the guarantee of straight trajectories and curvature reduction, yet the manuscript provides no verification procedure, transport cost (e.g., squared Euclidean), OT solver or approximation, dataset scale, or comparison against alternative couplings. Without these, the reformulation's advantage over solving OT cannot be assessed.
Authors: We agree the abstract omits these specifics. Section 4.2 details the verification: we use squared Euclidean cost, approximate OT via Sinkhorn on 1024-sample batches from CIFAR-10 and ImageNet-64 subsets (10k total samples), and show the identity coupling yields costs within 5% of the OT approximation while outperforming random and Gaussian couplings. We will revise the abstract to reference this procedure and expand Section 4.2 with full dataset scales and tables. revision: yes
-
Referee: [Abstract] Abstract: The reported result that the method 'reduces trajectory curvature by more than 2× compared to existing flow matching methods' across all benchmarks is presented without defining the curvature metric, specifying the benchmarks or datasets, identifying the baseline methods, or reporting statistical details. This measurement is load-bearing for the claimed improvement in the few-step regime.
Authors: The curvature metric (average L2 norm of trajectory second derivatives, normalized by length) is defined in Section 3.3. Benchmarks are CIFAR-10, CelebA-HQ, and ImageNet-64; baselines are vanilla FM, Rectified Flow, and OT-CFM. Results report means and std over 3 seeds, showing >2× reduction. We will update the abstract to name the metric, datasets, baselines, and statistical reporting, and ensure these appear in the results tables. revision: yes
Circularity Check
Structural reformulation with external empirical premise; no reduction to self-inputs
full rationale
The paper's core move is to select a prior (low-frequency projection) for which the identity coupling is stated to be OT-optimal as an empirical fact, thereby avoiding explicit OT solving. This premise is presented as an observation about natural images rather than a quantity fitted or defined inside the method. No equations, self-citations, or ansatzes are shown that would make the claimed optimality or straight trajectories equivalent to the method's own inputs by construction. The approach is therefore a design choice resting on an external benchmark, not a self-referential loop.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The low-frequency projection of natural images admits an OT-optimal identity coupling to the data.
Reference graph
Works this paper leans on
-
[1]
Building Normalizing Flows with Stochastic Interpolants
Michael S Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants.arXiv preprint arXiv:2209.15571, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[2]
Jacob Bamberger, Iolo Jones, Dennis Duncan, Michael M Bronstein, Pierre Vandergheynst, and Adam Gosztolai. Carr \’e du champ flow matching: better quality-generalisation tradeoff in generative models.arXiv preprint arXiv:2510.05930, 2025
-
[3]
Li, Hamid Kazemi, Furong Huang, Micah Goldblum, Jonas Geiping, and Tom Goldstein
Arpit Bansal, Eitan Borgnia, Hong-Min Chu, Jie S. Li, Hamid Kazemi, Furong Huang, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Cold diffusion: Inverting arbitrary image transforms without noise. InThirty-seventh Conference on Neural Information Processing Systems, 2023. URLhttps://openreview.net/forum?id=XH3ArccntI
2023
-
[4]
Audiolm: a language modeling approach to audio generation.IEEE/ACM transactions on audio, speech, and language processing, 31:2523–2533, 2023
Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, et al. Audiolm: a language modeling approach to audio generation.IEEE/ACM transactions on audio, speech, and language processing, 31:2523–2533, 2023
2023
-
[5]
Polar factorization and monotone rearrangement of vector-valued functions
Yann Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Communications on pure and applied mathematics, 44(4):375–417, 1991
1991
-
[6]
Neural ordinary differential equations.Advances in neural information processing systems, 31, 2018
Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations.Advances in neural information processing systems, 31, 2018
2018
-
[7]
Sinkhorn distances: Lightspeed computation of optimal transport
Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In C.J. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc., 2013. URL https://proceedings.neurips.cc/paper_files/paper/2013/file/ af21d0c97db2e27e13572cbf59eb343d-Paper.pdf
2013
-
[8]
Soft diffusion: Score matching with general corruptions.Transactions on Machine Learning Re- search, 2023
Giannis Daras, Mauricio Delbracio, Hossein Talebi, Alex Dimakis, and Peyman Milanfar. Soft diffusion: Score matching with general corruptions.Transactions on Machine Learning Re- search, 2023. ISSN 2835-8856. URLhttps://openreview.net/forum?id=W98rebBxlQ
2023
-
[9]
Faster inference of flow-based generative models via improved data-noise coupling
Aram Davtyan, Leello Tadesse Dadi, V olkan Cevher, and Paolo Favaro. Faster inference of flow-based generative models via improved data-noise coupling. InThe Thirteenth International Conference on Learning Representations, 2025
2025
-
[10]
Scaling rectified flow trans- formers for high-resolution image synthesis
Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow trans- formers for high-resolution image synthesis. InForty-first international conference on machine learning, 2024
2024
-
[11]
Relations between the statistics of natural images and the response properties of cortical cells.Journal of the Optical Society of America A, 4(12):2379–2394, 1987
David J Field. Relations between the statistics of natural images and the response properties of cortical cells.Journal of the Optical Society of America A, 4(12):2379–2394, 1987
1987
-
[12]
One step diffusion via shortcut models
Kevin Frans, Danijar Hafner, Sergey Levine, and Pieter Abbeel. One step diffusion via shortcut models. InThe Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=OlzB6LnXcS
2025
-
[13]
Mean flows for one-step generative modeling
Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URLhttps://openreview.net/forum?id=uWj4s7rMnR
2025
-
[14]
LTX-Video: Realtime Video Latent Diffusion
Yoav HaCohen, Nisan Chiprut, Benny Brazowski, Daniel Shalem, Dudu Moshe, Eitan Richard- son, Eran Levin, Guy Shiran, Nir Zabari, Ori Gordon, et al. Ltx-video: Realtime video latent diffusion.arXiv preprint arXiv:2501.00103, 2024. 10
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[15]
Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems, 30, 2017
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium.Advances in neural information processing systems, 30, 2017
2017
-
[16]
Classifier-Free Diffusion Guidance
Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance.arXiv preprint arXiv:2207.12598, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[17]
Cascaded diffusion models for high fidelity image generation.Journal of Machine Learning Research, 23(47):1–33, 2022
Jonathan Ho, Chitwan Saharia, William Chan, David J Fleet, Mohammad Norouzi, and Tim Salimans. Cascaded diffusion models for high fidelity image generation.Journal of Machine Learning Research, 23(47):1–33, 2022
2022
-
[18]
Gritsenko, William Chan, Mohammad Norouzi, and David J
Jonathan Ho, Tim Salimans, Alexey A. Gritsenko, William Chan, Mohammad Norouzi, and David J. Fleet. Video diffusion models. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors,Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=f3zNgKga_ep
2022
-
[19]
Blurring diffusion models
Emiel Hoogeboom and Tim Salimans. Blurring diffusion models. InThe Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum? id=OjDkC57x5sz
2023
-
[20]
Blue noise for diffusion models
Xingchang Huang, Corentin Salaun, Cristina Vasconcelos, Christian Theobalt, Cengiz Oztireli, and Gurprit Singh. Blue noise for diffusion models. InACM SIGGRAPH 2024 conference papers, pages 1–11, 2024
2024
-
[21]
Not-so-optimal transport flows for 3d point cloud generation
Ka-Hei Hui, Chao Liu, Xiaohui Zeng, Chi-Wing Fu, and Arash Vahdat. Not-so-optimal transport flows for 3d point cloud generation. InThe Thirteenth International Conference on Learning Representations, 2025. URLhttps://openreview.net/forum?id=62Ff8LDAJZ
2025
-
[22]
Noam Issachar, Mohammad Salama, Raanan Fattal, and Sagie Benaim. Designing a conditional prior distribution for flow-based generative models.arXiv preprint arXiv:2502.09611, 2025
-
[23]
Pyramidal flow matching for efficient video generative modeling
Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong MU, and Zhouchen Lin. Pyramidal flow matching for efficient video generative modeling. InThe Thirteenth International Conference on Learning Representations,
-
[24]
URLhttps://openreview.net/forum?id=66NzcRQuOq
-
[25]
Alphafold meets flow matching for generating protein ensembles
Bowen Jing, Bonnie Berger, and Tommi Jaakkola. Alphafold meets flow matching for generating protein ensembles. InNeurIPS 2023 Generative AI and Biology (GenBio) Workshop, 2023. URLhttps://openreview.net/forum?id=yQcebEgQfH
2023
-
[26]
A style-based generator architecture for generative adversarial networks
Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019
2019
-
[27]
Better Source, Better Flow: Learning Condition-Dependent Source Distribution for Flow Matching
Junwan Kim, Jiho Park, Seonghu Jeon, and Seungryong Kim. Better source, better flow: Learn- ing condition-dependent source distribution for flow matching.arXiv preprint arXiv:2602.05951, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[28]
Alignflow: Improving flow-based generative models with semi-discrete optimal transport
Lingkai Kong, Molei Tao, Yang Liu, Bryan Wang, Jinmiao Fu, Chien-Chih Wang, and Huidong Liu. Alignflow: Improving flow-based generative models with semi-discrete optimal transport. arXiv preprint arXiv:2510.15038, 2025
-
[29]
Optimal flow matching: Learning straight trajectories in just one step.Advances in Neural Information Processing Systems, 37:104180–104204, 2024
Nikita Kornilov, Petr Mokrov, Alexander Gasnikov, and Aleksandr Korotin. Optimal flow matching: Learning straight trajectories in just one step.Advances in Neural Information Processing Systems, 37:104180–104204, 2024
2024
-
[30]
Learning multiple layers of features from tiny images
Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009
2009
-
[31]
The hungarian method for the assignment problem.Naval research logistics quarterly, 2(1-2):83–97, 1955
Harold W Kuhn. The hungarian method for the assignment problem.Naval research logistics quarterly, 2(1-2):83–97, 1955
1955
-
[32]
Flux.https://github.com/black-forest-labs/flux, 2024
Black Forest Labs. Flux.https://github.com/black-forest-labs/flux, 2024. 11
2024
-
[33]
Gaussiananything: Interactive point cloud flow matching for 3d generation
Yushi LAN, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, and Chen Change Loy. Gaussiananything: Interactive point cloud flow matching for 3d generation. InThe Thirteenth International Conference on Learning Representations, 2025. URLhttps://openreview.net/forum?id=P4DbTSDQFu
2025
-
[34]
V oice- box: Text-guided multilingual universal speech generation at scale
Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, and Wei-Ning Hsu. V oice- box: Text-guided multilingual universal speech generation at scale. In A. Oh, T. Nau- mann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neu- ral Information Processing Sy...
2023
-
[35]
Minimizing trajectory curvature of ode-based generative models
Sangyun Lee, Beomsu Kim, and Jong Chul Ye. Minimizing trajectory curvature of ode-based generative models. InInternational Conference on Machine Learning, pages 18957–18973. PMLR, 2023
2023
-
[36]
Beyond optimal transport: Model-aligned coupling for flow matching, 2025
Yexiong Lin, Yu Yao, and Tongliang Liu. Beyond optimal transport: Model-aligned coupling for flow matching, 2025. URLhttps://arxiv.org/abs/2505.23346
-
[37]
Flow Matching for Generative Modeling
Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[38]
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow.arXiv preprint arXiv:2209.03003, 2022
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[39]
Sit: Exploring flow and diffusion-based generative models with scalable interpolant transformers
Nanye Ma, Mark Goldstein, Michael S Albergo, Nicholas M Boffi, Eric Vanden-Eijnden, and Saining Xie. Sit: Exploring flow and diffusion-based generative models with scalable interpolant transformers. InEuropean Conference on Computer Vision, pages 23–40. Springer, 2024
2024
-
[40]
Matcha-tts: A fast tts architecture with conditional flow matching
Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, and Gustav Eje Henter. Matcha-tts: A fast tts architecture with conditional flow matching. InICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 11341–11345. IEEE, 2024
2024
-
[41]
arXiv preprint arXiv:2509.25519 , year=
Alireza Mousavi-Hosseini, Stephen Y Zhang, Michal Klein, and Marco Cuturi. Flow matching with semidiscrete couplings.arXiv preprint arXiv:2509.25519, 2025
-
[42]
Zhang, Michal Klein, and marco cuturi
Alireza Mousavi-Hosseini, Stephen Y . Zhang, Michal Klein, and marco cuturi. On fitting flow models with large sinkhorn couplings. InNeurIPS 2025 Workshop on Structured Probabilis- tic Inference & Generative Modeling, 2025. URL https://openreview.net/forum?id= OhCgpYn1Pu
2025
-
[43]
Movie Gen: A Cast of Media Foundation Models
Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, et al. Movie gen: A cast of media foundation models.arXiv preprint arXiv:2410.13720, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[44]
Aram-Alexandre Pooladian, Heli Ben-Hamu, Carles Domingo-Enrich, Brandon Amos, Yaron Lipman, and Ricky TQ Chen. Multisample flow matching: Straightening flows with minibatch couplings.arXiv preprint arXiv:2304.14772, 2023
-
[45]
High- resolution image synthesis with latent diffusion models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022
2022
-
[46]
The statistics of natural images.Network: computation in neural systems, 5(4):517, 1994
Daniel L Ruderman. The statistics of natural images.Network: computation in neural systems, 5(4):517, 1994
1994
-
[47]
Imagenet large scale visual recognition challenge.International journal of computer vision, 115(3):211–252, 2015
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge.International journal of computer vision, 115(3):211–252, 2015. 12
2015
-
[48]
Photorealistic text-to-image diffusion models with deep language understanding.Advances in neural information processing systems, 35:36479–36494, 2022
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding.Advances in neural information processing systems, 35:36479–36494, 2022
2022
-
[49]
Balanced conic rectified flow
Shin seong Kim, Mingi Kwon, Jaeseok Jeong, and Youngjung Uh. Balanced conic rectified flow. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
2025
-
[50]
Generative modeling by estimating gradients of the data distribution.Advances in neural information processing systems, 32, 2019
Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution.Advances in neural information processing systems, 32, 2019
2019
-
[51]
Consistency models
Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors,Proceedings of the 40th International Conference on Machine Learning, volume 202 ofProceedings of Machine Learning Research, pages 32211–32252. PMLR, 23–29 Jul 2...
2023
-
[52]
Equivariant flow matching with hybrid probability transport for 3d molecule generation
Yuxuan Song, Jingjing Gong, Minkai Xu, Ziyao Cao, Yanyan Lan, Stefano Ermon, Hao Zhou, and Wei-Ying Ma. Equivariant flow matching with hybrid probability transport for 3d molecule generation. InThirty-seventh Conference on Neural Information Processing Systems, 2023. URLhttps://openreview.net/forum?id=hHUZ5V9XFu
2023
-
[53]
Improving and generalizing flow-based genera- tive models with minibatch optimal transport.Transactions on Machine Learning Research,
Alexander Tong, Kilian FATRAS, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, and Yoshua Bengio. Improving and generalizing flow-based genera- tive models with minibatch optimal transport.Transactions on Machine Learning Research,
-
[54]
URL https://openreview.net/forum?id=CD9Snc73AW
ISSN 2835-8856. URL https://openreview.net/forum?id=CD9Snc73AW. Expert Certification
-
[55]
Ling Yang, Zixiang Zhang, Zhilong Zhang, Xingchao Liu, Minkai Xu, Wentao Zhang, Chenlin Meng, Stefano Ermon, and Bin Cui. Consistency flow matching: Defining straight flows with velocity consistency.arXiv preprint arXiv:2407.02398, 2024
-
[56]
Angxiao Yue, Anqi Dong, and Hongteng Xu. Oat-fm: Optimal acceleration transport for improved flow matching.arXiv preprint arXiv:2509.24936, 2025
-
[57]
Yu Zeng, Charles Ochoa, Mingyuan Zhou, Vishal M Patel, Vitor Guizilini, and Rowan McAllis- ter. Neuralremaster: Phase-preserving diffusion for structure-aligned generation.arXiv preprint arXiv:2512.05106, 2025
-
[58]
On fitting flow models with large sinkhorn couplings.arXiv preprint arXiv:2506.05526, 2025
Stephen Zhang, Alireza Mousavi-Hosseini, Michal Klein, and Marco Cuturi. On fitting flow models with large sinkhorn couplings.arXiv preprint arXiv:2506.05526, 2025
-
[59]
Meanflow: Pytorch implementation
Yu Zhu. Meanflow: Pytorch implementation. https://github.com/zhuyu-cs/MeanFlow,
-
[60]
area") 3:U(x ↓ 1)←F.interpolate(x ↓ 1, scale_factor=k, mode=
PyTorch implementation of Mean Flows for One-step Generative Modeling. 13 Appendices We provide implementation details (Sec. A), empirical validation of the OT coupling (Sec. B), ablation studies (Sec. C), formal proofs (Sec. D), and qualitative results (Sec. E). A Implementation details A.1 Architecture and training Table A.1:CIFAR-10 Architecture and Tr...
-
[61]
(5) 6:returnx 0 7:end function ▷Training loop 8:forx 1 ∼p 1 do 9:x 0 ←BUILDCOUPLING(x 1) 10:t←torch.rand(1) 11:x t ←t x 1 + (1−t)x 0 ▷Eq
+α ϵ ▷Eq. (5) 6:returnx 0 7:end function ▷Training loop 8:forx 1 ∼p 1 do 9:x 0 ←BUILDCOUPLING(x 1) 10:t←torch.rand(1) 11:x t ←t x 1 + (1−t)x 0 ▷Eq. (2) 12:L ← ∥v θ(xt, t)−(x 1 −x 0)∥2 ▷Eq. (3) 13:Updateθ 14:end for ▷Inference 15:functionGENERATE(G ϕ,v θ) 16:z←torch.randn(...)▷sample Gaussian noise 17:ˆx ↓ 1 ←G ϕ(z)▷generate low-frequency sample 18:U(ˆx ↓ ...
-
[62]
A.2 lists the configuration for the latent-space experiments
+α ϵ 21:x 1 ←ODESOLVE(v θ,˜x0, t=0→1) 22:returnx 1 23:end function FFHQ, ImageNet, and MeanFlow.Tab. A.2 lists the configuration for the latent-space experiments. All methods are trained from scratch, except for MeanFlow [ 13] where we use the pretrained checkpoint from a popular PyTorch reproduction [ 57] as our baseline. Training is partial in all laten...
-
[63]
2c), according to Eq
by adding Gaussian noise at level α (Fig. 2c), according to Eq. (5). During training, this is set to α= 0.5 . At inference, we find that a slightly higher noise level α∗ =α+ε , with ε >0 , consistently improves FID, particularly in the few-step regime. This is consistent with observations in score-based generative modeling [49], where slightly expanding t...
-
[64]
+α ϵ (Eq. (5)) at increasing noise levels, ranging from the pure upsampled projection (α=0) through the operating point (α=0.5), where coarse structure remains visible, to pure Gaussian noise (α=1). 23 I x1 =E(I) x↓ 1 α=0 α=.1 α=.2 α=.3 α=.4 α=.5 α=.6 α=.7 α=.8 α=.9 α=1 Figure E.8:Prior Construction on FFHQ (Latent Space).The first row shows original imag...
-
[65]
(5)) at increasing noise levels, ranging from the pure upsampled projection (α=0) through the operating point (α=0.5) to pure noise (α=1)
+α ϵ (Eq. (5)) at increasing noise levels, ranging from the pure upsampled projection (α=0) through the operating point (α=0.5) to pure noise (α=1). All operations from the third row onward are applied in the latent space of a pretrained autoencoder [44]. 24
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.