On Variance Reduction in Learning Mean Flows

Juanwu Lu; Ziran Wang

arxiv: 2605.09235 · v1 · submitted 2026-05-10 · 💻 cs.LG · cs.AI· stat.ML

On Variance Reduction in Learning Mean Flows

Juanwu Lu , Ziran Wang This is my paper

Pith reviewed 2026-05-12 05:08 UTC · model grok-4.3

classification 💻 cs.LG cs.AIstat.ML

keywords meanflowvariance reductioncontrol variateflow matchingone-step generationgenerative modelsdiffusion transformer

0 comments

The pith

Correcting the coefficient on the conditional velocity field stabilizes MeanFlow training and improves sample quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

MeanFlow training for one-step generative models suffers from instability, with losses that do not decrease and gradients whose variance grows without bound. The root cause is that the conditional velocity field serves two statistical roles at once in the training objective: it is the target for a regression loss and it also acts as a control variate inside a Monte Carlo estimate of a vector-Jacobian product. The original formulation uses an incorrect scaling for the second role. Deriving the statistically optimal scaling in closed form both explains why several recent ad-hoc fixes work and delivers measurable gains in sample quality. A controlled study on toy problems and on a Diffusion Transformer shows that the corrected coefficient yields up to 54 percent better samples on two-dimensional data and produces steadily improving FID scores across training checkpoints.

Core claim

The paper establishes that the pathology of MeanFlow training originates from an incorrect coefficient multiplying the conditional velocity field inside the loss. This field simultaneously provides the regression target and serves as a Monte Carlo control variate for the Jacobi-vector product; the original loss assigns it the wrong weight for the control-variate term. The authors derive the optimal coefficient in closed form and demonstrate that a range of concurrent stabilization techniques are merely different practical implementations of this same optimum. Empirical sweeps on two-dimensional benchmarks and latent Diffusion Transformers recover the predicted ordering of bias and variance.

What carries the argument

The closed-form optimal coefficient that correctly weights the conditional velocity field when it functions as a control variate in the vector-Jacobian product term of the loss.

Load-bearing premise

The conditional velocity field simultaneously acts as an unbiased regression target and as a Monte Carlo control variate whose coefficient in the loss must be chosen separately from the regression term.

What would settle it

If training with the derived optimal coefficient on the reported two-dimensional benchmarks fails to produce both lower gradient variance and higher sample quality than the original MeanFlow loss, the attribution of the instability to the mis-specified coefficient would be falsified.

Figures

Figures reproduced from arXiv: 2605.09235 by Juanwu Lu, Ziran Wang.

**Figure 1.** Figure 1: Spatial distribution of p Tr(Σv′ |xt) = p Ex0|xt ∥v ′∥ 2 at three timesteps on a twodimensional Gaussian mixture. Conditional variances concentrate in mode-mixing regions. In the original MeanFlow, the stop-gradient operator prevents J from being passed to the optimizer, leading to an empirical non-decreasing loss. Meanwhile, the mean-field difference vanishes at convergence (rθ → 0) and the variance-dri… view at source ↗

**Figure 2.** Figure 2: Empirical total gradient variance Tr(Cov[∇θℓMF]) on six two-dimensional toy datasets with β ∈ {0, 0.25, 0.5, 0.75, 1}. The monotonic decrease of variances with respect to β on almost every dataset aligns with the prediction in Theorem 2. sion target vcond unchanged, exploiting the role asymmetry identified in section 3.1. Appendix C provides details about the full loss, training algorithm, and three propos… view at source ↗

**Figure 3.** Figure 3: Empirical sample-quality measured by sliced Wasserstein- [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Experiment results training DiT-B/4 on ImageNet- [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: 100 class-conditional samples from the β = 0 baseline checkpoint at step 300k (FID 11.37). Same noise seed and class labels as figs. 6 and 7. 23 [PITH_FULL_IMAGE:figures/full_fig_p023_5.png] view at source ↗

**Figure 6.** Figure 6: 100 class-conditional samples from the β = 0.5 checkpoint at step 300k (FID 12.51). Same noise seed and class labels as figs. 5 and 7. 24 [PITH_FULL_IMAGE:figures/full_fig_p024_6.png] view at source ↗

**Figure 7.** Figure 7: 100 class-conditional samples from the β = 1 corner checkpoint at step 300k (FID 23.36). Same noise seed and class labels as figs. 5 and 6. 25 [PITH_FULL_IMAGE:figures/full_fig_p025_7.png] view at source ↗

read the original abstract

One-step generative modeling has emerged as a leading approach to amortize the inference cost of diffusion and flow-matching models. Among distillation-free methods, MeanFlow training is notoriously unstable, with non-decreasing loss and unbounded gradient variance. In this work, we establish a theory that attributes this pathology to a misuse of the conditional velocity field: it plays two distinct statistical roles in the loss, both as an unbiased regression target and as a Monte Carlo control variate inside a Jacobi-vector product, with the original loss assigning the wrong coefficient to the latter. We derive the optimal coefficient in closed form, and show that a family of fixes in concurrent works corresponds to different practical realizations of the same optimum. A controlled sweep of this coefficient on two-dimensional benchmarks and on a latent Diffusion Transformer recovers the predicted bias-variance ordering. The optimal coefficient yields up to a %54 improvement in sample quality on two-dimensional benchmarks and a monotone FID trend at every matched-step DiT checkpoint. Crucially, the same DiT measurement also reveals a quantitative FID-MSE landscape mismatch: although gradient variance is minimized at an interior coefficient value, the coefficient that minimizes FID prefers the direct use of conditional velocity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper derives a closed-form coefficient to cut gradient variance in MeanFlow by treating the conditional velocity as both target and control variate, but its own sweeps show FID is lowest when that coefficient is set to direct use of the velocity, not at the variance minimum.

read the letter

The main point is that the authors model the conditional velocity field as serving two statistical jobs in the loss—an unbiased regression target and a Monte Carlo control variate inside the Jacobi-vector product—and solve for the coefficient that minimizes variance in closed form. They also note that several recent practical patches line up as different ways to realize the same optimum. That derivation and unification is the actual new piece here. The controlled sweeps on 2D benchmarks and latent DiT checkpoints recover the expected bias-variance ordering, which gives decent support for the modeling choice. Reporting the mismatch between the variance minimum and the FID minimum is also useful; it shows they looked at what actually matters downstream rather than stopping at the theory quantity. The soft spot is exactly that mismatch. Gradient variance bottoms out at an interior coefficient, yet FID keeps getting better toward the boundary that just uses the conditional velocity directly. This means the sample-quality gains (the 54% figure and the monotone FID trend) are not clearly produced by the variance reduction the theory targets. Other factors in optimization or sampling are likely doing more of the work, so the attribution of improvement to the derived coefficient is weaker than the abstract presents. The paper is aimed at people working on one-step generative models and flow-matching stability. Readers who want a theoretical account of why MeanFlow training blows up, or who are trying to connect empirical fixes, will find the derivation and the honest landscape check worth their time. It deserves a serious referee because the closed-form result is explicit and the reported disconnect between variance and FID is the kind of observation that needs discussion and follow-up experiments.

Referee Report

2 major / 2 minor

Summary. The paper claims that instability in MeanFlow training stems from the conditional velocity field being assigned an incorrect coefficient in the loss, as it simultaneously serves as an unbiased regression target and a Monte Carlo control variate within the Jacobi-vector product. The authors derive the variance-optimal coefficient in closed form, unify several concurrent fixes as alternative realizations of the same optimum, and report that controlled coefficient sweeps on 2D benchmarks and latent DiT models recover the predicted bias-variance ordering, deliver up to 54% sample-quality gains, and produce monotone FID trends, while explicitly noting a quantitative mismatch in which gradient variance is minimized at an interior coefficient but FID is minimized at the boundary value corresponding to direct use of the conditional velocity.

Significance. If the derivation is sound, the work supplies a principled statistical account of MeanFlow pathology and a parameter-free correction that could stabilize one-step generative models while explaining prior heuristics. The closed-form result and the unification of concurrent methods are clear strengths. The reported empirical mismatch between the variance minimum and the FID minimum, however, weakens the causal attribution of quality gains to the proposed coefficient and suggests that unmodeled optimization or sampling dynamics may be responsible for the observed improvements.

major comments (2)

[Abstract] Abstract: the manuscript states that gradient variance is minimized at an interior coefficient while FID is minimized by the boundary value that recovers direct conditional-velocity use. This quantitative mismatch between the quantity optimized by the theory (gradient variance) and the downstream metric (FID/sample quality) means the attribution of the reported 54% improvement and monotone FID trend to the derived coefficient is not fully supported; other factors may drive the gains.
[Theory derivation] The central derivation (presumably §3) models the conditional velocity as playing two distinct statistical roles and derives a closed-form coefficient that corrects only the control-variate role. It is unclear from the provided description whether this correction preserves unbiasedness of the regression target or introduces a new bias term; an explicit expansion of the loss and the Jacobi-vector product is needed to confirm that the optimum does not trade one source of bias for another.

minor comments (2)

[Abstract] The abstract contains a typographical error: '%54' should read '54%'.
[Experiments] Experimental sections should include the precise definition of the coefficient sweep range, the exact DiT checkpoint matching procedure, and raw variance/FID values (not only trends) to allow independent verification of the bias-variance ordering.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address the major comments point by point below.

read point-by-point responses

Referee: [Abstract] Abstract: the manuscript states that gradient variance is minimized at an interior coefficient while FID is minimized by the boundary value that recovers direct conditional-velocity use. This quantitative mismatch between the quantity optimized by the theory (gradient variance) and the downstream metric (FID/sample quality) means the attribution of the reported 54% improvement and monotone FID trend to the derived coefficient is not fully supported; other factors may drive the gains.

Authors: We explicitly document this mismatch in the manuscript to present a complete empirical picture. The theory identifies the variance-minimizing coefficient, and our controlled sweeps on 2D benchmarks and latent DiT models confirm that this choice reduces gradient variance as predicted while delivering up to 54% sample-quality gains and monotone FID trends relative to the original loss. Although the FID minimum occurs at the boundary, the interior optimum still substantially outperforms the baseline, supporting that the coefficient correction mitigates a primary source of instability. We do not claim that variance reduction is the sole driver of FID gains and agree that additional optimization or sampling dynamics may contribute. revision: no
Referee: [Theory derivation] The central derivation (presumably §3) models the conditional velocity as playing two distinct statistical roles and derives a closed-form coefficient that corrects only the control-variate role. It is unclear from the provided description whether this correction preserves unbiasedness of the regression target or introduces a new bias term; an explicit expansion of the loss and the Jacobi-vector product is needed to confirm that the optimum does not trade one source of bias for another.

Authors: Section 3 separates the roles: the conditional velocity remains the unbiased regression target, while the derived coefficient optimizes only its use as a Monte Carlo control variate inside the Jacobi-vector product. The modification is variance-reducing and does not change the expectation of the estimator, thereby preserving unbiasedness of the overall gradient. To make this fully transparent, we will add an appendix containing the explicit expansion of the loss and the Jacobi-vector product term. revision: yes

Circularity Check

0 steps flagged

Closed-form derivation of optimal coefficient is self-contained from identified dual roles

full rationale

The paper identifies the conditional velocity field as playing two distinct statistical roles (unbiased regression target and Monte Carlo control variate in the Jacobi-vector product) and derives the optimal coefficient in closed form directly from this modeling choice. No step reduces the result to a fitted parameter, post-hoc data, or self-citation chain; the derivation is presented as first-principles analysis of the loss, with experiments serving only to validate the predicted bias-variance ordering rather than to construct the coefficient itself. The reported FID-MSE mismatch concerns empirical alignment with downstream metrics but does not render the mathematical derivation circular or equivalent to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions from Monte Carlo control variates and regression in generative modeling; no new free parameters or invented entities are introduced.

axioms (2)

domain assumption Conditional velocity field serves as unbiased regression target in the loss
Core statistical role invoked to explain the original pathology.
domain assumption Conditional velocity field serves as Monte Carlo control variate inside Jacobi-vector product
Second statistical role whose coefficient was misassigned in the original loss.

pith-pipeline@v0.9.0 · 5500 in / 1136 out tokens · 64776 ms · 2026-05-12T05:08:58.010667+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

[1]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems , volume 33, pages 6840--6851. Curran Associates, Inc., 2020

work page 2020
[2]

Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations , 2021

work page 2021
[3]

Elucidating the design space of diffusion-based generative models

Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems , volume 35, 2022

work page 2022
[4]

Diffusion models beat GAN s on image synthesis

Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GAN s on image synthesis. In Advances in Neural Information Processing Systems , volume 34, 2021

work page 2021
[5]

Scaling rectified flow transformers for high-resolution image synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas M \"u ller, Harry Saini, Yam Levi, Dominik Lorenz, Naveen Rafi, Tim Shafir, et al. Scaling rectified flow transformers for high-resolution image synthesis. In Proceedings of the 41st International Conference on Machine Learning , volume 235 of Proceedings of Machine Learning Research , 2024

work page 2024
[6]

Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems , volume 31, 2018

work page 2018
[7]

Variational inference with normalizing flows

Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning , volume 37 of Proceedings of Machine Learning Research , pages 1530--1538, 2015

work page 2015
[8]

Density estimation using R eal- NVP

Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using R eal- NVP . In International Conference on Learning Representations , 2017

work page 2017
[9]

Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. FFJORD : Free-form continuous dynamics for scalable reversible generative models. In International Conference on Learning Representations , 2019

work page 2019
[10]

Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling, 2023

work page 2023
[11]

Improving and generalizing flow-based generative models with minibatch optimal transport, 2024

Alexander Tong, Kilian Fatras, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, and Yoshua Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport, 2024

work page 2024
[12]

Albergo and Eric Vanden-Eijnden

Michael S. Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants, 2023

work page 2023
[13]

Boffi, and Eric Vanden-Eijnden

Michael Albergo, Nicholas M. Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. Journal of Machine Learning Research , 26(209):1--80, 2025

work page 2025
[14]

Generative modeling by estimating gradients of the data distribution

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems , volume 32, 2019

work page 2019
[15]

Progressive distillation for fast sampling of diffusion models, 2022

Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models, 2022

work page 2022
[16]

Freeman, and Taesung Park

Tianwei Yin, Micha \"e l Gharbi, Richard Zhang, Eli Shechtman, Fr \'e do Durand, William T. Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages 6613--6623, June 2024

work page 2024
[17]

Tianwei Yin, Micha\" e l Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fr\' e do Durand, and William T. Freeman. Improved distribution matching distillation for fast image synthesis. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems , volume 37, pages 4745...

work page 2024
[18]

Consistency models

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Proceedings of the 40th International Conference on Machine Learning , ICML'23. JMLR.org, 2023

work page 2023
[19]

Simplifying, stabilizing and scaling continuous-time consistency models, 2025

Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models, 2025

work page 2025
[20]

Boffi, Michael S

Nicholas M. Boffi, Michael S. Albergo, and Eric Vanden-Eijnden. Flow map matching with stochastic interpolants: A mathematical framework for consistency models, 2025

work page 2025
[21]

Zico Kolter, and Kaiming He

Zhengyang Geng, Mingyang Deng, Xingjian Bai, J. Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling, 2025

work page 2025
[22]

Alphaflow: Understanding and improving meanflow models, 2025

Huijie Zhang, Aliaksandr Siarohin, Willi Menapace, Michael Vasilkovsky, Sergey Tulyakov, Qing Qu, and Ivan Skorokhodov. Alphaflow: Understanding and improving meanflow models, 2025

work page 2025
[23]

Zico Kolter, and Kaiming He

Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J. Zico Kolter, and Kaiming He. Improved mean flows: On the challenges of fastforward generative models, 2025

work page 2025
[24]

Overcoming the curvature bottleneck in meanflow, 2026

Xinxi Zhang, Shiwei Tan, Quang Nguyen, Quan Dao, Ligong Han, Xiaoxiao He, Tunyu Zhang, Chengzhi Mao, Dimitris Metaxas, and Vladimir Pavlovic. Overcoming the curvature bottleneck in meanflow, 2026

work page 2026
[25]

Terminal velocity matching, 2026

Linqi Zhou, Mathias Parger, Ayaan Haque, and Jiaming Song. Terminal velocity matching, 2026

work page 2026
[26]

Functional mean flow in hilbert space, 2025

Zhiqi Li, Yuchen Sun, Greg Turk, and Bo Zhu. Functional mean flow in hilbert space, 2025

work page 2025
[27]

Monte Carlo methods in financial engineering , volume 53

Paul Glasserman. Monte Carlo methods in financial engineering , volume 53. Springer New York, NY, 2003

work page 2003
[28]

Estimation with quadratic loss

William James, Charles Stein, et al. Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability , volume 1, pages 361--379. University of California Press, 1961

work page 1961
[29]

Modular meanflow: Towards stable and scalable one-step generative modeling, 2025

Haochen You, Baojing Liu, and Hongyang He. Modular meanflow: Towards stable and scalable one-step generative modeling, 2025

work page 2025
[30]

Understanding, accelerating, and improving meanflow training, 2025

Jin-Young Kim, Hyojun Go, Lea Bogensperger, Julius Erbach, Nikolai Kalischek, Federico Tombari, Konrad Schindler, and Dominik Narnhofer. Understanding, accelerating, and improving meanflow training, 2025

work page 2025
[31]

Decoupled meanflow: Turning flow models into flow maps for accelerated sampling, 2025

Kyungmin Lee, Sihyun Yu, and Jinwoo Shin. Decoupled meanflow: Turning flow models into flow maps for accelerated sampling, 2025

work page 2025
[32]

Stable velocity: A variance perspective on flow matching, 2026

Donglin Yang, Yongxing Zhang, Xin Yu, Liang Hou, Xin Tao, Pengfei Wan, Xiaojuan Qi, and Renjie Liao. Stable velocity: A variance perspective on flow matching, 2026

work page 2026
[33]

Temporal pair consistency for variance-reduced flow matching, 2026

Chika Maduabuchi and Jindong Wang. Temporal pair consistency for variance-reduced flow matching, 2026

work page 2026
[34]

Preconditioned score and flow matching, 2026

Shadab Ahamed, Eshed Gal, Simon Ghyselincks, Md Shahriar Rahim Siddiqui, Moshe Eliasof, and Eldad Haber. Preconditioned score and flow matching, 2026

work page 2026
[35]

On the closed-form of flow matching: Generalization does not arise from target stochasticity, 2025

Quentin Bertrand, Anne Gagneux, Mathurin Massias, and R \'e mi Emonet. On the closed-form of flow matching: Generalization does not arise from target stochasticity, 2025

work page 2025
[36]

Flow straight and fast: Learning to generate and transfer data with rectified flow, 2022

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow, 2022

work page 2022
[37]

Improving the training of rectified flows

Sangyun Lee, Zinan Lin, and Giulia Fanti. Improving the training of rectified flows. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems , volume 37, pages 63082--63109. Curran Associates, Inc., 2024

work page 2024
[38]

Glynn and Roberto Szechtman

Peter W. Glynn and Roberto Szechtman. Some new perspectives on the method of control variates. Monte Carlo and Quasi-Monte Carlo Methods 2000 , pages 27--49, 2002

work page 2000
[39]

Human-level control through deep reinforcement learning

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, et al. Human-level control through deep reinforcement learning. Nature , 518:529--533, 2015

work page 2015
[40]

Bootstrap your own latent: A new approach to self-supervised learning

Jean-Bastien Grill, Florian Strub, Florent Altch \'e , et al. Bootstrap your own latent: A new approach to self-supervised learning. In NeurIPS , 2020

work page 2020
[41]

Polyak and Anatoli B

Boris T. Polyak and Anatoli B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization , 30(4):838--855, 1992

work page 1992
[42]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , pages 4195--4205, 2023

work page 2023
[43]

Riemannian meanflow, 2026

Dongyeop Woo, Marta Skreta, Seonghyun Park, Kirill Neklyudov, and Sungsoo Ahn. Riemannian meanflow, 2026

work page 2026
[44]

One step diffusion via shortcut models, 2025

Kevin Frans, Danijar Hafner, Sergey Levine, and Pieter Abbeel. One step diffusion via shortcut models, 2025

work page 2025
[45]

Optimal Transport: Old and New , volume 338

C \'e dric Villani et al. Optimal Transport: Old and New , volume 338. Springer Berlin, Heidelberg, 2009

work page 2009
[46]

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised learning results

Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised learning results. In Advances in Neural Information Processing Systems , volume 30, 2017

work page 2017

[1] [1]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems , volume 33, pages 6840--6851. Curran Associates, Inc., 2020

work page 2020

[2] [2]

Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations , 2021

work page 2021

[3] [3]

Elucidating the design space of diffusion-based generative models

Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems , volume 35, 2022

work page 2022

[4] [4]

Diffusion models beat GAN s on image synthesis

Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GAN s on image synthesis. In Advances in Neural Information Processing Systems , volume 34, 2021

work page 2021

[5] [5]

Scaling rectified flow transformers for high-resolution image synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas M \"u ller, Harry Saini, Yam Levi, Dominik Lorenz, Naveen Rafi, Tim Shafir, et al. Scaling rectified flow transformers for high-resolution image synthesis. In Proceedings of the 41st International Conference on Machine Learning , volume 235 of Proceedings of Machine Learning Research , 2024

work page 2024

[6] [6]

Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems , volume 31, 2018

work page 2018

[7] [7]

Variational inference with normalizing flows

Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning , volume 37 of Proceedings of Machine Learning Research , pages 1530--1538, 2015

work page 2015

[8] [8]

Density estimation using R eal- NVP

Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using R eal- NVP . In International Conference on Learning Representations , 2017

work page 2017

[9] [9]

Will Grathwohl, Ricky T. Q. Chen, Jesse Bettencourt, Ilya Sutskever, and David Duvenaud. FFJORD : Free-form continuous dynamics for scalable reversible generative models. In International Conference on Learning Representations , 2019

work page 2019

[10] [10]

Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling, 2023

work page 2023

[11] [11]

Improving and generalizing flow-based generative models with minibatch optimal transport, 2024

Alexander Tong, Kilian Fatras, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, and Yoshua Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport, 2024

work page 2024

[12] [12]

Albergo and Eric Vanden-Eijnden

Michael S. Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants, 2023

work page 2023

[13] [13]

Boffi, and Eric Vanden-Eijnden

Michael Albergo, Nicholas M. Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. Journal of Machine Learning Research , 26(209):1--80, 2025

work page 2025

[14] [14]

Generative modeling by estimating gradients of the data distribution

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems , volume 32, 2019

work page 2019

[15] [15]

Progressive distillation for fast sampling of diffusion models, 2022

Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models, 2022

work page 2022

[16] [16]

Freeman, and Taesung Park

Tianwei Yin, Micha \"e l Gharbi, Richard Zhang, Eli Shechtman, Fr \'e do Durand, William T. Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages 6613--6623, June 2024

work page 2024

[17] [17]

Tianwei Yin, Micha\" e l Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fr\' e do Durand, and William T. Freeman. Improved distribution matching distillation for fast image synthesis. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems , volume 37, pages 4745...

work page 2024

[18] [18]

Consistency models

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Proceedings of the 40th International Conference on Machine Learning , ICML'23. JMLR.org, 2023

work page 2023

[19] [19]

Simplifying, stabilizing and scaling continuous-time consistency models, 2025

Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models, 2025

work page 2025

[20] [20]

Boffi, Michael S

Nicholas M. Boffi, Michael S. Albergo, and Eric Vanden-Eijnden. Flow map matching with stochastic interpolants: A mathematical framework for consistency models, 2025

work page 2025

[21] [21]

Zico Kolter, and Kaiming He

Zhengyang Geng, Mingyang Deng, Xingjian Bai, J. Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling, 2025

work page 2025

[22] [22]

Alphaflow: Understanding and improving meanflow models, 2025

Huijie Zhang, Aliaksandr Siarohin, Willi Menapace, Michael Vasilkovsky, Sergey Tulyakov, Qing Qu, and Ivan Skorokhodov. Alphaflow: Understanding and improving meanflow models, 2025

work page 2025

[23] [23]

Zico Kolter, and Kaiming He

Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J. Zico Kolter, and Kaiming He. Improved mean flows: On the challenges of fastforward generative models, 2025

work page 2025

[24] [24]

Overcoming the curvature bottleneck in meanflow, 2026

Xinxi Zhang, Shiwei Tan, Quang Nguyen, Quan Dao, Ligong Han, Xiaoxiao He, Tunyu Zhang, Chengzhi Mao, Dimitris Metaxas, and Vladimir Pavlovic. Overcoming the curvature bottleneck in meanflow, 2026

work page 2026

[25] [25]

Terminal velocity matching, 2026

Linqi Zhou, Mathias Parger, Ayaan Haque, and Jiaming Song. Terminal velocity matching, 2026

work page 2026

[26] [26]

Functional mean flow in hilbert space, 2025

Zhiqi Li, Yuchen Sun, Greg Turk, and Bo Zhu. Functional mean flow in hilbert space, 2025

work page 2025

[27] [27]

Monte Carlo methods in financial engineering , volume 53

Paul Glasserman. Monte Carlo methods in financial engineering , volume 53. Springer New York, NY, 2003

work page 2003

[28] [28]

Estimation with quadratic loss

William James, Charles Stein, et al. Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability , volume 1, pages 361--379. University of California Press, 1961

work page 1961

[29] [29]

Modular meanflow: Towards stable and scalable one-step generative modeling, 2025

Haochen You, Baojing Liu, and Hongyang He. Modular meanflow: Towards stable and scalable one-step generative modeling, 2025

work page 2025

[30] [30]

Understanding, accelerating, and improving meanflow training, 2025

Jin-Young Kim, Hyojun Go, Lea Bogensperger, Julius Erbach, Nikolai Kalischek, Federico Tombari, Konrad Schindler, and Dominik Narnhofer. Understanding, accelerating, and improving meanflow training, 2025

work page 2025

[31] [31]

Decoupled meanflow: Turning flow models into flow maps for accelerated sampling, 2025

Kyungmin Lee, Sihyun Yu, and Jinwoo Shin. Decoupled meanflow: Turning flow models into flow maps for accelerated sampling, 2025

work page 2025

[32] [32]

Stable velocity: A variance perspective on flow matching, 2026

Donglin Yang, Yongxing Zhang, Xin Yu, Liang Hou, Xin Tao, Pengfei Wan, Xiaojuan Qi, and Renjie Liao. Stable velocity: A variance perspective on flow matching, 2026

work page 2026

[33] [33]

Temporal pair consistency for variance-reduced flow matching, 2026

Chika Maduabuchi and Jindong Wang. Temporal pair consistency for variance-reduced flow matching, 2026

work page 2026

[34] [34]

Preconditioned score and flow matching, 2026

Shadab Ahamed, Eshed Gal, Simon Ghyselincks, Md Shahriar Rahim Siddiqui, Moshe Eliasof, and Eldad Haber. Preconditioned score and flow matching, 2026

work page 2026

[35] [35]

On the closed-form of flow matching: Generalization does not arise from target stochasticity, 2025

Quentin Bertrand, Anne Gagneux, Mathurin Massias, and R \'e mi Emonet. On the closed-form of flow matching: Generalization does not arise from target stochasticity, 2025

work page 2025

[36] [36]

Flow straight and fast: Learning to generate and transfer data with rectified flow, 2022

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow, 2022

work page 2022

[37] [37]

Improving the training of rectified flows

Sangyun Lee, Zinan Lin, and Giulia Fanti. Improving the training of rectified flows. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems , volume 37, pages 63082--63109. Curran Associates, Inc., 2024

work page 2024

[38] [38]

Glynn and Roberto Szechtman

Peter W. Glynn and Roberto Szechtman. Some new perspectives on the method of control variates. Monte Carlo and Quasi-Monte Carlo Methods 2000 , pages 27--49, 2002

work page 2000

[39] [39]

Human-level control through deep reinforcement learning

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, et al. Human-level control through deep reinforcement learning. Nature , 518:529--533, 2015

work page 2015

[40] [40]

Bootstrap your own latent: A new approach to self-supervised learning

Jean-Bastien Grill, Florian Strub, Florent Altch \'e , et al. Bootstrap your own latent: A new approach to self-supervised learning. In NeurIPS , 2020

work page 2020

[41] [41]

Polyak and Anatoli B

Boris T. Polyak and Anatoli B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization , 30(4):838--855, 1992

work page 1992

[42] [42]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) , pages 4195--4205, 2023

work page 2023

[43] [43]

Riemannian meanflow, 2026

Dongyeop Woo, Marta Skreta, Seonghyun Park, Kirill Neklyudov, and Sungsoo Ahn. Riemannian meanflow, 2026

work page 2026

[44] [44]

One step diffusion via shortcut models, 2025

Kevin Frans, Danijar Hafner, Sergey Levine, and Pieter Abbeel. One step diffusion via shortcut models, 2025

work page 2025

[45] [45]

Optimal Transport: Old and New , volume 338

C \'e dric Villani et al. Optimal Transport: Old and New , volume 338. Springer Berlin, Heidelberg, 2009

work page 2009

[46] [46]

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised learning results

Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised learning results. In Advances in Neural Information Processing Systems , volume 30, 2017

work page 2017