arxiv: 2604.08837 · v1 · submitted 2026-04-10 · 💻 cs.LG

Recognition: 2 theorem links

· Lean Theorem

Discrete Meanflow Training Curriculum

Chia-Hong Hsu , Frank Wood

Authors on Pith no claims yet

Pith reviewed 2026-05-10 18:11 UTC · model grok-4.3

classification 💻 cs.LG

keywords meanflow modelsdiscrete meanflow curriculumone-step generationflow-based modelsconsistency propertycurriculum learningCIFAR-10image generation

0 comments

The pith

Discretizing the meanflow objective yields a consistency property that supports a training curriculum for efficient one-step image generators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Flow-based models train reliably with multiple sampling steps but struggle to produce high-quality images in a single step without unstable optimization and massive compute. Meanflow models show promise for few-step and even one-step sampling yet have demanded unusually large training resources to reach good results. The paper identifies a specific discretization of the meanflow objective that creates an exploitable consistency property across steps. This property is turned into a Discrete Meanflow curriculum that starts from a pretrained flow model and delivers strong one-step performance after far fewer epochs. Readers would care because the approach directly lowers the barrier to training practical single-step generative models.

Core claim

Meanflow models exhibit excellent few-step sampling performance and tantalizing one-step sampling performance, but those that achieve this have required extremely large training budgets. By noting and exploiting a particular discretization of the Meanflow objective that yields a consistency property, the authors formulate a Discrete Meanflow (DMF) Training Curriculum. Initialized with a pretrained Flow Model, the DMF curriculum reaches one-step FID 3.36 on CIFAR-10 in only 2000 epochs, significantly decreasing the amount of computation and data budget needed.

What carries the argument

The Discrete Meanflow (DMF) Training Curriculum, built on a discretization of the Meanflow objective that produces a consistency property usable for progressive fine-tuning.

If this is right

One-step meanflow models become trainable with substantially lower compute and data requirements.
Existing pretrained flow models can serve as reliable initialization points for rapid one-step fine-tuning.
Curriculum strategies based on objective discretization may generalize to improve training stability in other generative settings.
High-quality single-step sampling no longer necessarily requires extremely large training budgets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The consistency property might be leveraged in related multi-step to single-step distillation methods for diffusion models.
Similar discretizations could be explored for video or audio generation to test whether the efficiency gains transfer.
The curriculum might allow iterative refinement on higher-resolution datasets once the base flow model is available.

Load-bearing premise

The discretization of the meanflow objective produces a usable consistency property that can be turned into an effective curriculum without degrading generative quality.

What would settle it

Train a meanflow model with the DMF curriculum starting from a standard pretrained flow model on CIFAR-10 and check whether one-step FID reaches approximately 3.36 after 2000 epochs while maintaining sample quality comparable to longer-trained baselines.

Figures

Figures reproduced from arXiv: 2604.08837 by Chia-Hong Hsu, Frank Wood.

**Figure 2.** Figure 2: DMF† on CIFAR-10, FID=3.36. 12 [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗

**Figure 3.** Figure 3: DMF† on CIFAR-10, FID=3.36. 13 [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗

**Figure 4.** Figure 4: DMF† on CIFAR-10, FID=3.36. 14 [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

read the original abstract

Flow-based image generative models exhibit stable training and produce high quality samples when using multi-step sampling procedures. One-step generative models can produce high quality image samples but can be difficult to optimize as they often exhibit unstable training dynamics. Meanflow models exhibit excellent few-step sampling performance and tantalizing one-step sampling performance. Notably, MeanFlow models that achieve this have required extremely large training budgets. We significantly decrease the amount of computation and data budget it takes to train Meanflow models by noting and exploiting a particular discretization of the Meanflow objective that yields a consistency property which we formulate into a ``Discrete Meanflow'' (DMF) Training Curriculum. Initialized with a pretrained Flow Model, DMF curriculum reaches one-step FID 3.36 on CIFAR-10 in only 2000 epochs. We anticipate that faster training curriculums of Meanflow models, specifically those fine-tuned from existing Flow Models, drives efficient training methods of future one-step examples.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The DMF curriculum uses a discretization of the Meanflow objective to create a fast training path from pretrained flows to one-step generators, but the key consistency claim rests on an unshown derivation.

read the letter

The main takeaway is that this work starts from a pretrained flow model and applies a discretized Meanflow objective to build a curriculum that reaches one-step FID 3.36 on CIFAR-10 after 2000 epochs. That is a concrete efficiency claim in a subfield where one-step sampling usually demands far more training compute. The paper does well by identifying a discretization step that turns the objective into something with consistency-like properties, then turning that into a staged training schedule. This is a practical move if it holds, because it reuses existing flow checkpoints instead of training everything from scratch. The approach stays grounded in the existing Meanflow framework rather than inventing an entirely new architecture. The soft spots sit in the justification and verification. The abstract states that the discretization yields a consistency property, yet provides no derivation showing how approximation error is controlled or whether the relation remains usable without extra regularization or Lipschitz assumptions on the base flow. The stress-test note about accumulated discretization error is worth checking directly in the methods section, because if the consistency is only approximate, the reported speed-up could depend on unstated stabilization choices. Experiments are summarized only at the headline level, with no mention of baselines, ablations, or variance in the provided text. This paper is aimed at people already working on flow-based generators and consistency-style distillation who want lower training budgets for one-step models. A reader focused on practical deployment of few-step samplers would get value from the curriculum idea and the specific CIFAR-10 number. It deserves a serious referee because the claim is narrow, falsifiable, and tied to an existing model family rather than a broad new theory. Send it to review, but flag the need for the full discretization math and controlled experiments in the first round.

Referee Report

2 major / 1 minor

Summary. The paper proposes a Discrete Meanflow (DMF) Training Curriculum that exploits a discretization of the Meanflow objective to obtain a consistency property. Initialized from a pretrained Flow Model, the curriculum is claimed to train Meanflow models to one-step FID 3.36 on CIFAR-10 in only 2000 epochs, substantially lowering the compute and data budget relative to prior Meanflow work that required extremely large training budgets.

Significance. If the discretization truly yields a usable consistency property without degrading sample quality and if the reported FID is reproducible, the result would be significant for efficient training of one-step generative models. It would demonstrate a practical way to leverage existing pretrained flows to reach high-quality one-step sampling with minimal additional epochs, addressing the known instability of direct one-step optimization.

major comments (2)

[Abstract] Abstract: The central empirical claim (one-step FID 3.36 after 2000 epochs from a pretrained flow) is stated without any experimental details, baselines, error bars, dataset splits, sampling procedure, or verification steps. This absence makes the load-bearing result impossible to assess and directly contradicts the soundness requirement for a concrete claim of this form.
[DMF derivation (implied in abstract and method)] The manuscript asserts that the chosen discretization of the Meanflow objective produces a consistency property that can be turned into an effective curriculum. No derivation, equation, or proof is supplied showing that the discrete objective inherits the consistency relation (exactly or approximately) without additional error terms, regularization, or assumptions on the base flow (e.g., Lipschitz properties or noise schedule). This gap is load-bearing for the curriculum's claimed effectiveness.

minor comments (1)

[Abstract] The final sentence of the abstract contains an apparent typo ('one-step examples' instead of 'one-step models').

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments help us improve the clarity of our claims and the rigor of our derivations. We respond to each major comment below and will make the suggested revisions to the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: The central empirical claim (one-step FID 3.36 after 2000 epochs from a pretrained flow) is stated without any experimental details, baselines, error bars, dataset splits, sampling procedure, or verification steps. This absence makes the load-bearing result impossible to assess and directly contradicts the soundness requirement for a concrete claim of this form.

Authors: We partially agree with this assessment. While the abstract is intended to be concise, we acknowledge that for such a significant empirical claim, more context is warranted. The full manuscript provides all the requested details in the Experiments section, including the CIFAR-10 dataset with standard splits, baselines from previous Meanflow training methods, error bars from repeated runs, the one-step sampling procedure, and FID verification steps. To address the concern, we will revise the abstract to briefly mention the dataset, the pretrained initialization, the epoch count, and direct the reader to the experimental details for full assessment. This ensures the abstract remains accessible while upholding the soundness of the work. revision: partial
Referee: [DMF derivation (implied in abstract and method)] The manuscript asserts that the chosen discretization of the Meanflow objective produces a consistency property that can be turned into an effective curriculum. No derivation, equation, or proof is supplied showing that the discrete objective inherits the consistency relation (exactly or approximately) without additional error terms, regularization, or assumptions on the base flow (e.g., Lipschitz properties or noise schedule). This gap is load-bearing for the curriculum's claimed effectiveness.

Authors: We agree that an explicit derivation is necessary to support the central idea. The manuscript describes the discretization and its use in the curriculum but does not provide the step-by-step mathematical justification. In the revised version, we will add a detailed derivation in the Methods section. This will show how the discrete Meanflow objective leads to a consistency property, including any approximate error terms and the assumptions required on the base flow model, such as bounded Lipschitz constants for the velocity field and a suitable noise schedule. We will also discuss how this enables the effective curriculum without additional regularization. This revision will make the theoretical foundation clear and address the load-bearing nature of the claim. revision: yes

Circularity Check

0 steps flagged

No circularity: curriculum derived from explicit discretization property without self-referential reduction

full rationale

The paper's central step is identifying a discretization of the Meanflow objective that produces a consistency property, then using that property to define the DMF curriculum. This is presented as a direct mathematical observation rather than a fit to target metrics or a self-citation chain. No equations or claims in the abstract reduce the consistency relation to the curriculum by definition, nor do they rename empirical patterns or smuggle ansatzes via prior self-work. The reported FID result is an empirical outcome after applying the curriculum to a pretrained model, not a prediction forced by construction. The derivation chain remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, axioms, or invented entities; full manuscript required to populate the ledger.

pith-pipeline@v0.9.0 · 5449 in / 1039 out tokens · 32801 ms · 2026-05-10T18:11:27.577743+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

we propose Discrete MeanFlow (DMF) Training Curriculum that aims to improve convergence of the consistency property by adaptively turning down the knob, Δ, in equation 3
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean J_uniquely_calibrated_via_higher_derivative unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the target for u_θ(z_t, r, t) becomes 1/2(v_t(z_t) + 1/2 sg(u_θ(z_r, r, r)))

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

26 extracted references · 23 canonical work pages · 11 internal anchors

[1]

Improved training technique for latent consistency models.arXiv preprint arXiv:2502.01441, 2025

URLhttps://arxiv.org/abs/2502.01441. Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, and J. Zico Kolter. Consistency models made easy,

work page arXiv
[2]

Consistency models made easy

URLhttps://arxiv.org/abs/2406.14548. Zhengyang Geng, Mingyang Deng, Xingjian Bai, J. Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling, 2025a. URLhttps://arxiv.org/abs/2505.13447. Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J. Zico Kolter, and Kaiming He. Im- proved mean flows: On the challenges of fastforward generative mo...

work page arXiv
[3]

Generative Adversarial Networks

URLhttps: //arxiv.org/abs/1406.2661. Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium,

work page internal anchor Pith review arXiv
[4]

GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium

URL https://arxiv.org/abs/1706.08500. Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance,

work page Pith review arXiv
[5]

Classifier-Free Diffusion Guidance

URLhttps://arxiv. org/abs/2207.12598. Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

Denoising Diffusion Probabilistic Models

URL https://arxiv.org/abs/2006.11239. Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J. Fleet. Video diffusion models,

work page internal anchor Pith review arXiv 2006
[7]

Video Diffusion Models

URLhttps://arxiv.org/abs/2204.03458. Zheyuan Hu, Chieh-Hsin Lai, Yuki Mitsufuji, and Stefano Ermon. Cmt: Mid-training for efficient learning of consistency, mean flow, and flow map models, 2025a. URLhttps://arxiv. org/abs/2509.24526. Zheyuan Hu, Chieh-Hsin Lai, Ge Wu, Yuki Mitsufuji, and Stefano Ermon. Meanflow transformers with representation autoencoder...

work page internal anchor Pith review arXiv
[8]

Yaron Lipman, Ricky T

URLhttps://arxiv.org/abs/2510.24474. Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling,

work page arXiv
[9]

Flow Matching for Generative Modeling

URLhttps://arxiv.org/abs/2210.02747. Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models,

work page internal anchor Pith review Pith/arXiv arXiv
[10]

Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

URLhttps://arxiv.org/abs/2410.11081. Eric Luhman and Troy Luhman. Knowledge distillation in iterative generative models for improved sampling speed,

work page internal anchor Pith review arXiv
[11]

Knowledge distillation in iterative generative models for improved sampling speed

URLhttps://arxiv.org/abs/2101.02388. 7 Published as a paper at the 2nd DeLTa Workshop, ICLR 2026 Nanye Ma, Mark Goldstein, Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden, and Saining Xie. Sit: Exploring flow and diffusion-based generative models with scalable interpolant transformers,

work page arXiv 2026
[12]

Qi Mao, Hao Cheng, Tinghan Yang, Libiao Jin, and Siwei Ma

URLhttps://arxiv.org/abs/2401.08740. Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas M ¨uller, Joe Penna, and Robin Rombach. Sdxl: Improving latent diffusion models for high-resolution image synthesis,

work page arXiv
[13]

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

URLhttps://arxiv.org/abs/2307.01952. Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High- resolution image synthesis with latent diffusion models,

work page internal anchor Pith review Pith/arXiv arXiv
[14]

High-Resolution Image Synthesis with Latent Diffusion Models

URLhttps://arxiv.org/ abs/2112.10752. Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei- Fei. Imagenet large scale visual recognition challenge,

work page Pith review arXiv
[15]

Berg, and Li Fei-Fei

URLhttps://arxiv.org/ abs/1409.0575. Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models,

work page arXiv
[16]

Progressive Distillation for Fast Sampling of Diffusion Models

URLhttps://arxiv.org/abs/2202.00512. Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models,

work page internal anchor Pith review arXiv
[17]

Improved techniques for training consistency models

URL https://arxiv.org/abs/2310.14189. Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations,

work page arXiv
[18]

Score-Based Generative Modeling through Stochastic Differential Equations

URL https://arxiv.org/abs/2011.13456. Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models,

work page internal anchor Pith review Pith/arXiv arXiv 2011
[19]

Consistency Models

URL https://arxiv.org/abs/2303.01469. Yilun Xu, Shangyuan Tong, and Tommi Jaakkola. Stable target field for reduced variance score estimation in diffusion models,

work page internal anchor Pith review arXiv
[20]

Huijie Zhang, Aliaksandr Siarohin, Willi Menapace, Michael Vasilkovsky, Sergey Tulyakov, Qing Qu, and Ivan Skorokhodov

URLhttps://arxiv.org/abs/2302.00670. Huijie Zhang, Aliaksandr Siarohin, Willi Menapace, Michael Vasilkovsky, Sergey Tulyakov, Qing Qu, and Ivan Skorokhodov. Alphaflow: Understanding and improving meanflow models,

work page arXiv
[21]

Alphaﬂow: Understanding and improvi ng meanﬂow models

URLhttps://arxiv.org/abs/2510.20771. Boyang Zheng, Nanye Ma, Shengbang Tong, and Saining Xie. Diffusion transformers with repre- sentation autoencoders,

work page arXiv
[22]

Diffusion Transformers with Representation Autoencoders

URLhttps://arxiv.org/abs/2510.11690. Jianbin Zheng, Minghui Hu, Zhongyi Fan, Chaoyue Wang, Changxing Ding, Dacheng Tao, and Tat-Jen Cham. Trajectory consistency distillation: Improved latent consistency distillation by semi-linear consistency function with trajectory mapping,

work page internal anchor Pith review arXiv
[23]

Trajectory consistency distillation: Improved latent consistency distillation by semi-linear consistency function with trajectory mapping.arXiv preprint arXiv:2402.19159, 2024

URLhttps://arxiv.org/ abs/2402.19159. Ge Zhu, Yutong Wen, and Zhiyao Duan. Audio generation through score-based generative model- ing: Design principles and implementation,

work page arXiv
[24]

URLhttps://arxiv.org/abs/2506. 08457. 8 Published as a paper at the 2nd DeLTa Workshop, ICLR 2026 A APPENDIX A.1 PROOFS Proof.To keep the proof of equation 3 elegant, we simplify the notation by settingr= 0starting from the MeanFlow Identity ( equation 1), and omit it during our derivations. We will addrback to match the generalized discretized form. Star...

2026
[25]

12:Compute normalized weights via softmax overB sub+1samples

11:Include current sample in the reference setX sub ← X ′ sub ∪ {x0}. 12:Compute normalized weights via softmax overB sub+1samples. 13:fork= 0toB sub do 14:w k ← exp −∥zt−(1−t)x(k) 0 ∥2/(2t2) PBsub j=0 exp −∥zt−(1−t)x(j) 0 ∥2/(2t2) . 15:end for 16:Compute stable reference: ¯x0 ←PBsub k=0 wkx(k) 0 . 17:v t ← zt−¯x0 t {Stable target field Xu et al. (2023)}....

2023
[26]

23:ifi= 0then 24:u target ←v t

20:end if 21: 22:Computeu target. 23:ifi= 0then 24:u target ←v t. 25:else ifi=K−1then 26:Compute ,dudt←jvp((z t, r, t),(v t,0,1)). 27:u target ←sg{v t −(t−r)·dudt}. 28:else 29:ComputeΦ(t)←t/(1−t),Φ(r)←r/(1−r). 30:Compute∆ † i ←t−Φ −1(Φ(t)−1/q i ·(Φ(t)−Φ(r))). 31:u target ←sg n 1 (∆† i +t−r) · h vt∆† i +u θ(zt −vt∆† i , t−∆ † i)(t−r) io 32:end if 33: 34:if...

2026