pith. machine review for the scientific record.

arxiv: 2605.12805 · v1 · submitted 2026-05-12 · 💻 cs.LG · cs.AI

Recognition: unknown

Discrete MeanFlow: One-Step Generation via Conditional Transition Kernels

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 20:24 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: Discrete MeanFlow · one-step generation · continuous-time Markov chain · transition kernel · Kolmogorov forward equation · discrete data · flow matching

The pith

Discrete MeanFlow proves an identity for conditional transition kernels that enables one-step generation in discrete state spaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops Discrete MeanFlow to bring one-step generation to discrete data by working with probability mass transport instead of point trajectories. It defines a mean discrete rate over a time interval for the conditional transition kernel of a continuous-time Markov chain. The authors establish an identity that connects this average rate to the instantaneous generator at the endpoint via the Kolmogorov forward equation. They then use this to parameterize the kernel directly in a way that automatically satisfies boundary conditions and produces valid probabilities. Generation then requires only a single model evaluation followed by sampling from the resulting categorical distribution.
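
As a concrete sketch of that procedure, here is a minimal one-step sampler, assuming a hypothetical kernel_net that maps a batch of source states and interval endpoints (r, t) to probability vectors over target states; the interface is illustrative, not the authors' code.

```python
import torch

def one_step_sample(kernel_net, x0):
    """One-step generation: a single forward pass of the learned kernel
    K_theta(., x0, 0, 1), then one categorical draw per source state.

    kernel_net: hypothetical module returning, for each source state in
    x0, a valid probability vector over target states (guaranteed by the
    boundary-by-construction design).
    """
    r = torch.zeros_like(x0, dtype=torch.float32)  # interval start r = 0
    t = torch.ones_like(x0, dtype=torch.float32)   # interval end t = 1
    probs = kernel_net(x0, r, t)                   # shape (batch, |V|)
    return torch.distributions.Categorical(probs=probs).sample()
```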

Core claim

We prove a Discrete MeanFlow identity that relates the finite-interval mean discrete rate to the instantaneous CTMC generator at the endpoint, with the Kolmogorov forward equation replacing the spatial chain rule. Based on this, we parameterize the transition kernel directly using a boundary-by-construction design that guarantees valid probability outputs and exact boundary conditions without auxiliary losses, reducing generation to a single forward pass and one categorical draw.
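
The identity itself is not reproduced on this page. Under the conventions the abstract and Figure 14 suggest (finite state space, generator Q_t whose columns sum to zero, column-stochastic kernels K_{r,t}, and a mean rate defined as the averaged finite difference of the kernel), a plausible reconstruction, not a quotation, reads:

```latex
% Assumed definition of the mean discrete rate:
%   (t - r)\,\bar{u}_{r,t} = K_{r,t} - I, \qquad K_{r,r} = I.
% Kolmogorov forward equation in the endpoint t:
%   \partial_t K_{r,t} = Q_t\, K_{r,t}.
% Differentiating the definition in t and substituting the forward equation
% yields a discrete analogue of the MeanFlow identity, with the forward
% equation playing the role of the spatial chain rule:
\[
  \bar{u}_{r,t} + (t - r)\,\partial_t \bar{u}_{r,t} \;=\; Q_t\, K_{r,t}.
\]
% Integrating the forward equation over [r, t] instead shows the mean rate
% is an exact time average rather than an approximation:
\[
  \bar{u}_{r,t} \;=\; \frac{1}{t - r}\int_r^t Q_s\, K_{r,s}\,\mathrm{d}s.
\]
```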

What carries the argument

The conditional transition kernel of a continuous-time Markov chain (CTMC), from which the paper defines the mean discrete rate, the average change in transition probabilities over a time interval.

If this is right

  • The learned kernel directly provides a valid probability distribution for sampling.
  • Generation requires no iterative steps, ODE solving, or denoising.
  • The approach recovers the analytical transition kernels of finite-state Markov chains to high precision.
  • It applies to factorized synthetic sequence tasks across different alphabet sizes and lengths.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the parameterization holds for real discrete data like text tokens, it could replace multi-step diffusion models in discrete domains.
  • This identity might generalize to other Markov processes beyond the CTMC setup tested.
  • Hybrid models combining discrete kernels with continuous flows could handle mixed data types.

Load-bearing premise

The boundary-by-construction parameterization accurately captures the target data distribution's transition dynamics for complex, high-dimensional discrete data beyond the synthetic validation cases.
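
The paper's construction is not spelled out here, but one minimal way to obtain both stated guarantees, simplex-valued outputs everywhere and K_{r,r} = I exactly, is a convex mixture of the boundary one-hot with a softmax head, gated by a factor that vanishes at t = r. The sketch below is an illustrative stand-in under that assumption, not the authors' parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryByConstructionKernel(nn.Module):
    """Illustrative kernel head: outputs lie on the probability simplex for
    every (x0, r, t), and K(.|x0, r, r) is exactly one-hot at x0, so the
    boundary condition needs no auxiliary loss."""

    def __init__(self, num_states, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(num_states, hidden)
        self.net = nn.Sequential(
            nn.Linear(hidden + 2, hidden), nn.SiLU(),
            nn.Linear(hidden, num_states),
        )
        self.num_states = num_states

    def forward(self, x0, r, t):
        # Learned categorical over target states, valid by softmax.
        h = torch.cat([self.embed(x0), r[:, None], t[:, None]], dim=-1)
        cat = F.softmax(self.net(h), dim=-1)
        # Gate vanishes at t = r, pinning the kernel to the identity there.
        g = (1.0 - torch.exp(-(t - r)))[:, None]
        onehot = F.one_hot(x0, self.num_states).float()
        return (1.0 - g) * onehot + g * cat  # convex mixture stays a distribution
```

Because the gate is zero at t = r by architecture, the boundary condition holds exactly for any parameter values, which is the property the Figure 4 ablation credits over an explicit boundary loss.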

What would settle it

Training the model on a known finite-state Markov chain and checking whether the output kernel exactly matches the analytical transition probabilities derived from the chain's generator.
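
A minimal version of that check, assuming a time-homogeneous generator so the analytical kernel is the matrix exponential exp((t − r) Q); the unit-rate 3-state ring below matches a system named in the figures, but its rates are a guess for illustration.

```python
import numpy as np
from scipy.linalg import expm

# Generator of a 3-state ring CTMC with unit clockwise jump rate; columns
# sum to zero, matching the column convention suggested by Figure 14.
Q = np.array([[-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 0.0,  1.0, -1.0]])

def analytical_kernel(r, t):
    """Exact conditional kernel K_{r,t} of a time-homogeneous CTMC."""
    return expm((t - r) * Q)

def max_kernel_error(learned, r, t):
    """Max entrywise error max_{x,y} |K_theta - K|, the metric of Figure 3.
    `learned` is any callable returning an estimated kernel matrix."""
    return np.abs(np.asarray(learned(r, t)) - analytical_kernel(r, t)).max()
```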

Figures

Figures reproduced from arXiv: 2605.12805 by Fairoz Nower Khan, Md Sajid Ahmed, Nabuat Zaman Nahim, Peizhong Ju, and Ruiquan Huang.

Figure 3
Figure 3: Kernel error over the (r, t) grid. Each cell shows the maximum entrywise error max_{x,y} |Kθ − K| at a given (r, t) pair. Error is zero along the diagonal r = t and grows smoothly with interval length t − r. Maximum errors: 1.4×10−4, 7.1×10−3, and 1.3×10−2.
Figure 4
Figure 4: Boundary treatment ablation on the 3-state ring. Boundary-by-construction (blue) achieves 3-10× lower error than an explicit boundary loss (red) across all four metrics and all three seeds, confirming that architectural enforcement of the boundary condition is superior to a loss penalty.
Figure 5
Figure 5: Training target variance vs. t. Left: the kernel-residual target u_t(y, x_t) has variance that diverges as t → 1, while posterior regression remains bounded. Right: the target L2 norm follows the theoretical √|V|/(1 − t) scaling.
Figure 6
Figure 6: True kernel, learned kernel, and absolute error for all three CTMCs evaluated at (r, t) = (0, 1). In each case, the learned kernel is visually indistinguishable from the ground truth, and the error matrices are near zero.
Figure 7
Figure 7: Training convergence for Stage I. Left column: kernel-residual loss over training steps. Right column: evaluation metrics (max kernel error, mean kernel error, column TV, generation TV) computed periodically during training.
Figure 8
Figure 8: One-step generation on the 10-state birth-death chain. For each starting state x0, we compare the true conditional distribution (blue), the learned kernel Kθ(·, x0, 0, 1) (red), and the empirical distribution from 10,000 one-step samples (orange).
Figure 9
Figure 9: Detailed visual comparison of the kernel-residual and posterior-regression training objectives across all six configurations. Token accuracy is comparable between the two methods, but posterior regression achieves consistently lower TV distance and cross-entropy, confirming that direct supervision produces better-calibrated distributions.
Figure 10
Figure 10: Multi-step generation comparison. Top row: kernel-residual. Bottom row: posterior-regression. Each panel shows average TV distance at 1, 2, 4, and 8 generation steps.
Figure 11
Figure 11: Hybrid loss sweep over the objective L = L_KR + λ·L_CE with λ ∈ {0, 0.1, 1, 10, ∞}. Top row: average TV distance vs. λ for each configuration. Bottom row: token accuracy vs. λ. The best TV is consistently achieved at λ = ∞ (pure cross-entropy).
Figure 12
Figure 12: Multi-step generation from the hybrid sweep. Kernel-residual (blue) degrades with more steps; posterior regression (green) remains stable.
Figure 13
Figure 13: An additional view of generation quality as a function of the number of sampling steps across all configurations.
Figure 14
Figure 14: The mean discrete rate. From source state x = 1, the transition kernel K_{r,t}(·, x) spreads probability mass over time. The mean discrete rate ū summarizes this change: positive entries gain probability, the negative entry at the source state loses probability, and each column sums to zero.
read the original abstract

MeanFlow enables one-step generation in continuous spaces by learning an average velocity over a time interval rather than the instantaneous velocity field of flow matching. However, discrete state spaces do not have smooth trajectories or spatial derivatives, so the continuous formulation does not directly apply. We introduce Discrete MeanFlow, which replaces the motion of a point with the transport of probability mass over finite states. Our key object is the conditional transition kernel of a continuous-time Markov chain (CTMC), from which we define a mean discrete rate that measures the average change in transition probability over a time interval. We prove a Discrete MeanFlow identity that relates this finite-interval rate to the instantaneous CTMC generator at the endpoint, with the Kolmogorov forward equation replacing the spatial chain rule of continuous MeanFlow. Based on this identity, we parameterize the transition kernel directly using a boundary-by-construction design that guarantees valid probability outputs and exact boundary conditions without auxiliary losses. Since the learned kernel is itself a probability distribution, generation reduces to a single forward pass followed by one categorical draw, meaning no iterative denoising, ODE integration, or multi-step refinement is required. We validate the framework on exact finite-state Markov chains, where the learned kernel recovers the analytical ground truth to high precision, and on factorized synthetic sequence generation tasks with varying alphabet sizes and sequence lengths.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Discrete MeanFlow for one-step generation in discrete state spaces by transporting probability mass via conditional transition kernels of continuous-time Markov chains (CTMCs). It defines a mean discrete rate over finite time intervals and proves an identity relating this rate to the instantaneous CTMC generator at the endpoint, with the Kolmogorov forward equation substituting for the spatial chain rule. A boundary-by-construction parameterization of the kernel is proposed to enforce valid probabilities and exact boundary conditions without auxiliary losses, reducing generation to a single forward pass plus one categorical sample. Validation shows exact recovery of ground truth on small finite-state chains and results on factorized synthetic sequences of varying lengths and alphabets.

Significance. If the identity and parameterization hold beyond the reported cases, the framework provides a principled one-step alternative to iterative discrete diffusion or autoregressive models, with the CTMC grounding and boundary-by-construction design as notable strengths that eliminate auxiliary losses and multi-step refinement. The exact recovery on synthetic chains supports the mathematical core, though broader impact depends on generalization to non-factorized high-dimensional discrete data.

major comments (2)
  1. [§3] Discrete MeanFlow identity: The manuscript states that the identity follows from the Kolmogorov forward equation but provides no complete step-by-step derivation, explicit assumptions on the CTMC (e.g., time-homogeneity or finite state space), or error bounds for the finite-interval approximation, which is load-bearing for the claim that the learned kernel exactly matches the target law.
  2. [§5] Experiments: Validation is confined to exact recovery on small finite-state Markov chains and factorized synthetic sequence tasks; no results are shown on high-dimensional discrete data exhibiting non-factorized dependencies, leaving untested whether the boundary-by-construction parameterization captures complex transition dynamics without auxiliary losses or refinement steps.
minor comments (2)
  1. Notation for the mean discrete rate and conditional kernel is introduced without an explicit comparison table to continuous MeanFlow quantities, which would aid clarity.
  2. [§5] The abstract claims 'high precision' recovery on exact chains, but no quantitative error metrics, sample sizes, or variance estimates appear in the experiment description.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive report and positive assessment of the mathematical contributions. We address each major comment below and indicate the planned revisions.

read point-by-point responses
  1. Referee: [§3] Discrete MeanFlow identity: The manuscript states that the identity follows from the Kolmogorov forward equation but provides no complete step-by-step derivation, explicit assumptions on the CTMC (e.g., time-homogeneity or finite state space), or error bounds for the finite-interval approximation, which is load-bearing for the claim that the learned kernel exactly matches the target law.

    Authors: We agree that a complete derivation will strengthen the presentation. In the revised manuscript we will insert a self-contained step-by-step derivation of the Discrete MeanFlow identity directly from the Kolmogorov forward equation. We will explicitly list the standing assumptions (finite discrete state space and time-homogeneous CTMC) and clarify that the identity is exact for any finite interval under these dynamics; the finite-interval mean rate is not an approximation but an exact integral relation. A short paragraph discussing the limiting behavior as the interval length approaches zero will also be added. These changes will be incorporated without altering any claims or results. revision: yes

  2. Referee: [§5] Experiments: Validation is confined to exact recovery on small finite-state Markov chains and factorized synthetic sequence tasks; no results are shown on high-dimensional discrete data exhibiting non-factorized dependencies, leaving untested whether the boundary-by-construction parameterization captures complex transition dynamics without auxiliary losses or refinement steps.

    Authors: We acknowledge that the current experiments are limited to controlled settings where ground-truth kernels are analytically available. These experiments were chosen to provide rigorous verification of the identity and the boundary-by-construction parameterization. The parameterization itself imposes no factorization assumption and guarantees valid probabilities for arbitrary discrete state spaces by construction. Demonstrating performance on high-dimensional non-factorized data would require new large-scale experiments that are outside the scope of the present work; we plan to pursue such evaluations in follow-up research. revision: no

standing simulated objections not resolved
  • Absence of experimental results on high-dimensional discrete data with non-factorized dependencies

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard Kolmogorov forward equation

full rationale

The Discrete MeanFlow identity is obtained by direct substitution of the Kolmogorov forward equation into the definition of the finite-interval mean rate, with no fitted parameters or self-referential quantities introduced. The boundary-by-construction kernel parameterization is a structural design that enforces probability simplex membership and endpoint conditions by algebraic construction rather than by optimization; the learned parameters themselves are still determined by an external data-matching objective. No load-bearing step reduces to a prior self-citation, an ansatz smuggled in via citation, or a renaming of an empirical pattern. The framework therefore rests on standard external mathematics rather than on self-referential constructions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Relies on standard CTMC theory; no free parameters, invented entities, or ad-hoc axioms are described in the abstract.

axioms (1)
  • standard math · Kolmogorov forward equation governs probability evolution in CTMCs
    Invoked to connect finite-interval mean rate to instantaneous generator at endpoint.
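
For reference, the invoked equation in the column convention used above is the standard statement (not quoted from the paper):

```latex
\[
  \partial_t K_{r,t}(y \mid x) \;=\; \sum_{z} Q_t(y, z)\, K_{r,t}(z \mid x),
  \qquad K_{r,r}(y \mid x) = \delta_{xy}.
\]
```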

pith-pipeline@v0.9.0 · 5546 in / 1170 out tokens · 44083 ms · 2026-05-14T20:24:35.058919+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 3 internal anchors

  1. [1]

    Generative flows on discrete state-spaces: Enabling multimodal flows with applications to protein co-design

    Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, and Tommi Jaakkola. Generative flows on discrete state-spaces: Enabling multimodal flows with applications to protein co-design. In ICLR 2024 Workshop on Generative and Experimental Perspectives for Biomolecular Design,

  2. [2]

    One-step flow policy mirror descent

    Tianyi Chen, Haitong Ma, Na Li, Kai Wang, and Bo Dai. One-step flow policy mirror descent. arXiv preprint arXiv:2507.23675,

  3. [3]

    Improved Mean Flows: On the Challenges of Fastforward Generative Models

    Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025a. Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J Zico Kolter, and Kaiming He. Improved mean flows: On the challenges of fastforward generative...

  4. [4]

    Flow Matching for Offline Reinforcement Learning with Discrete Actions

    Fairoz Nower Khan, Nabuat Zaman Nahim, Ruiquan Huang, Haibo Yang, and Peizhong Ju. Flow matching for offline reinforcement learning with discrete actions. arXiv preprint arXiv:2602.06138,

  5. [5]

    Meanaudio: Fast and faithful text-to-audio generation with mean flows

    Xiquan Li, Junxi Liu, Yuzhe Liang, Zhikang Niu, Wenxi Chen, and Xie Chen. Meanaudio: Fast and faithful text-to-audio generation with mean flows. arXiv preprint arXiv:2508.06098,

  6. [6]

    Flow Matching Guide and Code

    Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky TQ Chen, David Lopez-Paz, Heli Ben-Hamu, and Itai Gat. Flow matching guide and code. arXiv preprint arXiv:2412.06264,

  7. [7]

    Rectified flow: A marginal preserving approach to optimal transport

    Qiang Liu. Rectified flow: A marginal preserving approach to optimal transport. arXiv preprint arXiv:2209.14577,

  8. [8]

    Alphaflow: Understanding and improving meanflow models

    Huijie Zhang, Aliaksandr Siarohin, Willi Menapace, Michael Vasilkovsky, Sergey Tulyakov, Qing Qu, and Ivan Skorokhodov. Alphaflow: Understanding and improving meanflow models. arXiv preprint arXiv:2510.20771,
