dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-12 03:46 UTC · model grok-4.3
The pith
By deriving full trajectory probabilities and modeling denoising as an MDP, dFlowGRPO enables rate-aware policy optimization for general discrete flow models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that discrete flow models admit an explicit full trajectory probability, and that casting the denoising steps as a Markov decision process lets the policy gradient incorporate the conditional transition rates together with the posterior model. The resulting dFlowGRPO therefore supports reinforcement learning across arbitrary probability paths and non-masked source distributions, as demonstrated by its application to the FUDOKI multimodal model on image generation and understanding benchmarks.
What carries the argument
The derivation of the full trajectory probability for discrete flow models, combined with the Markov decision process formulation of the denoising sequence. Together, these let the optimizer use both rate and posterior information.
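The trajectory-probability decomposition can be sketched concretely. The following is a minimal illustration (all names hypothetical, not the paper's code), assuming an Euler discretization of a continuous-time Markov chain: each step's transition probabilities come from a first-order expansion of the rate matrix, and the full trajectory log-probability is just the sum of per-step log transition probabilities.

```python
import numpy as np

def make_transition_fn(R):
    """One Euler step of a continuous-time Markov chain with rate
    matrix R (off-diagonal rates >= 0, rows summing to zero)."""
    def transition_fn(x, t, dt):
        p = np.eye(R.shape[0])[x] + dt * R[x]  # first-order expansion
        p = np.clip(p, 1e-12, None)            # guard tiny negatives
        return p / p.sum()
    return transition_fn

def trajectory_log_prob(transition_fn, traj, times):
    """Full trajectory log-probability: the log-probabilities of the
    individual denoising transitions simply add up."""
    logp = 0.0
    for k in range(len(traj) - 1):
        dt = times[k + 1] - times[k]
        p = transition_fn(traj[k], times[k], dt)
        logp += np.log(p[traj[k + 1]])
    return logp
```

In the paper's setting the transition function would additionally fold in the posterior model; this sketch only shows the rate-driven part and the additive log-probability structure.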
If this is right
- dFlowGRPO outperforms existing GRPO-type methods for dLLMs on text-to-image generation tasks.
- It reaches performance competitive with continuous flow-based models trained using FlowGRPO.
- The same training run also yields strong results on multimodal understanding tasks.
- The framework works for a broad family of probability paths and non-masked source distributions.
Where Pith is reading between the lines
- The MDP view of denoising could be reused to add reward signals in other discrete generative settings such as structured text or graph generation.
- Explicit use of transition rates may allow hybrid training pipelines that move parameters between discrete and continuous flow models.
- If the rate-aware updates remain stable, the approach offers a route to fine-tune multimodal models for both generation quality and reasoning accuracy within one framework.
Load-bearing premise
The full trajectory probability derivation and MDP formulation must hold accurately for arbitrary probability paths and non-masked source distributions without introducing biases or instabilities in the resulting policy updates.
What would settle it
An experiment in which dFlowGRPO, applied to a non-masked discrete flow model, produces training instability or inferior text-to-image performance compared with standard GRPO baselines would falsify the claim of broad applicability.
Original abstract
Discrete flow models (DFMs) are a class of flexible generative models for generating discrete data, and diffusion large language models (dLLMs) can be viewed as a special case with a specific choice of mixture path and a masked source distribution. While several recent works have explored reinforcement learning into dLLMs, its application to more general discrete flow models remains underexplored. In this work, we present discrete Flow-GRPO (dFlowGRPO), a unified reinforcement learning framework for discrete flow models that supports a broad family of probability paths and non-masked source distributions. We derive the full trajectory probability for DFMs and formulate denoising as a Markov decision process, enabling dFlowGRPO to incorporate information from both the associated conditional transition rates and the posterior model during reinforcement learning. We apply dFlowGRPO to FUDOKI, a recent multimodal discrete flow model, and evaluate it on both image generation and multimodal understanding tasks. Empirical results show that dFlowGRPO outperforms existing GRPO-type methods for dLLMs on text-to-image generation tasks and achieves performance competitive with continuous flow-based models trained using FlowGRPO, while also demonstrating strong capabilities on understanding tasks.
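The abstract's claim that dLLMs are a special case of DFMs can be made concrete with rate matrices. The sketch below is a hedged illustration (the vocabulary size, rate constant, and both matrices are hypothetical, not from the paper): masking corresponds to a rate matrix whose only nonzero row belongs to the mask state, while a non-masked DFM permits transitions between arbitrary states.

```python
import numpy as np

V = 4        # hypothetical token vocabulary size; index V is the mask token
rate = 1.0   # illustrative unmasking rate

# Masked-source DFM (the dLLM special case): probability mass flows
# only out of the mask state, so every non-mask row of the rate
# matrix is zero.
R_masked = np.zeros((V + 1, V + 1))
R_masked[V, :V] = rate / V    # mask -> any token, uniformly
R_masked[V, V] = -rate        # rows of a rate matrix sum to zero

# A non-masked alternative: a uniform rate matrix in which every
# state can jump to every other state.
R_uniform = np.full((V, V), rate / (V - 1))
np.fill_diagonal(R_uniform, -rate)
```

A framework that only exploits the zero structure of `R_masked` would not transfer to `R_uniform`; the paper's generality claim is precisely that its derivation does not depend on that structure.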
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces dFlowGRPO, a unified RL framework for discrete flow models (DFMs) that extends beyond masked dLLMs. It derives the full trajectory probability for DFMs, formulates denoising as an MDP incorporating conditional transition rates and posteriors, and enables rate-aware policy optimization for a broad family of probability paths and non-masked sources. Applied to FUDOKI on text-to-image generation and multimodal understanding tasks, it reports outperformance over GRPO-type dLLM methods and competitiveness with continuous FlowGRPO models.
Significance. If the derivation is general and the MDP yields unbiased gradients without path-specific assumptions, this would meaningfully broaden RL fine-tuning for discrete generative models, moving beyond the masked-mixture restriction of dLLMs. The reported empirical gains on image generation and understanding tasks suggest practical value, especially if the method proves stable across non-masked sources.
major comments (2)
- [Methods (trajectory probability derivation and MDP formulation)] The central claim that the derived trajectory probability and MDP formulation support rate-aware policy optimization for arbitrary probability paths and non-masked source distributions (Abstract) is load-bearing. The skeptic concern is valid: discrete flows define transitions via path-dependent conditional rates, and if the derivation relies on identities that hold only under masking or specific mixtures, the resulting policy gradient would be biased for general DFMs. Please provide the explicit steps (e.g., in the Methods derivation) showing how the full trajectory probability remains unbiased without masking assumptions, or demonstrate via a counterexample that it does not.
- [Experiments] Empirical support is limited to FUDOKI (a masked-source model). The Abstract claims broad applicability to non-masked sources, yet no experiments or ablation on non-masked DFMs are reported. This leaves the generality of the rate-aware optimization untested and weakens the claim that dFlowGRPO outperforms GRPO-type methods across the stated family of models.
minor comments (2)
- [Experiments] Clarify the exact datasets, metrics, and baselines used for the text-to-image and understanding tasks to allow direct comparison with prior GRPO and FlowGRPO results.
- Ensure all notation for conditional rates, posteriors, and trajectory probabilities is defined before first use and is consistent between the derivation and the algorithm pseudocode.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our paper. We address the major comments below and have made revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [Methods (trajectory probability derivation and MDP formulation)] The central claim that the derived trajectory probability and MDP formulation support rate-aware policy optimization for arbitrary probability paths and non-masked source distributions (Abstract) is load-bearing. The skeptic concern is valid: discrete flows define transitions via path-dependent conditional rates, and if the derivation relies on identities that hold only under masking or specific mixtures, the resulting policy gradient would be biased for general DFMs. Please provide the explicit steps (e.g., in the Methods derivation) showing how the full trajectory probability remains unbiased without masking assumptions, or demonstrate via a counterexample that it does not.
Authors: We appreciate the referee's careful scrutiny of the derivation. The trajectory probability for DFMs is derived from the general continuous-time Markov chain formulation, where the probability of a trajectory is the product of the instantaneous transition rates λ_t(x_{t+1} | x_t) integrated over the path, combined with the posterior probabilities from the model. This does not rely on masking-specific identities; the masking is a special case where the rate matrix has a particular structure (e.g., absorbing to mask token). The MDP is defined with states as the current discrete state, actions as the next state, and rewards incorporating the rate and posterior. The policy gradient is unbiased because it follows the standard REINFORCE or GRPO estimator applied to this general MDP. To make this explicit, we have added a detailed step-by-step derivation in the revised Methods section (Section 3.2), showing the expansion from the flow matching objective to the trajectory log-probability without any masking assumptions. We believe this addresses the concern and confirms generality. revision: yes
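The rebuttal's appeal to "the standard REINFORCE or GRPO estimator" can be sketched schematically (function names and the standardization form are hypothetical, not the paper's code): rewards within a group of trajectories sampled for the same prompt are normalized into advantages, and each trajectory's full log-probability is weighted by its advantage.

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize rewards within one
    group of trajectories sampled for the same prompt."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def surrogate_loss(traj_log_probs, rewards):
    """REINFORCE-style surrogate: negative advantage-weighted sum of
    full trajectory log-probabilities. Differentiating this surrogate
    w.r.t. model parameters (in an autodiff framework) yields the
    policy-gradient estimate."""
    adv = grpo_advantages(rewards)
    return -float(np.sum(adv * np.asarray(traj_log_probs, dtype=float)))
```

The unbiasedness question raised by the referee lives inside `traj_log_probs`: the estimator itself is standard, so any bias for general DFMs would have to enter through the trajectory-probability derivation.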
- Referee: [Experiments] Empirical support is limited to FUDOKI (a masked-source model). The Abstract claims broad applicability to non-masked sources, yet no experiments or ablation on non-masked DFMs are reported. This leaves the generality of the rate-aware optimization untested and weakens the claim that dFlowGRPO outperforms GRPO-type methods across the stated family of models.
Authors: We agree that our experiments focus on FUDOKI, which uses a masked source distribution, as it is a state-of-the-art multimodal DFM. While this validates the method on a practical model, we acknowledge that direct empirical comparison on non-masked DFMs would provide stronger evidence for the broad applicability. Implementing and training non-masked variants requires significant additional resources and model development, which was beyond the scope of this work. In the revised manuscript, we have added a discussion in the Experiments and Conclusion sections clarifying the scope of the empirical results and emphasizing that the theoretical framework applies generally, with FUDOKI serving as a representative case. We also suggest directions for future work on non-masked sources. revision: partial
- Empirical evaluation on non-masked discrete flow models
Circularity Check
Derivation of trajectory probability and MDP formulation is self-contained
Full rationale
The paper states that it derives the full trajectory probability for DFMs and formulates denoising as a Markov decision process directly from the conditional transition rates and posterior model structure. No equations or steps in the provided abstract or description reduce this derivation to fitted parameters, self-definitions, or load-bearing self-citations by construction. The central claims about supporting a broad family of probability paths are presented as following from the derivation rather than presupposing the target results. Empirical evaluations on FUDOKI are downstream applications and do not feed back into the derivation. This is the standard case of a self-contained derivation without circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: denoising in DFMs can be formulated as a Markov decision process using full trajectory probabilities derived from conditional transition rates and posterior models.
Lean theorems connected to this paper
- `IndisputableMonolith/Cost/FunctionalEquation.lean` · `washburn_uniqueness_aczel` (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "We derive the full trajectory probability for DFMs and formulate denoising as a Markov decision process, enabling dFlowGRPO to incorporate information from both the associated conditional transition rates and the posterior model"
- `IndisputableMonolith/Foundation/ArithmeticFromLogic.lean` · `embed_injective` (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "the transition probability ratio ... is a product of dimension-wise expected posterior ratios, each reweighted by a rate-dependent term"
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.