pith. machine review for the scientific record.

arxiv: 2605.09291 · v1 · submitted 2026-05-10 · 💻 cs.LG · stat.AP

Recognition: 2 theorem links · Lean Theorem

dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 03:46 UTC · model grok-4.3

classification 💻 cs.LG · stat.AP
keywords discrete flow models · reinforcement learning · policy optimization · text-to-image generation · multimodal models · Markov decision process · trajectory probability · discrete diffusion

The pith

By deriving full trajectory probabilities and modeling denoising as an MDP, dFlowGRPO enables rate-aware policy optimization for general discrete flow models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops dFlowGRPO, a reinforcement learning framework that extends GRPO-style policy optimization to discrete flow models for generating discrete data such as images. The authors derive probabilities over complete generation trajectories and treat the sequence of denoising steps as decisions in a Markov decision process. This lets the optimization draw on both the conditional transition rates and the posterior estimates from the model itself. A sympathetic reader would care because it removes the restriction to masked diffusion language models and opens discrete flows to reward-driven training that can match continuous flow performance on generation tasks.

Core claim

The central claim is that discrete flow models admit an explicit full trajectory probability, and that casting the denoising steps as a Markov decision process allows the policy gradient to incorporate conditional transition rates together with the posterior model. The resulting dFlowGRPO therefore supports reinforcement learning across a broad family of probability paths and non-masked source distributions, as shown by its application to the FUDOKI multimodal model on image generation and understanding benchmarks.

What carries the argument

The derivation of the full trajectory probability for discrete flow models combined with the Markov decision process formulation of the denoising sequence, which together let the optimizer use rate and posterior information.
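
As a hedged illustration of that machinery (the notation here is generic, not taken from the paper): an Euler discretization of the model's continuous-time Markov chain with conditional rates λ_t turns each denoising step into an explicit transition kernel, so the trajectory log-probability becomes a sum of per-step terms.

    % Illustrative sketch, assuming an Euler-discretized CTMC; lambda_t^theta is
    % the model's conditional transition rate, delta the Kronecker delta.
    p_\theta(x_{t+\Delta t} = y \mid x_t = x) \approx \delta_{x,y} + \Delta t \, \lambda_t^{\theta}(y \mid x),
    \qquad
    \log p_\theta(x_{t_0 : t_K}) = \log p(x_{t_0}) + \sum_{k=0}^{K-1} \log p_\theta\left(x_{t_{k+1}} \mid x_{t_k}\right).

Viewing each step x_{t_k} → x_{t_{k+1}} as an MDP action makes this sum the log-policy term that a REINFORCE- or GRPO-style gradient differentiates, which is how rate and posterior information enter the update.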

If this is right

  • dFlowGRPO outperforms existing GRPO-type methods for dLLMs on text-to-image generation tasks.
  • It reaches performance competitive with continuous flow-based models trained using FlowGRPO.
  • The same training run also yields strong results on multimodal understanding tasks.
  • The framework works for a broad family of probability paths and non-masked source distributions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The MDP view of denoising could be reused to add reward signals in other discrete generative settings such as structured text or graph generation.
  • Explicit use of transition rates may allow hybrid training pipelines that move parameters between discrete and continuous flow models.
  • If the rate-aware updates remain stable, the approach offers a route to fine-tune multimodal models for both generation quality and reasoning accuracy within one framework.

Load-bearing premise

The full trajectory probability derivation and MDP formulation must hold accurately for arbitrary probability paths and non-masked source distributions without introducing biases or instabilities in the resulting policy updates.
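
To make the stakes concrete, here is the standard GRPO-style clipped surrogate written over whole trajectories (a sketch under the usual GRPO assumptions, not the paper's stated objective). Any bias in the trajectory probability enters the importance ratio r_i directly, which is why the derivation is load-bearing.

    % Sketch: G trajectories tau_1..tau_G per prompt with rewards R_i;
    % epsilon is the usual clipping radius.
    r_i(\theta) = \frac{p_\theta(\tau_i)}{p_{\theta_{\mathrm{old}}}(\tau_i)}, \qquad
    A_i = \frac{R_i - \mathrm{mean}(R_{1:G})}{\mathrm{std}(R_{1:G})}, \qquad
    J(\theta) = \mathbb{E}_i\left[ \min\bigl( r_i(\theta)\, A_i,\; \mathrm{clip}(r_i(\theta),\, 1-\epsilon,\, 1+\epsilon)\, A_i \bigr) \right].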

What would settle it

An experiment in which dFlowGRPO, applied to a non-masked discrete flow model, produces training instability or inferior text-to-image performance relative to standard GRPO baselines would falsify the claim of broad applicability.

Original abstract

Discrete flow models (DFMs) are a class of flexible generative models for generating discrete data, and diffusion large language models (dLLMs) can be viewed as a special case with a specific choice of mixture path and a masked source distribution. While several recent works have explored reinforcement learning into dLLMs, its application to more general discrete flow models remains underexplored. In this work, we present discrete Flow-GRPO (dFlowGRPO), a unified reinforcement learning framework for discrete flow models that supports a broad family of probability paths and non-masked source distributions. We derive the full trajectory probability for DFMs and formulate denoising as a Markov decision process, enabling dFlowGRPO to incorporate information from both the associated conditional transition rates and the posterior model during reinforcement learning. We apply dFlowGRPO to FUDOKI, a recent multimodal discrete flow model, and evaluate it on both image generation and multimodal understanding tasks. Empirical results show that dFlowGRPO outperforms existing GRPO-type methods for dLLMs on text-to-image generation tasks and achieves performance competitive with continuous flow-based models trained using FlowGRPO, while also demonstrating strong capabilities on understanding tasks.
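
For the abstract's claim that dLLMs are a special case, a hedged illustration of the specific choice involved (the schedule κ_t and mask token m are generic notation, not the paper's): the masked mixture path below, paired with an all-mask source, recovers masked diffusion, while general DFMs swap in other paths and source distributions.

    % Illustrative masked mixture path; kappa_t interpolates from the
    % all-mask source (kappa_0 = 0) to the data point x_1 (kappa_1 = 1).
    p_t(x \mid x_1) = (1 - \kappa_t)\, \delta_{x, m} + \kappa_t\, \delta_{x, x_1}, \qquad \kappa_0 = 0, \quad \kappa_1 = 1.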

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces dFlowGRPO, a unified RL framework for discrete flow models (DFMs) that extends beyond masked dLLMs. It derives the full trajectory probability for DFMs, formulates denoising as an MDP incorporating conditional transition rates and posteriors, and enables rate-aware policy optimization for a broad family of probability paths and non-masked sources. Applied to FUDOKI on text-to-image generation and multimodal understanding tasks, it reportedly outperforms GRPO-type dLLM methods and is competitive with continuous flow models trained with FlowGRPO.

Significance. If the derivation is general and the MDP yields unbiased gradients without path-specific assumptions, this would meaningfully broaden RL fine-tuning for discrete generative models, moving beyond the masked-mixture restriction of dLLMs. The reported empirical gains on image generation and understanding tasks suggest practical value, especially if the method proves stable across non-masked sources.

major comments (2)
  1. [Methods (trajectory probability derivation and MDP formulation)] The central claim that the derived trajectory probability and MDP formulation support rate-aware policy optimization for arbitrary probability paths and non-masked source distributions (Abstract) is load-bearing. The skeptic concern is valid: discrete flows define transitions via path-dependent conditional rates, and if the derivation relies on identities that hold only under masking or specific mixtures, the resulting policy gradient would be biased for general DFMs. Please provide the explicit steps (e.g., in the Methods derivation) showing how the full trajectory probability remains unbiased without masking assumptions, or demonstrate via a counterexample that it does not.
  2. [Experiments] Empirical support is limited to FUDOKI (a masked-source model). The Abstract claims broad applicability to non-masked sources, yet no experiments or ablation on non-masked DFMs are reported. This leaves the generality of the rate-aware optimization untested and weakens the claim that dFlowGRPO outperforms GRPO-type methods across the stated family of models.
minor comments (2)
  1. [Experiments] Clarify the exact datasets, metrics, and baselines used for the text-to-image and understanding tasks to allow direct comparison with prior GRPO and FlowGRPO results.
  2. Ensure all notation for conditional rates, posteriors, and trajectory probabilities is defined before first use and is consistent between the derivation and the algorithm pseudocode.

Simulated Authors' Rebuttal

2 responses · 1 unresolved

Thank you for the constructive feedback on our paper. We address the major comments below and have made revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Methods (trajectory probability derivation and MDP formulation)] The central claim that the derived trajectory probability and MDP formulation support rate-aware policy optimization for arbitrary probability paths and non-masked source distributions (Abstract) is load-bearing. The skeptic concern is valid: discrete flows define transitions via path-dependent conditional rates, and if the derivation relies on identities that hold only under masking or specific mixtures, the resulting policy gradient would be biased for general DFMs. Please provide the explicit steps (e.g., in the Methods derivation) showing how the full trajectory probability remains unbiased without masking assumptions, or demonstrate via a counterexample that it does not.

    Authors: We appreciate the referee's careful scrutiny of the derivation. The trajectory probability for DFMs is derived from the general continuous-time Markov chain formulation, where the probability of a trajectory is the product of the instantaneous transition rates λ_t(x_{t+1} | x_t) at the jump times together with the exponential survival factors between jumps, combined with the posterior probabilities from the model. This does not rely on masking-specific identities; masking is a special case in which the rate matrix has a particular structure (e.g., absorbing into the mask token). The MDP is defined with states as the current discrete state, actions as the next state, and rewards incorporating the rate and posterior. The policy gradient is unbiased because it follows the standard REINFORCE or GRPO estimator applied to this general MDP (see the sketch after these responses). To make this explicit, we have added a detailed step-by-step derivation in the revised Methods section (Section 3.2), showing the expansion from the flow matching objective to the trajectory log-probability without any masking assumptions. We believe this addresses the concern and confirms generality. revision: yes

  2. Referee: [Experiments] Empirical support is limited to FUDOKI (a masked-source model). The Abstract claims broad applicability to non-masked sources, yet no experiments or ablation on non-masked DFMs are reported. This leaves the generality of the rate-aware optimization untested and weakens the claim that dFlowGRPO outperforms GRPO-type methods across the stated family of models.

    Authors: We agree that our experiments focus on FUDOKI, which uses a masked source distribution, as it is a state-of-the-art multimodal DFM. While this validates the method on a practical model, we acknowledge that direct empirical comparison on non-masked DFMs would provide stronger evidence for the broad applicability. Implementing and training non-masked variants requires significant additional resources and model development, which was beyond the scope of this work. In the revised manuscript, we have added a discussion in the Experiments and Conclusion sections clarifying the scope of the empirical results and emphasizing that the theoretical framework applies generally, with FUDOKI serving as a representative case. We also suggest directions for future work on non-masked sources. revision: partial
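
As a reading aid for the derivation sketched in response 1, here is a minimal, hedged Python sketch of the two pieces: a trajectory log-probability accumulated from per-step transition probabilities, and a GRPO-style clipped loss with group-normalized advantages. The interface model.step_log_probs is hypothetical, not the paper's code.

    import torch

    def trajectory_log_prob(model, states, times):
        # Sum log p_theta(x_{t_{k+1}} | x_{t_k}) over an Euler-discretized
        # trajectory. model.step_log_probs(x, t, dt) is a hypothetical
        # interface returning a (..., vocab) tensor of next-state log-probs.
        logp = torch.zeros(())
        for k in range(len(times) - 1):
            dt = times[k + 1] - times[k]
            step = model.step_log_probs(states[k], times[k], dt)
            logp = logp + step.gather(-1, states[k + 1].unsqueeze(-1)).sum()
        return logp

    def grpo_loss(logp_new, logp_old, rewards, eps=0.2):
        # Clipped GRPO-style surrogate for one group of G sampled
        # trajectories; advantages are normalized within the group.
        # logp_old should be detached (stored from the sampling policy).
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        ratio = torch.exp(logp_new - logp_old)
        clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
        return -torch.min(ratio * adv, clipped * adv).mean()

In practice one would differentiate only through logp_new; the gradient then flows into the model through the per-step transition probabilities, which is where the rate and posterior information enters.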

standing simulated objections (not resolved)
  • Empirical evaluation on non-masked discrete flow models

Circularity Check

0 steps flagged

Derivation of trajectory probability and MDP formulation is self-contained

Full rationale

The paper states that it derives the full trajectory probability for DFMs and formulates denoising as a Markov decision process directly from the conditional transition rates and posterior model structure. No equations or steps in the provided abstract or description reduce this derivation to fitted parameters, self-definitions, or load-bearing self-citations by construction. The central claims about supporting a broad family of probability paths are presented as following from the derivation rather than presupposing the target results. Empirical evaluations on FUDOKI are downstream applications and do not feed back into the derivation. This is the standard case of a self-contained derivation without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based solely on the abstract: the paper likely relies on standard flow matching and RL assumptions, but specifics on free parameters or invented entities are not detailed.

axioms (1)
  • domain assumption: Denoising in DFMs can be formulated as a Markov decision process using full trajectory probabilities derived from conditional transition rates and posterior models.
    Invoked to enable incorporation of rate information during reinforcement learning.

pith-pipeline@v0.9.0 · 5516 in / 1260 out tokens · 34374 ms · 2026-05-12T03:46:21.640831+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
