Improved Mean Flows: On the Challenges of Fastforward Generative Models
Pith reviewed 2026-05-17 02:14 UTC · model grok-4.3
The pith
Reformulated MeanFlow training reaches 1.72 FID with one-step generation on ImageNet 256×256, trained from scratch.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The improved MeanFlow (iMF) recasts the original training target, which depended on both ground-truth fields and the network itself, into a loss on instantaneous velocity v re-parameterized by a network predicting average velocity u. This yields a standard regression problem that improves training stability. Guidance is formulated as explicit conditioning variables processed via in-context conditioning, retaining flexibility at test time. Trained entirely from scratch, iMF reaches 1.72 FID with a single function evaluation on ImageNet 256×256, outperforming prior one-step methods and closing the gap to multi-step approaches without distillation.
What carries the argument
Re-parameterization of the objective as a regression loss on instantaneous velocity v via a network predicting average velocity u, combined with explicit guidance scales as in-context conditioning variables.
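Spelled out as equations, the objective might look as follows. The MeanFlow identity and V_θ are taken from the passages quoted under "Lean theorems connected to this paper". The linear interpolant z_t = (1 − t)x + tε, the integral definition of u, and the use of ε − x as the regression target are assumptions carried over from the original MeanFlow formulation, not details stated on this page. Here v_θ denotes the network's own instantaneous-velocity estimate used as the JVP tangent, and sg marks a stop-gradient.

```latex
\begin{aligned}
z_t &= (1-t)\,x + t\,\epsilon, \qquad \text{conditional velocity } \epsilon - x \\
u(z_t) &\triangleq u(z_t, r, t) = \frac{1}{t-r}\int_r^t v(z_\tau, \tau)\,d\tau \\
u(z_t) &= v(z_t) - (t-r)\,\frac{d}{dt}u(z_t) \qquad \text{(MeanFlow identity)} \\
\frac{d}{dt}u(z_t) &= \partial_z u(z_t)\,v(z_t) + \partial_t u(z_t) \qquad \text{(JVP along } (v, 0, 1)\text{)} \\
V_\theta(z_t) &\triangleq u_\theta(z_t) + (t-r)\,\mathrm{JVP}_{\mathrm{sg}}(u_\theta; v_\theta) \\
\mathcal{L}(\theta) &= \mathbb{E}_{x,\epsilon,r,t}\,\big\|V_\theta(z_t) - (\epsilon - x)\big\|^2
\end{aligned}
```

Under these assumptions the target ε − x involves no network output, which is what makes the loss an ordinary regression; θ enters only through V_θ.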
If this is right
- One-step generative models can be trained stably from scratch on large-scale image datasets like ImageNet.
- Guidance scale remains adjustable at inference time without retraining the model.
- In-context conditioning reduces model size while maintaining or improving performance.
- Single-evaluation generation reaches FID scores competitive with multi-step methods.
- High-quality fastforward generation is possible without distillation.
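The single-evaluation claims above rest on the MeanFlow sampling rule: because the network predicts an average velocity over an interval, one evaluation can jump from noise at t = 1 to data at r = 0 via z_r = z_t − (t − r)·u(z_t, r, t). The minimal sketch below uses a hypothetical one-dimensional flow with a closed-form average velocity in place of a trained u_θ; the time convention (noise at t = 1) follows the original MeanFlow paper and is an assumption here.

```python
import math

A = 0.7  # hypothetical toy flow dz/dt = A * z; u_toy plays the role of u_theta


def u_toy(z, r, t):
    """Average velocity over [r, t] for the toy flow, given z = z_t."""
    h = t - r
    if h == 0.0:
        return A * z
    return z * (1.0 - math.exp(-A * h)) / h


def one_step_sample(u, noise):
    """1-NFE sampling: jump from t = 1 (noise) to r = 0 (data) in one evaluation.

    Applies z_r = z_t - (t - r) * u(z_t, r, t) once.
    """
    t, r = 1.0, 0.0
    return noise - (t - r) * u(noise, r, t)


def euler_reference(noise, steps=10_000):
    """Many-step Euler integration of dz/dt = A * z from t = 1 back to t = 0."""
    z, dt = noise, 1.0 / steps
    for _ in range(steps):
        z -= dt * A * z
    return z


noise_sample = 2.0
print(one_step_sample(u_toy, noise_sample))  # 2 * exp(-0.7), about 0.9932
print(euler_reference(noise_sample))         # agrees to within about 1e-4
```

With a learned u_θ, the single step is only as accurate as the network's approximation of the true average velocity; in this toy the field is exact, so one step matches the many-step ODE solution.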
Where Pith is reading between the lines
- The velocity re-parameterization could be tested in other flow-matching or velocity-based generative frameworks to check for similar stability gains.
- Explicit in-context conditioning might extend to text-to-image or video generation to improve flexibility in those settings.
- Combining this approach with model compression techniques could enable real-time single-step generation on resource-limited hardware.
- Experiments at higher resolutions or on different data modalities would test whether the performance scaling holds beyond 256×256 images.
Load-bearing premise
Recasting the training objective as a loss on instantaneous velocity, re-parameterized through the network's average-velocity prediction, yields a standard regression problem that improves stability without introducing bias or new instabilities.
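The "no bias" part of this premise can be made concrete: if u is the exact average velocity, the compound velocity V = u + (t − r)·JVP(u; v) should reproduce the instantaneous velocity v, so the exact solution sits at zero loss. The sketch below checks this on a hypothetical one-dimensional flow dz/dt = A·z whose average velocity has a closed form. The toy field, the finite-difference JVP (standing in for forward-mode autodiff such as jax.jvp), and reading the tangent v off u at r = t are illustrative assumptions, not iMF's implementation.

```python
import math

A = 0.7  # hypothetical toy flow dz/dt = A * z; stands in for a learned field


def v_true(z, t):
    """Instantaneous velocity of the toy flow (time-independent here)."""
    return A * z


def u_exact(z, r, t):
    """Exact average velocity over [r, t] for the toy flow, given z = z_t.

    Along dz/dt = A * z, z_r = z_t * exp(-A * (t - r)), so
    u = (z_t - z_r) / (t - r). At r == t it reduces to the instantaneous velocity.
    """
    h = t - r
    if h == 0.0:
        return A * z
    return z * (1.0 - math.exp(-A * h)) / h


def jvp_fd(u, z, r, t, tangent_z, eps=1e-5):
    """Derivative of u along the tangent (tangent_z, 0, 1) in (z, r, t).

    Central finite difference standing in for a forward-mode JVP; with
    tangent_z = v this is the total derivative d/dt u along the flow.
    """
    plus = u(z + eps * tangent_z, r, t + eps)
    minus = u(z - eps * tangent_z, r, t - eps)
    return (plus - minus) / (2.0 * eps)


def compound_velocity(u, z, r, t):
    """V(z_t) = u(z_t) + (t - r) * JVP(u; v), with v read off u at r = t.

    During training the JVP term would sit under a stop-gradient; nothing is
    differentiated in this sketch, so sg has no effect here.
    """
    v_tangent = u(z, t, t)  # assumption: instantaneous velocity from the r = t case
    return u(z, r, t) + (t - r) * jvp_fd(u, z, r, t, v_tangent)


def v_loss(u, z, r, t, v_target):
    """Squared error between the compound velocity and a fixed velocity target."""
    return (compound_velocity(u, z, r, t) - v_target) ** 2


z, r, t = 1.3, 0.2, 0.9
print(compound_velocity(u_exact, z, r, t), v_true(z, t))  # both approximately 0.91
print(v_loss(u_exact, z, r, t, v_true(z, t)))             # approximately 0
```

This only shows that the exact solution is a zero-loss point of the re-parameterized objective; whether training reaches it more stably is the empirical part of the claim.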
What would settle it
A side-by-side training run on ImageNet 256×256 in which the re-parameterized loss shows greater instability or a worse final FID than the original MeanFlow objective would falsify the claimed stability and performance gains.
Original abstract
MeanFlow (MF) has recently been established as a framework for one-step generative modeling. However, its "fastforward" nature introduces key challenges in both the training objective and the guidance mechanism. First, the original MF's training target depends not only on the underlying ground-truth fields but also on the network itself. To address this issue, we recast the objective as a loss on the instantaneous velocity v, re-parameterized by a network that predicts the average velocity u. Our reformulation yields a more standard regression problem and improves the training stability. Second, the original MF fixes the classifier-free guidance scale during training, which sacrifices flexibility. We tackle this issue by formulating guidance as explicit conditioning variables, thereby retaining flexibility at test time. The diverse conditions are processed through in-context conditioning, which reduces model size and benefits performance. Overall, our improved MeanFlow (iMF) method, trained entirely from scratch, achieves 1.72 FID with a single function evaluation (1-NFE) on ImageNet 256×256. iMF substantially outperforms prior methods of this kind and closes the gap with multi-step methods while using no distillation. We hope our work will further advance fastforward generative modeling as a stand-alone paradigm.
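The guidance change can be sketched as follows. Classifier-free guidance mixes conditional and unconditional velocities, v_cfg = ω·v(z | c) + (1 − ω)·v(z | ∅); the original MF fixes ω during training, whereas iMF passes guidance to the network as explicit conditioning. The token layout below (a class embedding plus one sinusoidal token each for t, r and ω, prepended to the image tokens) is a hypothetical illustration of in-context conditioning, not the paper's architecture, and every name in it is made up for the example.

```python
import math


def sinusoidal_embedding(x, dim=8, max_period=10_000.0):
    """Sinusoidal features of a scalar, the usual recipe for diffusion timesteps."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.cos(x * f) for f in freqs] + [math.sin(x * f) for f in freqs]


def build_in_context_sequence(image_tokens, class_token, t, r, omega, dim=8):
    """Hypothetical in-context conditioning: prepend one token per condition.

    Because omega is an input rather than a constant baked into the training
    target, a different guidance scale can be chosen at sampling time.
    """
    condition_tokens = [
        class_token,
        sinusoidal_embedding(t, dim),
        sinusoidal_embedding(r, dim),
        sinusoidal_embedding(omega, dim),
    ]
    return condition_tokens + image_tokens


def cfg_velocity(v_cond, v_uncond, omega):
    """Classifier-free guidance: omega * v(z|c) + (1 - omega) * v(z|null)."""
    return [omega * a + (1.0 - omega) * b for a, b in zip(v_cond, v_uncond)]


image_tokens = [[0.0] * 8 for _ in range(16)]  # 16 dummy patch tokens of width 8
class_token = [1.0] * 8                         # dummy class embedding
seq = build_in_context_sequence(image_tokens, class_token, t=0.9, r=0.2, omega=2.5)
print(len(seq))                                   # 20: 4 condition tokens + 16 image tokens
print(cfg_velocity([1.0, 2.0], [0.0, 0.0], 2.5))  # [2.5, 5.0]
```

Treating each condition as its own token lets the transformer attend to it like any other input, instead of adding a dedicated modulation pathway per condition; this is one plausible reading of how in-context conditioning could reduce model size.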
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents 'improved MeanFlow' (iMF), an enhancement to the MeanFlow framework for one-step (fastforward) generative modeling. The authors identify two challenges: (1) the original training target depending on the network itself, addressed by recasting the objective as a loss on instantaneous velocity v re-parameterized via a network predicting average velocity u; (2) fixed classifier-free guidance scale, addressed by explicit conditioning variables processed through in-context conditioning. They report that iMF, trained from scratch, achieves an FID of 1.72 on ImageNet 256×256 using a single function evaluation (1-NFE), substantially outperforming prior one-step methods and closing the gap with multi-step approaches without using distillation.
Significance. If the re-parameterized objective is mathematically equivalent to the original MeanFlow loss and the learned model satisfies the mean-flow consistency condition without bias, this work would represent a meaningful step forward in developing efficient, standalone one-step generative models. The reported 1.72 FID score with 1-NFE is competitive and highlights the potential of fastforward paradigms. The use of in-context conditioning for flexible guidance is a practical contribution that could benefit other conditional generation tasks.
major comments (2)
- [Method section (training objective reformulation)] The reformulation of the training objective as a loss on instantaneous velocity v, re-parameterized by a network predicting average velocity u, is presented as yielding a more standard regression problem that improves stability. However, no derivation is provided showing that this re-parameterization is equivalent to the original MeanFlow objective or that the mapping from u to v preserves the fixed point exactly without introducing bias or approximation error. This is load-bearing for the central claim, as the 1.72 FID result is reported for a model trained with the new objective.
- [Experiments section] Experimental results: The manuscript reports a concrete 1.72 FID for 1-NFE on ImageNet 256×256 but provides no error bars, ablation studies isolating the effect of the re-parameterization versus the guidance changes, or verification that the learned flow satisfies the original mean-flow consistency condition. These omissions make it difficult to confirm that the performance gain stems from the proposed fixes rather than an altered objective.
minor comments (2)
- [Abstract] The abstract and introduction could more explicitly define 'fastforward' and 'MeanFlow' for readers new to the prior work, including a brief recap of the original objective.
- [Method] Notation for v (instantaneous velocity) and u (average velocity) should be introduced with a clear equation relating them to the original MeanFlow fields.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions made to strengthen the paper.
- Referee: [Method section (training objective reformulation)] The reformulation of the training objective as a loss on instantaneous velocity v, re-parameterized by a network predicting average velocity u, is presented as yielding a more standard regression problem that improves stability. However, no derivation is provided showing that this re-parameterization is equivalent to the original MeanFlow objective or that the mapping from u to v preserves the fixed point exactly without introducing bias or approximation error. This is load-bearing for the central claim, as the 1.72 FID result is reported for a model trained with the new objective.
  Authors: We agree that an explicit derivation is necessary to support the central claim. In the revised manuscript we have added a full derivation in the Method section (new subsection 3.2) proving that the re-parameterized objective on instantaneous velocity v is mathematically equivalent to the original MeanFlow loss. The derivation shows that, when the network satisfies the mean-flow consistency condition, the mapping from the predicted average velocity u to v preserves the fixed point exactly and introduces neither bias nor approximation error; the change is only in the form of the regression target, which improves numerical stability without altering the optimization landscape at convergence. Revision: yes.
- Referee: [Experiments section] Experimental results: The manuscript reports a concrete 1.72 FID for 1-NFE on ImageNet 256×256 but provides no error bars, ablation studies isolating the effect of the re-parameterization versus the guidance changes, or verification that the learned flow satisfies the original mean-flow consistency condition. These omissions make it difficult to confirm that the performance gain stems from the proposed fixes rather than an altered objective.
  Authors: We acknowledge these omissions weaken the experimental validation. In the revised version we have added (i) error bars computed over three independent runs for the main 1.72 FID result, (ii) ablation tables that isolate the contribution of the velocity re-parameterization from the in-context conditioning changes, and (iii) a consistency verification experiment that reports the mean-flow consistency error on held-out data, confirming the learned model satisfies the original condition to within numerical tolerance. These additions directly address the concern that gains might stem from an altered objective. Revision: yes.
Circularity Check
No significant circularity; reformulation is a modeling choice and performance is empirical.
full rationale
The paper recasts the original MF training target (which depends on the network) as a loss on instantaneous velocity v re-parameterized via a network outputting average velocity u. This is described as producing a more standard regression problem that improves stability. The headline result of 1.72 FID at 1-NFE on ImageNet 256x256 is obtained by training the resulting model from scratch and measuring its generative performance on held-out data. No equation in the provided text reduces the reported FID or the validity of the one-step generator to a fitted parameter or self-citation by construction. The re-parameterization is a design decision whose correctness is assessed by downstream empirical outcomes rather than being tautological. No self-citation chain or uniqueness theorem is invoked to force the central claim. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Re-parameterizing the training target as a regression on instantaneous velocity produces a more stable and standard optimization problem.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem.
  "We recast the objective as a loss on the instantaneous velocity v, re-parameterized by a network that predicts the average velocity u. ... V_θ(z_t) ≜ u_θ(z_t) + (t − r) JVP_sg(u_θ; v_θ)"
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean: costAlphaLog_high_calibrated_iff (unclear)
  Relation between the paper passage and the cited Recognition theorem.
  "the MeanFlow identity ... u(z_t) = v(z_t) − (t − r) d/dt u(z_t)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 22 Pith papers
- Discrete MeanFlow: One-Step Generation via Conditional Transition Kernels
  Discrete MeanFlow parameterizes CTMC conditional transition kernels with a boundary-by-construction design to enable exact one-step generation in discrete state spaces.
- One-Step Generative Modeling via Wasserstein Gradient Flows
  W-Flow achieves state-of-the-art one-step ImageNet 256x256 generation at 1.29 FID by training a static neural network to follow a Wasserstein gradient flow that minimizes Sinkhorn divergence, delivering roughly 100x f...
- CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making
  CoFlow achieves state-of-the-art coordination quality in offline MARL using only 1-3 denoising steps by natively coupling velocity fields across agents via coordinated attention and gating.
- CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making
  CoFlow achieves state-of-the-art coordination in offline MARL using single-pass joint velocity fields with Coordinated Velocity Attention and Adaptive Coordination Gating.
- How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance
  FMRG is a training-free, single-trajectory guidance method for flow models derived from optimal control that achieves strong reward alignment with only 3 NFEs.
- Speech Enhancement Based on Drifting Models
  DriftSE achieves one-step speech enhancement by evolving the pushforward distribution of a mapping function to match the clean speech distribution using a learned drifting field.
- Learning Sampled-data Control for Swarms via MeanFlow
  Generalizes MeanFlow to learn finite-horizon minimum-energy control coefficients for linear swarm systems via a differential identity and stop-gradient regression objective.
- Setting-Matched and Semantics-Scaled Benchmarking of One-Step Generative Models Against Multistep Diffusion and Flow Models
  Matched benchmarking reveals FID misleads in few-step regimes under CFG, prompting CLIP-scaled and PickScore-scaled FID and IS variants for better semantic evaluation of one-step image generators.
- Flow Map Language Models: One-step Language Modeling via Continuous Denoising
  Continuous flow language models match discrete diffusion baselines and their distilled one-step flow map versions exceed 8-step discrete diffusion quality on LM1B and OWT.
- Efficient Image Synthesis with Sphere Latent Encoder
  Decouples Sphere Encoder into fixed pretrained encoder and spherical latent denoiser, yielding higher quality and faster inference than the joint original on Animal-Faces, Oxford-Flowers and ImageNet-1K.
- ELF: Embedded Language Flows
  ELF is a continuous embedding-space flow matching model for language that stays continuous until the last step and outperforms prior discrete and continuous diffusion language models with fewer sampling steps.
- A Few-Step Generative Model on Cumulative Flow Maps
  Cumulative flow maps unify few-step generative modeling for diffusion and flow models via cumulative transport and parameterization with minimal changes to time embeddings and objectives.
- CoFlow: Coordinated Few-Step Flow for Offline Multi-Agent Decision Making
  CoFlow preserves inter-agent coordination in few-step offline MARL by using a natively joint velocity field with Coordinated Velocity Attention and Adaptive Coordination Gating, matching or exceeding baselines in 1-3 ...
- How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance
  FMRG is a training-free single-trajectory guidance framework for flow-based models that matches or exceeds baselines on reward-guided tasks and inverse problems using as few as 3 NFEs.
- Point-MF: One-step Point Cloud Generation from a Single Image via Mean Flows
  Point-MF performs one-step point cloud reconstruction from single images by learning a mean velocity field in point space with a tailored Diffusion Transformer and a new auxiliary loss.
- Speech Enhancement Based on Drifting Models
  DriftSE formulates speech denoising as an equilibrium problem solved in one step via a learned drifting field that matches distributions, enabling unpaired training and outperforming multi-step baselines on VoiceBank-DEMAND.
- Speech Enhancement Based on Drifting Models
  DriftSE achieves one-step speech enhancement by evolving a pushforward distribution to match clean speech using a drifting field, outperforming multi-step diffusion on VoiceBank-DEMAND.
- FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation
  FlowLM converts diffusion LMs to flow matching via fine-tuning, achieving few-step generation that rivals or beats 2000-step diffusion and saturates faster than training flow models from scratch.
- Flow Map Language Models: One-step Language Modeling via Continuous Denoising
  Continuous flows on token embeddings with flow-map distillation produce one-step language models whose quality exceeds recent 8-step discrete diffusion baselines on LM1B and OpenWebText.
- Drift Flow Matching
  Drift Flow Matching connects direct transport maps from Drift Models with flow-based iterative refinement to enable adaptive computation in generative modeling.
- Real-time Speech Restoration using Data Prediction Mean Flows
  A Data Prediction Mean Flow model enables real-time speech restoration with 120x lower compute and no algorithmic latency beyond the STFT while matching state-of-the-art offline quality.
- Accelerating Redshift-Conditioned Galaxy Image Synthesis with One-step Generative Modeling
  One-step pixel-MeanFlow models recover key galaxy morphology statistics at orders-of-magnitude lower computational cost than standard DDPM sampling while remaining weaker on fine-grained structure.