Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models

Kan Liu; Kesong Li; Kuo-kun Tseng; Tao Lan; Weiyi Lu; Yixuan Xu

arxiv: 2605.21123 · v1 · pith:U5BUMNGEnew · submitted 2026-05-20 · 💻 cs.CV · cs.LG

Pith reviewed 2026-05-21 04:56 UTC · model grok-4.3

keywords Direct Preference Optimization · Diffusion Models · Flow Matching · Text-to-Image Generation · Model Alignment · Generative Models · Preference Optimization

The pith

Linear-DPO replaces the sigmoid utility in standard DPO with a linear function and an EMA reference model to better align diffusion and flow-matching image generators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives a generalized DPO objective that applies to both denoising diffusion and flow-matching models through a unified reverse-time SDE framework. It argues from a gradient viewpoint that the standard sigmoid utility produces suboptimal updates for regression-based text-to-image tasks. Linear-DPO addresses this by adopting a sustained linear utility and an EMA-updated reference model during training. Experiments on SD1.5, SDXL, and SD3-Medium report better qualitative and quantitative results than prior DPO variants for preference alignment. A sympathetic reader would care because improved alignment could produce images that match human preferences more closely, without the objective mismatch that arises when a DPO objective designed for discrete NLP outputs is applied to regression-based generators.

Core claim

By generalizing the DPO objective via a unified reverse-time SDE framework to cover both diffusion and flow-matching, and replacing the aggressive sigmoid-based utility with a sustained linear utility while incorporating an EMA-updated reference model, Linear-DPO achieves superior performance over existing baselines in aligning generative models for text-to-image generation.

What carries the argument

The Linear-DPO objective: it uses a sustained linear utility function in place of the sigmoid, together with an EMA-updated reference model, to carry out preference optimization under the unified reverse-time SDE framework.

If this is right

Linear-DPO applies to both diffusion models like SD1.5 and SDXL and flow-matching models like SD3-Medium with reported gains.
The linear utility avoids the gradient issues of sigmoid in continuous generative settings.
EMA reference model updates support stable training throughout the preference optimization process.
The approach reduces objective mismatch when moving from discrete NLP DPO to regression-based image generation.
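The EMA reference update mentioned in the list above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `ema_update`, the dictionary-of-floats parameter layout, and the decay value 0.5 are assumptions made here for readability, and real training code would apply the same rule to model tensors.

```python
def ema_update(ref_params, policy_params, decay):
    """Return reference parameters moved toward the policy by an EMA step.

    Rule (per parameter): ref <- decay * ref + (1 - decay) * policy.
    `decay` is a hypothetical name; the page does not state the paper's symbol.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    return {
        name: decay * ref_params[name] + (1.0 - decay) * policy_params[name]
        for name in ref_params
    }


# Toy usage with scalar "parameters" (illustrative values only).
ref = {"w": 0.0}
policy = {"w": 1.0}
for _ in range(3):
    ref = ema_update(ref, policy, decay=0.5)
# ref["w"] follows 0.0 -> 0.5 -> 0.75 -> 0.875
```

With a decay close to 1 the reference trails the policy slowly, which is the usual way an EMA copy keeps a moving but stable anchor; a decay of exactly 1 would recover a frozen reference.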

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Linear-DPO might improve alignment stability when preference data contains label noise common in image ratings.
The same linear utility change could be tested in other continuous generative settings such as audio or video synthesis.
Combining Linear-DPO with existing reward models might further boost performance on standard text-to-image benchmarks.
The unified SDE view opens a path to apply similar preference objectives to newer flow-based architectures.

Load-bearing premise

The unified reverse-time SDE framework accurately generalizes the DPO objective to both diffusion and flow-matching without introducing new objective mismatch for regression-based generative tasks.

What would settle it

Training Linear-DPO on SDXL or SD3-Medium on the same preference dataset as standard DPO, and finding no improvement or a regression in human-preference win rates or FID relative to standard DPO, would falsify the claimed superiority.
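The win-rate comparison described above is simple to compute once each method has a per-prompt score from the same scorer (the page mentions HPSv3 and PickScore as automated scorers). The sketch below is an assumption-laden illustration: the function name `win_rate`, the tie-counts-as-half convention, and the score values are all made up here and are not taken from the paper.

```python
def win_rate(scores_a, scores_b):
    """Fraction of prompts where method A beats method B; ties count as half a win.

    Both lists must hold scores for the same prompts in the same order.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    wins = 0.0
    for a, b in zip(scores_a, scores_b):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / len(scores_a)


# Hypothetical per-prompt scores (not real results).
linear_dpo_scores = [0.8, 0.6, 0.7, 0.9]
standard_dpo_scores = [0.5, 0.7, 0.7, 0.4]
rate = win_rate(linear_dpo_scores, standard_dpo_scores)  # 2.5 wins out of 4
```

A rate near 0.5 on a large prompt set would be the "no gain" outcome that the falsification test above refers to.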

Figures

Figures reproduced from arXiv: 2605.21123 by Kan Liu, Kesong Li, Kuo-kun Tseng, Tao Lan, Weiyi Lu, Yixuan Xu.

**Figure 1.** Samples from SDXL and SD3-M fine-tuned with our proposed Linear-DPO. Linear-DPO is a more powerful direct preference optimization method designed for diffusion and flow-matching generative models; the results show significant improvements in visual appeal, detail richness, and alignment with human preferences.

**Figure 2.** Curves of the original sigmoid utility in Diffusion-DPO and our proposed linear utility function (top). Panels (a) and (b) show the implicit accuracy during training with the two utility functions, respectively (bottom).

**Figure 3.** Qualitative comparison on the SD1.5 model of images generated by each method using the same prompt. The proposed Linear-DPO substantially improves the visual quality and text–image alignment compared to other methods.

**Figure 4.** Win ratios of Linear-DPO vs. other methods on SD3-M across three validation datasets, based on automated evaluations using the HPSv3 score.

**Figure 7.** PickScores under different η (top) and γ (bottom).

**Figure 6.** Plot of utility U(x) against x over roughly −10 to 10, with curves including the linear utility and the Kahneman–Tversky utility. In panel (b), the Kahneman–Tversky utility achieves higher PickScore than the other two asymmetric variants, which is consistent with the results in Diffusion-KTO; beyond these three forms, our linear utility achieves the highest PickScore. Additional analysis of the utility functions is provided in Appendix E.1.

**Figure 8.** Image samples generated from SD1.5 fine-tuned with various methods, using validation prompts from Pick-a-Pic v2, HPDv2 and PartiPrompt.

**Figure 9.** Image samples generated from SDXL fine-tuned with various methods, using validation prompts from Pick-a-Pic v2, HPDv2 and PartiPrompt.

**Figure 10.** Image samples generated from SD3-M fine-tuned with various methods, using validation prompts from Pick-a-Pic v2, HPDv2 and PartiPrompt.

Original abstract

Direct Preference Optimization (DPO) is successful for alignment in LLMs but still faces challenges in text-to-image generation. Existing studies are confined to denoising diffusion models while overlooking flow-matching, and suffer from an objective mismatch when applying discrete NLP-based DPO to regression-based generative tasks. In this paper, we derive a generalized DPO objective that covers both diffusion and flow-matching via a unified reverse-time SDE framework, and point out from a gradient perspective that the standard DPO objective is suboptimal for text-to-image generation. Consequently, we propose Linear-DPO, which replaces the aggressive sigmoid-based utility function with a sustained linear utility and incorporates an EMA-updated reference model. Qualitative and quantitative experiments on diffusion models (SD1.5, SDXL) and a flow-matching model (SD3-Medium) demonstrate the superiority of our approach over existing baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Linear-DPO unifies DPO for diffusion and flow-matching via SDE but the gradient suboptimality claim for standard DPO looks under-supported for the continuous regression case.

The main thing here is the extension of DPO to flow-matching models through a unified reverse-time SDE framework, paired with swapping the sigmoid for a linear utility and adding an EMA-updated reference model to address claimed gradient problems in text-to-image tasks. This specific combo is not in the prior work the abstract cites, so the unification step counts as new.

The experiments on SD1.5, SDXL, and SD3-Medium give it some practical grounding, with both qualitative and quantitative comparisons that reportedly beat baselines. That coverage of a flow-matching model is useful since those are becoming more common. The paper does a reasonable job showing the method can be applied across these architectures without obvious breakage.

The soft spot is the motivation for why standard DPO is suboptimal from a gradient view. The stress-test concern holds up on the abstract alone: if the analysis does not explicitly check how reward gradients vary across denoising timesteps or fully adapt to the regression nature of image latents, the linear replacement risks looking like a heuristic rather than a necessary fix. Without equations or derivations visible, it is hard to tell whether the objective mismatch is actually resolved or just papered over.

This is for people already working on preference optimization for image generators, especially those fine-tuning Stable Diffusion variants. Readers who care about incremental alignment tweaks would get value from the results. The work shows clear enough thinking to deserve a serious referee, mainly to verify the SDE unification and the gradient analysis in the full text.

Referee Report

2 major / 2 minor

Summary. The paper derives a generalized DPO objective for diffusion and flow-matching models via a unified reverse-time SDE framework. It argues from a gradient perspective that the standard sigmoid-based DPO utility is suboptimal for text-to-image regression tasks, and proposes Linear-DPO which substitutes a linear utility function and adds an EMA-updated reference model. Experiments on SD1.5, SDXL, and SD3-Medium are reported to show superiority over baselines.

Significance. If the gradient analysis and generalization hold without introducing new objective mismatch, the work would usefully extend preference optimization to continuous generative models and address a practical limitation of applying LLM-style DPO to image synthesis. The coverage of flow-matching alongside diffusion and the EMA reference are practical strengths; reproducible experiments on three distinct model families add value.

major comments (2)

§3 (unified reverse-time SDE derivation): the manuscript must explicitly derive the gradient of the proposed objective with respect to the denoising network parameters and show where the sigmoid saturates in a manner that harms the regression loss on continuous latents; without this step the suboptimality claim for the diffusion/flow-matching regime remains unverified.
§4.2 (Linear-DPO formulation): the replacement of the sigmoid by a linear utility together with the EMA reference update is load-bearing for the central claim, yet the paper provides no ablation isolating the contribution of each change nor a direct comparison of the resulting gradients against standard DPO under the same reference-model schedule.
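
The gradient step requested in the §3 comment can be sketched under one assumption that this page does not confirm: that the per-pair loss has the Diffusion-DPO form $\ell(\theta) = -\log\sigma(-\bar\beta\,\Delta D_\theta)$, where $\Delta D_\theta$ is the regression-error margin of the preferred sample minus that of the rejected sample, each measured relative to the reference model (so $\Delta D_\theta < 0$ when the policy already favours the preferred sample). Using $\sigma'(x) = \sigma(x)\,(1-\sigma(x))$ and $1-\sigma(-x) = \sigma(x)$, the chain rule gives:

```latex
\begin{aligned}
\nabla_\theta\,\ell(\theta)
  &= -\frac{\sigma'\!\left(-\bar\beta\,\Delta D_\theta\right)}
           {\sigma\!\left(-\bar\beta\,\Delta D_\theta\right)}
     \,(-\bar\beta)\,\nabla_\theta \Delta D_\theta \\
  &= \bar\beta\,\bigl(1-\sigma(-\bar\beta\,\Delta D_\theta)\bigr)\,
     \nabla_\theta \Delta D_\theta \\
  &= \underbrace{\bar\beta\,\sigma\!\left(\bar\beta\,\Delta D_\theta\right)}_{\omega(\Delta D_\theta)}
     \,\nabla_\theta \Delta D_\theta .
\end{aligned}
```

Under this assumed form the weight $\omega$ tends to $0$ as $\bar\beta\,\Delta D_\theta \to -\infty$ and to $\bar\beta$ as $\bar\beta\,\Delta D_\theta \to +\infty$, so a pair that is already well separated contributes almost no gradient. The resulting weight agrees with the weighting function $\omega(\Delta D_\theta) := \bar\beta\,\sigma(\bar\beta\,\Delta D_\theta)$ quoted from the paper in the Lean theorems section of this page, which is why this loss form was chosen for the sketch.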

minor comments (2)

Table 1 and Figure 3: quantitative metrics (e.g., CLIP score, human preference rates) should be reported with standard deviations and statistical significance tests to support the superiority statements.
Notation: the symbol for the EMA reference model should be introduced once and used consistently; currently the text alternates between r_EMA and r_ref without explicit definition.
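
One lightweight way to act on the Table 1 comment is a paired sign test over per-prompt scores, which needs only the standard library. The sketch below is an illustration chosen here, not something the paper or the referee specifies: the function name `sign_test_p_value` and the example scores are hypothetical, and other tests (paired bootstrap, Wilcoxon signed-rank) would serve equally well.

```python
from math import comb


def sign_test_p_value(scores_a, scores_b):
    """Two-sided exact sign test on paired scores; ties are discarded.

    Null hypothesis: method A is as likely to beat B as to lose to it.
    """
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins + losses
    if n == 0:
        return 1.0  # every pair tied: no evidence either way
    k = max(wins, losses)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5), doubled for two sides.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical scores: method A wins on all six prompts.
p = sign_test_p_value([1, 2, 3, 4, 5, 6], [0, 0, 0, 0, 0, 0])  # 2 * (1/64)
```

Six wins out of six give p = 0.03125, which shows how few prompts are enough to reach a nominal 5% level only when the outcome is unanimous; realistic win ratios need far larger validation sets.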

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating the revisions we will incorporate to strengthen the manuscript.

Referee: §3 (unified reverse-time SDE derivation): the manuscript must explicitly derive the gradient of the proposed objective with respect to the denoising network parameters and show where the sigmoid saturates in a manner that harms the regression loss on continuous latents; without this step the suboptimality claim for the diffusion/flow-matching regime remains unverified.

Authors: We agree that an explicit derivation of the gradient with respect to the denoising network parameters would make the suboptimality argument more rigorous. Section 3 currently presents a gradient-based perspective on the limitations of the sigmoid utility for continuous regression tasks, but we acknowledge that the derivation steps are not fully expanded. In the revision we will add the complete gradient derivation under the unified reverse-time SDE framework and explicitly highlight the saturation regime of the sigmoid and its effect on the regression loss for continuous latents. revision: yes

Referee: §4.2 (Linear-DPO formulation): the replacement of the sigmoid by a linear utility together with the EMA reference update is load-bearing for the central claim, yet the paper provides no ablation isolating the contribution of each change nor a direct comparison of the resulting gradients against standard DPO under the same reference-model schedule.

Authors: We recognize that isolating the linear utility and EMA reference components would strengthen the central claim. The experiments in Section 4.2 compare the full Linear-DPO objective against baselines, but do not contain separate ablations or gradient comparisons under matched reference schedules. We will add these ablations and the requested gradient comparison in the revised version. revision: yes

Circularity Check

0 steps flagged

Derivation of generalized DPO objective and Linear-DPO proposal is self-contained with no reductions to inputs by construction.

The paper first derives a generalized DPO objective covering diffusion and flow-matching via a unified reverse-time SDE framework, then identifies suboptimality of the standard sigmoid-based DPO from a gradient perspective for text-to-image regression tasks, and proposes Linear-DPO as a replacement using sustained linear utility plus EMA-updated reference. No quoted equations or steps reduce the claimed predictions or uniqueness to fitted parameters, self-citations, or ansatzes by construction. The central claims rest on the independent SDE unification and gradient analysis rather than re-labeling or self-referential inputs, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the reverse-time SDE provides a faithful unification and that gradient analysis correctly identifies suboptimality of sigmoid utility; no free parameters or invented entities are visible in the abstract.

axioms (1)

domain assumption: The unified reverse-time SDE framework covers both diffusion and flow-matching models for the DPO derivation.
Invoked to derive the generalized objective and to identify the mismatch with regression tasks.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "we replace this sigmoid gating with a smoother linear utility u_linear(x) = 0.2x + 0.5 ... ω′(ΔD_θ) = clip(u_linear(β̄ ΔD_θ), η, 1)"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear

Paper passage: "the optimization of L(θ) can be interpreted as a form of weighted supervised fine-tuning ... modulated by the weighting function ω(ΔD_θ) := β̄ σ(β̄ ΔD_θ)"
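
The two weighting functions in the quoted passages can be compared numerically. The sketch below implements them exactly as quoted; everything else is an assumption made for illustration: the function names, the values β̄ = 1.0 and η = 0.1, and the sample points are hypothetical, and the quotes elide context (the "..."), so the paper's full definition may carry extra factors.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_weight(delta_d, beta_bar):
    """Quoted standard weighting: omega = beta_bar * sigma(beta_bar * delta_d)."""
    return beta_bar * _sigmoid(beta_bar * delta_d)


def linear_weight(delta_d, beta_bar, eta):
    """Quoted Linear-DPO weighting: clip(0.2 * beta_bar * delta_d + 0.5, eta, 1)."""
    u = 0.2 * beta_bar * delta_d + 0.5
    return min(1.0, max(eta, u))


# Hypothetical settings, chosen only to show the shape of each curve.
BETA_BAR, ETA = 1.0, 0.1
for delta_d in (-10.0, -2.0, 0.0, 2.0, 10.0):
    w_sig = sigmoid_weight(delta_d, BETA_BAR)
    w_lin = linear_weight(delta_d, BETA_BAR, ETA)
    print(f"delta_d={delta_d:+6.1f}  sigmoid={w_sig:.4f}  linear={w_lin:.4f}")
```

With these settings both weights equal 0.5 at ΔD_θ = 0, but far into the negative range the sigmoid weight decays toward 0 while the clipped linear weight stays at the floor η, which is one reading of what "sustained" means in the paper's description.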

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 18 internal anchors

[1] SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. arXiv preprint arXiv:2307.01952.
[2] High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[3] Variational diffusion models. Advances in Neural Information Processing Systems.
[4] Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
[5] DPOK: Reinforcement learning for fine-tuning text-to-image diffusion models. Advances in Neural Information Processing Systems.
[6] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv preprint arXiv:2402.03300.
[7] SimPO: Simple preference optimization with a reference-free reward. Advances in Neural Information Processing Systems.
[8] Kolors: Effective training of diffusion model for photorealistic text-to-image synthesis. arXiv preprint.
[9] Reverse-time diffusion equation models. Stochastic Processes and their Applications, 1982.
[10] Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding. arXiv preprint arXiv:2405.08748.
[11] Deep unsupervised learning using nonequilibrium thermodynamics. International Conference on Machine Learning, 2015.
[12] Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems.
[13] Score-Based Generative Modeling through Stochastic Differential Equations. arXiv preprint arXiv:2011.13456.
[14] Flow Matching for Generative Modeling. arXiv preprint arXiv:2210.02747.
[15] Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow. arXiv preprint arXiv:2209.03003.
[16] Building Normalizing Flows with Stochastic Interpolants. arXiv preprint arXiv:2209.15571.
[17] Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems.
[18] U-Net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015.
[19] Training Diffusion Models with Reinforcement Learning. arXiv preprint arXiv:2305.13301.
[20] Reinforcement learning for fine-tuning text-to-image diffusion models. Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023.
[21] Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems.
[22] Scaling rectified flow transformers for high-resolution image synthesis. Forty-first International Conference on Machine Learning.
[23] Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems.
[24] Qwen-Image Technical Report. arXiv preprint arXiv:2508.02324.
[25] Stochastic Interpolants: A Unifying Framework for Flows and Diffusions. arXiv preprint arXiv:2303.08797.
[26] Pick-a-Pic: An open dataset of user preferences for text-to-image generation. Advances in Neural Information Processing Systems.
[27] HPSv3: Towards wide-spectrum human preference score. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[28] Black Forest Labs (title missing in extraction), 2024.
[29] Scaling Autoregressive Models for Content-Rich Text-to-Image Generation. arXiv preprint arXiv:2206.10789.
[30] Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis. arXiv preprint arXiv:2306.09341.
[31] Christoph Schuhmann (title missing in extraction), 2023.
[32] Learning transferable visual models from natural language supervision. International Conference on Machine Learning, 2021.
[33] ImageReward: Learning and evaluating human preferences for text-to-image generation. Advances in Neural Information Processing Systems.
[34] Classifier-Free Diffusion Guidance. arXiv preprint arXiv:2207.12598.
[35] Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro Cuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj, Dhruv Nair, Sayak Paul, William Berman, Yiyi Xu, Steven Liu, and Thomas Wolf (title missing in extraction). GitHub repository, 2022.
[36] Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems.
[37] Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 1952.
[38] Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 1992.
[39] Diffusion model alignment using direct preference optimization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[40] SiT: Exploring flow and diffusion-based generative models with scalable interpolant transformers. European Conference on Computer Vision, 2024.
[41] Denoising Diffusion Implicit Models. arXiv preprint arXiv:2010.02502.
[42] Flow-GRPO: Training Flow Matching Models via Online RL. arXiv preprint arXiv:2505.05470.
[43] DanceGRPO: Unleashing GRPO on Visual Generation. arXiv preprint arXiv:2505.07818.
[44] Aligning diffusion models by optimizing human utility. Advances in Neural Information Processing Systems.
[45] DSPO: Direct score preference optimization for diffusion model alignment. The Thirteenth International Conference on Learning Representations.
[46] Divergence minimization preference optimization for diffusion model alignment. arXiv preprint arXiv:2507.07510.
[47] Margin-aware preference optimization for aligning diffusion models without reference. First Workshop on Scalable Optimization for Efficient and Adaptive Foundation Models.
[48] Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models.
[49] Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control. International Conference on Machine Learning, 2017.
[50] Human-centric dialog training via offline reinforcement learning. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).
[51] Decoupled Weight Decay Regularization. arXiv preprint arXiv:1711.05101.

[1] [1]

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

Sdxl: Improving latent diffusion models for high-resolution image synthesis , author=. arXiv preprint arXiv:2307.01952 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

High-resolution image synthesis with latent diffusion models , author=. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition , pages=

work page

[3] [3]

Advances in neural information processing systems , volume=

Variational diffusion models , author=. Advances in neural information processing systems , volume=

work page

[4] [4]

Proximal Policy Optimization Algorithms

Proximal policy optimization algorithms , author=. arXiv preprint arXiv:1707.06347 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[5] [5]

Advances in Neural Information Processing Systems , volume=

Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models , author=. Advances in Neural Information Processing Systems , volume=

work page

[6] [6]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Deepseekmath: Pushing the limits of mathematical reasoning in open language models , author=. arXiv preprint arXiv:2402.03300 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[7] [7]

Advances in Neural Information Processing Systems , volume=

Simpo: Simple preference optimization with a reference-free reward , author=. Advances in Neural Information Processing Systems , volume=

work page

[8] [8]

arXiv preprint , year=

Kolors: Effective training of diffusion model for photorealistic text-to-image synthesis , author=. arXiv preprint , year=

work page

[9] [9]

Stochastic Processes and their Applications , volume=

Reverse-time diffusion equation models , author=. Stochastic Processes and their Applications , volume=. 1982 , publisher=

work page 1982

[10] [10]

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Hunyuan-dit: A powerful multi-resolution diffusion transformer with fine-grained chinese understanding , author=. arXiv preprint arXiv:2405.08748 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[11] [11]

International conference on machine learning , pages=

Deep unsupervised learning using nonequilibrium thermodynamics , author=. International conference on machine learning , pages=. 2015 , organization=

work page 2015

[12] [12]

Advances in neural information processing systems , volume=

Denoising diffusion probabilistic models , author=. Advances in neural information processing systems , volume=

work page

[13] Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.

[14] Flow matching for generative modeling. arXiv preprint arXiv:2210.02747.

[15] Flow straight and fast: Learning to generate and transfer data with rectified flow. arXiv preprint arXiv:2209.03003.

[16] Building normalizing flows with stochastic interpolants. arXiv preprint arXiv:2209.15571.

[17] Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems.

[18] U-Net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015.

[19] Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301.

[20] Reinforcement learning for fine-tuning text-to-image diffusion models. Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023.

[21] Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems.

[22] Scaling rectified flow transformers for high-resolution image synthesis. Forty-first International Conference on Machine Learning.

[23] Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems.

[24] Qwen-Image technical report. arXiv preprint arXiv:2508.02324.

[25] Stochastic interpolants: A unifying framework for flows and diffusions. arXiv preprint arXiv:2303.08797.

[26] Pick-a-Pic: An open dataset of user preferences for text-to-image generation. Advances in Neural Information Processing Systems.

[27] HPSv3: Towards wide-spectrum human preference score. Proceedings of the IEEE/CVF International Conference on Computer Vision.

[28] Black Forest Labs. 2024.

[29] Scaling autoregressive models for content-rich text-to-image generation. arXiv preprint arXiv:2206.10789.

[30] Human Preference Score v2: A solid benchmark for evaluating human preferences of text-to-image synthesis. arXiv preprint arXiv:2306.09341.

[31] Christoph Schuhmann. 2023.

[32] Learning transferable visual models from natural language supervision. International Conference on Machine Learning, 2021.

[33] ImageReward: Learning and evaluating human preferences for text-to-image generation. Advances in Neural Information Processing Systems.

[34] Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598.

[35] Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro Cuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj, Dhruv Nair, Sayak Paul, William Berman, Yiyi Xu, Steven Liu, and Thomas Wolf. GitHub repository, 2022.

[36] Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems.

[37] Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 1952.

[38] Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 1992.

[39] Diffusion model alignment using direct preference optimization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40] SiT: Exploring flow and diffusion-based generative models with scalable interpolant transformers. European Conference on Computer Vision, 2024.

[41] Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502.

[42] Flow-GRPO: Training flow matching models via online RL. arXiv preprint arXiv:2505.05470.

[43] DanceGRPO: Unleashing GRPO on visual generation. arXiv preprint arXiv:2505.07818.

[44] Aligning diffusion models by optimizing human utility. Advances in Neural Information Processing Systems.

[45] DSPO: Direct score preference optimization for diffusion model alignment. The Thirteenth International Conference on Learning Representations.

[46] Divergence minimization preference optimization for diffusion model alignment. arXiv preprint arXiv:2507.07510.

[47] Margin-aware preference optimization for aligning diffusion models without reference. First Workshop on Scalable Optimization for Efficient and Adaptive Foundation Models.

[48] Diffusion-SDPO: Safeguarded direct preference optimization for diffusion models.

[49] Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control. International Conference on Machine Learning, 2017.

[50] Human-centric dialog training via offline reinforcement learning. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).

[51] Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.