Exploring the Design Space of Reward Backpropagation for Flow Matching

Boye Niu; Chi Zhang; Ruoyu Wang; Tongliang Liu; Xiangxin Zhou; Yushi Huang

arxiv: 2606.11075 · v1 · pith:RAESORIWnew · submitted 2026-06-09 · 💻 cs.LG

Ruoyu Wang, Boye Niu, Xiangxin Zhou, Yushi Huang, Tongliang Liu, Chi Zhang

Pith reviewed 2026-06-27 14:08 UTC · model grok-4.3

classification 💻 cs.LG

keywords: flow matching, reward backpropagation, text-to-image models, preference optimization, gradient computation, memory efficiency, surrogate trajectory, alignment methods

The pith

FlowBP provides a unified framework for designing memory-bounded reward backpropagation trajectories in flow matching models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper explores how to align flow matching text-to-image models with human preferences using direct reward backpropagation. It tackles two problems: the memory needed to store activations across the full sampling trajectory, and gradient inflation from Jacobians chained over many steps. The proposed FlowBP framework treats the backward trajectory as a design space, using a cached no-gradient rollout and a lightweight surrogate built from cached and selectively re-forwarded velocities. This separates choices such as the active set and the integration weights, and the three variants improve on most preference, quality, and compositional metrics while bounding memory by the active-set size. This matters because it makes preference tuning more practical for large models such as FLUX.
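The gradient-inflation problem named above can be made concrete with the chain rule for an explicit Euler sampler. The following is a sketch under assumed notation, not the paper's own: $x_k$ is the latent at step $k$, $h_k$ the step size, $v_\theta$ the velocity network, and $r$ the reward.

```latex
\begin{align}
x_{k+1} &= x_k + h_k\, v_\theta(x_k, t_k), \qquad k = 0, \dots, N-1, \\
J_j &= I + h_j\, \frac{\partial v_\theta}{\partial x}(x_j, t_j), \\
\frac{\partial x_N}{\partial x_k} &= J_{N-1}\, J_{N-2} \cdots J_{k+1}\, J_k, \\
\nabla_{x_k}\, r(x_N) &= \left( \frac{\partial x_N}{\partial x_k} \right)^{\!\top} \nabla_{x_N}\, r(x_N).
\end{align}
```

Early indices accumulate the most factors ($N - k$ of them), so if the factors tend to have norm above one, the reward gradient can grow with trajectory length. Capping the chain at one Jacobian factor, as the summary says the FlowBP variants do, removes this product.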

Core claim

FlowBP is a surrogate-trajectory framework that keeps a no-gradient cached rollout for sampling and builds a lightweight backward surrogate from cached and selectively re-forwarded velocities. It separates four choices—reward-model input, active set, integration weights, and bridge coupling—and recovers prior methods as special cases. The three instantiated variants bound memory by active-set size and limit gradient chaining to at most one Jacobian factor, leading to improvements over direct-gradient baselines on most metrics for SD3.5-M, FLUX.1-dev, and FLUX.2-Klein-base.
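A toy scalar example may help show the separation between the cached rollout and the backward surrogate. It is only a sketch under strong assumptions: the velocity is the made-up linear field v(x) = theta * x, the reward is r(x) = x, derivatives are written by hand instead of using autodiff, and every function name is hypothetical. In this toy the active steps carry no state Jacobian at all, whereas the paper's variants are described as allowing at most one factor.

```python
# Toy scalar flow: v_theta(x) = theta * x, integrated with Euler steps.
# All names are hypothetical; this is not the paper's implementation.

def cached_rollout(x0, theta, n_steps):
    """No-gradient rollout: store states and velocities as plain numbers."""
    h = 1.0 / n_steps
    xs, vs = [x0], []
    for _ in range(n_steps):
        v = theta * xs[-1]
        vs.append(v)
        xs.append(xs[-1] + h * v)
    return xs, vs, h

def full_backprop_grad(xs, theta, h):
    """d x_N / d theta through every step, including chained Jacobians."""
    n = len(xs) - 1
    grad = 0.0
    for k in range(n):
        chain = (1.0 + h * theta) ** (n - 1 - k)  # Jacobians of later steps
        grad += chain * h * xs[k]                 # d v_k / d theta = x_k
    return grad

def surrogate_endpoint_and_grad(xs, vs, theta, h, active):
    """Sparse Euler surrogate: x_0 + sum_k h * v_k.

    Steps in `active` are re-forwarded at the cached state, which is treated
    as a constant, so each contributes h * x_k to the gradient with no
    chaining. Other steps reuse the cached velocity and contribute nothing.
    """
    endpoint, grad = xs[0], 0.0
    for k, v_cached in enumerate(vs):
        if k in active:
            endpoint += h * (theta * xs[k])  # re-forwarded velocity
            grad += h * xs[k]
        else:
            endpoint += h * v_cached
    return endpoint, grad

theta = 2.0
xs, vs, h = cached_rollout(x0=1.0, theta=theta, n_steps=20)
end_sur, g_sur = surrogate_endpoint_and_grad(xs, vs, theta, h, active={0, 10, 19})
g_full = full_backprop_grad(xs, theta, h)
print(f"endpoint gap: {abs(end_sur - xs[-1]):.2e}")
print(f"full grad = {g_full:.4f}, surrogate grad = {g_sur:.4f}")
```

The surrogate reproduces the cached endpoint, so the reward model sees the same sample, while the number of gradient-carrying terms equals the size of the active set. The two gradients differ in magnitude, which is exactly the approximation question the review raises later.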

What carries the argument

The surrogate-trajectory framework that decouples the backward path into cached rollout and lightweight velocity-based surrogate, with separable design choices of reward-model input, active set, integration weights, and bridge coupling.

Load-bearing premise

The lightweight backward surrogate from cached and selectively re-forwarded velocities approximates the full backward trajectory with enough accuracy that the resulting gradients remain useful for optimization.

What would settle it

If experiments on models small enough for full backpropagation show that FlowBP variants yield worse alignment metrics than direct methods, or if the memory savings do not hold in practice, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2606.11075 by Boye Niu, Chi Zhang, Ruoyu Wang, Tongliang Liu, Xiangxin Zhou, Yushi Huang.

**Figure 2.** Surrogate backward graphs under the unified framework. Each panel uses the same cached …

**Figure 3.** Qualitative comparison on GenEval prompts with …

**Figure 4.** Ablation of FlowBP-Lagrange on FLUX.1-dev. Left: effect of the Lagrange quadrature order on HPSv2.1 training reward. Middle: effect of the gradient-support scale. Right: connector prediction errors for Euler, Lagrange, and uniform connectors, where uniform uses the same supports as Lagrange but replaces the integrated coefficients with equal weights.

**Figure 5.** Ablation of FlowBP-Bridge. Left: effect of the …

**Figure 7.** Connector residual and reward dynamics during training. Left: in LeapAlign, we observe …

**Figure 8.** Evaluation dynamics on the HPDv2 test split for …

**Figure 9.** Additional qualitative results on SD3.5-M using prompts from the HPDv2 test split.

**Figure 10.** Additional qualitative results on FLUX.1-dev using prompts from the HPDv2 test split.

**Figure 11.** Additional qualitative results on FLUX.2-Klein-base using prompts from the HPDv2 test split.

Abstract

Aligning text-to-image flow matching models with human preferences via direct reward backpropagation is sample-efficient but hampered by two well-known pathologies: activations cannot be stored across the full sampling trajectory at modern model scale, and chained Jacobian products across steps inflate the reward gradient as it travels back to early indices. Connector-based methods, such as LeapAlign, address these issues by replacing the full backward trajectory with a short pinned path, highlighting a useful decoupling between sampling and optimization. However, the quality of the resulting gradient depends on how accurately this short path approximates the full rollout, especially over long intervals. We propose FlowBP, a unified surrogate-trajectory framework that treats the backward trajectory itself as the design object. FlowBP keeps a no-gradient cached rollout for sampling, then builds a lightweight backward surrogate from cached and selectively re-forwarded velocities. This view separates four choices: the reward-model input, active set, integration weights, and bridge coupling, and recovers prior direct-gradient methods as particular settings. Within this framework, we instantiate three variants: FlowBP-Sparse uses sparse Euler reconstruction, FlowBP-Bridge adds controlled bridge coupling, and FlowBP-Lagrange raises the order of leap quadrature. All three bound memory by the active-set size and limit gradient chaining to at most one Jacobian factor. Across SD3.5-M, FLUX.1-dev, and FLUX.2-Klein-base on preference, quality, and compositional metrics, the three variants improve over direct-gradient baselines on most metrics.
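The "integration weights" choice can be illustrated with ordinary Lagrange quadrature. The sketch below computes the weight of each support node as the integral of its Lagrange basis polynomial over the leap interval, so that a leap is estimated as x_b ≈ x_a + Σ w_i v_i. How the paper actually picks supports and coefficients is not stated in the abstract, so the function names and the node choices here are assumptions for illustration only.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_integral(p, a, b):
    """Definite integral of polynomial p over [a, b]."""
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1) for k, c in enumerate(p))

def lagrange_weights(nodes, a, b):
    """w_i = integral over [a, b] of the i-th Lagrange basis polynomial."""
    nodes = [Fraction(t) for t in nodes]
    a, b = Fraction(a), Fraction(b)
    weights = []
    for i, ti in enumerate(nodes):
        basis = [Fraction(1)]
        for j, tj in enumerate(nodes):
            if j != i:
                # Multiply by the factor (t - tj) / (ti - tj).
                basis = poly_mul(basis, [-tj / (ti - tj), 1 / (ti - tj)])
        weights.append(poly_integral(basis, a, b))
    return weights

# One node at the left endpoint gives a single Euler-style weight of 1.
euler = lagrange_weights([0], 0, 1)
# Three equally spaced nodes recover Simpson's rule: 1/6, 2/3, 1/6.
simpson = lagrange_weights([0, Fraction(1, 2), 1], 0, 1)
# Equal weights on the same three supports, for comparison.
uniform = [Fraction(1, 3)] * 3
print(euler, simpson, uniform)
```

With a single support the rule reduces to an Euler leap, and adding supports raises the polynomial order the rule integrates exactly. Equal weights on the same supports still sum to the interval length but lose that higher-order exactness.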

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FlowBP frames reward backprop in flow matching as a design choice over the backward surrogate trajectory, recovering prior methods as special cases while proposing three variants that claim better metrics with bounded memory.

The main point is that the authors treat the backward trajectory itself as something you can design, separating four knobs—reward-model input, active set, integration weights, and bridge coupling—and showing how earlier direct-gradient and connector approaches sit inside that space. They then build three concrete variants: sparse Euler reconstruction, added bridge coupling, and higher-order leap quadrature. Each keeps memory to the size of the active set and caps Jacobian chaining at one factor by using a cached no-gradient rollout plus selective re-forwarding.

That unification is the useful piece. It turns an ad-hoc fix into a set of explicit choices, which makes it easier to see trade-offs and try new combinations. The experiments run the variants on SD3.5-M, FLUX.1-dev, and FLUX.2-Klein-base and report gains on preference, quality, and compositional metrics over direct baselines.

The soft spot is the missing check on the surrogate itself. The improvements rest on the cached-velocity backward path producing gradients whose direction and size stay close to a full-trajectory backprop, but the abstract gives no cosine similarity, norm ratio, or per-step error between the two. Without that, it is hard to know whether the reported wins come from the design choices or from particular integration settings. The lack of visible tables or ablation numbers in the summary also leaves the magnitude of the gains unclear.
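The gradient-level check asked for here is cheap to state. The following is a minimal sketch of the three statistics named in the letter, assuming both gradients have already been flattened into equal-length lists of floats; the function names are hypothetical and nothing here comes from the paper.

```python
import math

def _norm(g):
    return math.sqrt(sum(a * a for a in g))

def _check(g_ref, g_sur):
    if len(g_ref) != len(g_sur):
        raise ValueError("gradients must have the same length")
    if _norm(g_ref) == 0.0:
        raise ValueError("reference gradient has zero norm")

def cosine_similarity(g_ref, g_sur):
    """Direction agreement: 1 means parallel, 0 orthogonal, -1 opposite."""
    _check(g_ref, g_sur)
    n_sur = _norm(g_sur)
    if n_sur == 0.0:
        raise ValueError("surrogate gradient has zero norm")
    dot = sum(a * b for a, b in zip(g_ref, g_sur))
    return dot / (_norm(g_ref) * n_sur)

def norm_ratio(g_ref, g_sur):
    """Magnitude agreement: ||g_sur|| / ||g_ref||, ideally close to 1."""
    _check(g_ref, g_sur)
    return _norm(g_sur) / _norm(g_ref)

def relative_error(g_ref, g_sur):
    """Combined error: ||g_sur - g_ref|| / ||g_ref||."""
    _check(g_ref, g_sur)
    diff = [b - a for a, b in zip(g_ref, g_sur)]
    return _norm(diff) / _norm(g_ref)

g_full = [1.0, 0.0]   # stand-in for a full-trajectory gradient
g_sur = [2.0, 0.0]    # stand-in for a surrogate gradient
print(cosine_similarity(g_full, g_sur), norm_ratio(g_full, g_sur), relative_error(g_full, g_sur))
```

In this made-up pair the direction agrees perfectly while the magnitude is off by a factor of two, which is why direction and size are worth reporting separately.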

This is for people already working on preference alignment for flow-matching image models who want a more systematic handle on the backward pass. It is worth sending to peer review because the framing is clear and the problems it targets are real, even if the empirical section will need tighter validation on the approximation quality.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces FlowBP, a unified surrogate-trajectory framework for reward backpropagation in flow matching models. It treats the backward trajectory as a design object with four separable choices (reward-model input, active set, integration weights, bridge coupling), recovers prior direct-gradient methods as special cases, and instantiates three variants (FlowBP-Sparse with sparse Euler reconstruction, FlowBP-Bridge with controlled bridge coupling, FlowBP-Lagrange with higher-order leap quadrature). All variants bound memory by active-set size and limit gradient chaining to at most one Jacobian factor. Experiments across SD3.5-M, FLUX.1-dev, and FLUX.2-Klein-base report that the variants improve over direct-gradient baselines on most preference, quality, and compositional metrics.

Significance. If the cached-velocity surrogate produces gradients sufficiently close in direction and magnitude to full-trajectory backpropagation, the work supplies a practical and modular approach to scalable direct reward optimization for large flow models while controlling memory and numerical pathologies; the explicit recovery of prior methods as particular settings of the design space is a clear strength.

major comments (2)

[Abstract] The central empirical claim that the three variants improve over direct-gradient baselines on most metrics is asserted without any quantitative tables, error bars, ablation details, or reported effect sizes, which is load-bearing for evaluating whether the observed gains are reliable or attributable to the surrogate construction.
[Abstract and framework description] The load-bearing assumption is that the lightweight backward surrogate built from cached and selectively re-forwarded velocities yields reward gradients whose direction and magnitude remain useful for optimization; no quantitative validation (cosine similarity, relative norm difference, or per-step approximation error) of this surrogate against full-trajectory backpropagation is described, leaving open the possibility that gains arise from active-set or integration choices rather than approximation quality.

minor comments (1)

[Abstract] The abstract states improvements 'on most metrics' without specifying which metrics or models show gains versus which do not; a summary table in the main text would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract and the need for surrogate validation. We address each major comment below.

Referee: [Abstract] The central empirical claim that the three variants improve over direct-gradient baselines on most metrics is asserted without any quantitative tables, error bars, ablation details, or reported effect sizes, which is load-bearing for evaluating whether the observed gains are reliable or attributable to the surrogate construction.

Authors: We agree the abstract states the empirical outcome at a high level without numbers. The full manuscript contains the supporting tables, per-metric comparisons, and ablations in the experimental sections. Due to abstract length limits we kept the summary concise, but we can revise the abstract to include one or two key quantitative effect sizes (e.g., average preference-score lift) if the editor permits.

Revision: partial

Referee: [Abstract and framework description] The load-bearing assumption is that the lightweight backward surrogate built from cached and selectively re-forwarded velocities yields reward gradients whose direction and magnitude remain useful for optimization; no quantitative validation (cosine similarity, relative norm difference, or per-step approximation error) of this surrogate against full-trajectory backpropagation is described, leaving open the possibility that gains arise from active-set or integration choices rather than approximation quality.

Authors: The manuscript does not report direct surrogate-quality metrics such as cosine similarity or per-step approximation error between the cached-velocity surrogate and full-trajectory gradients. Downstream task improvements across three models provide indirect evidence that the gradients remain useful, but we acknowledge this does not substitute for explicit gradient-level validation. We will add a targeted analysis (cosine similarity and relative-norm statistics on a subset of trajectories) in the revision.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework evaluated on external benchmarks

The paper defines a design space for surrogate backward trajectories in flow matching reward optimization, recovers prior methods as special cases, and reports empirical gains on preference/quality metrics across SD3.5-M, FLUX.1-dev, and FLUX.2-Klein-base. No equations, fitted parameters, or self-citations are shown to reduce the claimed improvements to quantities defined by construction. The central results rest on measured performance against independent baselines rather than algebraic identity with inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated assumption that the surrogate approximation error remains tolerable for optimization.

