A new adjoint matching framework formulates flow model alignment as optimal control, enabling direct regression training and terminal-trajectory truncation for efficiency gains on models like SiT-XL and FLUX.
hub
Optimizing ddpm sampling with shortcut fine-tuning.arXiv preprint arXiv:2301.13362
10 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.
MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.
V-GRPO makes ELBO surrogates stable and efficient for online RL alignment of denoising models, delivering SOTA text-to-image performance with 2-3x speedups over MixGRPO and DiffusionNFT.
NaP-Control uses RL to directly predict optimized diffusion noise from a task-agnostic prior, enabling fast inference and higher success rates for versatile whole-body character control while preserving motion quality.
ViPO enhances GRPO for visual generation by creating spatially and temporally aware advantage maps from pretrained vision models to focus optimization on perceptually important regions.
DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.
DDPO uses policy gradients on the denoising process to optimize diffusion models for arbitrary rewards like human feedback or compressibility.
E²PO uses embedding-level perturbations to maintain intra-group variance and discriminative signal in RL-based preference optimization for generative flow models.
A data-generation pipeline plus pairwise subject-consistency rewards in RL improve consistency and prompt adherence for multi-subject personalized image generation.
citing papers explorer
-
Improved techniques for fine-tuning flow models via adjoint matching: a deterministic control pipeline
A new adjoint matching framework formulates flow model alignment as optimal control, enabling direct regression training and terminal-trajectory truncation for efficiency gains on models like SiT-XL and FLUX.
-
Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation
OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.
-
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
MixGRPO speeds up GRPO for flow-based image generators by restricting SDE sampling and optimization to a sliding window while using ODE elsewhere, cutting training time by up to 71% with better alignment performance.
-
V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think
V-GRPO makes ELBO surrogates stable and efficient for online RL alignment of denoising models, delivering SOTA text-to-image performance with 2-3x speedups over MixGRPO and DiffusionNFT.
-
NaP-Control: Navigating Diffusion Prior for Versatile and Fast Character Control
NaP-Control uses RL to directly predict optimized diffusion noise from a task-agnostic prior, enabling fast inference and higher success rates for versatile whole-body character control while preserving motion quality.
-
Seeing What Matters: Visual Preference Policy Optimization for Visual Generation
ViPO enhances GRPO for visual generation by creating spatially and temporally aware advantage maps from pretrained vision models to focus optimization on perceptually important regions.
-
Diffusion Policy Policy Optimization
DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.
-
Training Diffusion Models with Reinforcement Learning
DDPO uses policy gradients on the denoising process to optimize diffusion models for arbitrary rewards like human feedback or compressibility.
-
Embedding-perturbed Exploration Preference Optimization for Flow Models
E²PO uses embedding-level perturbations to maintain intra-group variance and discriminative signal in RL-based preference optimization for generative flow models.
-
PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards
A data-generation pipeline plus pairwise subject-consistency rewards in RL improve consistency and prompt adherence for multi-subject personalized image generation.