DRM turns a pre-trained diffusion model into a step-wise reward model and uses it for dense RL training (Step-wise GRPO) and guided sampling to improve final image quality.
arXiv preprint arXiv:2303.14420 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 5representative citing papers
ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.
DanceGRPO applies GRPO to visual generation tasks to achieve stable policy optimization across diffusion models, rectified flows, multiple tasks, and diverse reward models, outperforming prior RL methods.
DiffKT3D transfers priors from video diffusion models to 3D radiotherapy dose prediction via modality-specific embeddings and clinically guided RL, reducing voxel MAE from 2.07 to 1.93 and claiming SOTA over the GDP-HMM challenge winner.
RAFT aligns generative models by ranking samples with a reward model and fine-tuning only on the top-ranked outputs, reporting gains on reward scores and automated metrics for LLMs and diffusion models.
citing papers explorer
-
DRM: Diffusion-based Reward Model With Step-wise Guidance
DRM turns a pre-trained diffusion model into a step-wise reward model and uses it for dense RL training (Step-wise GRPO) and guided sampling to improve final image quality.
-
Deepfake Detection Generalization with Diffusion Noise
ANL uses diffusion noise prediction and attention to regularize deepfake detectors for better generalization to unseen synthesis methods without added inference cost.
-
DanceGRPO: Unleashing GRPO on Visual Generation
DanceGRPO applies GRPO to visual generation tasks to achieve stable policy optimization across diffusion models, rectified flows, multiple tasks, and diverse reward models, outperforming prior RL methods.
-
Any2Any 3D Diffusion Models with Knowledge Transfer: A Radiotherapy Planning Study
DiffKT3D transfers priors from video diffusion models to 3D radiotherapy dose prediction via modality-specific embeddings and clinically guided RL, reducing voxel MAE from 2.07 to 1.93 and claiming SOTA over the GDP-HMM challenge winner.