UnifiedReward is the first unified reward model that jointly assesses multimodal understanding and generation to provide better preference signals for aligning vision models via DPO.
T2v-turbo: Breaking the quality bottleneck of video consistency model with mixed reward feedback
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 5representative citing papers
LIVEditor-14B applies a new sparse attention method (ISA) that prunes context and uses query-sharpness routing to cut attention latency ~60% with no loss in editing quality on standard benchmarks.
The work introduces rCM, a score-regularized continuous-time consistency model that matches DMD2 quality on large models up to 14B parameters while improving diversity and enabling 1-4 step sampling.
A human preference dataset and VideoReward model enable Flow-DPO and Flow-NRG to produce smoother, better-aligned videos from text prompts in flow-based generators.
RATS lets few-step visual generators surpass multi-step teachers by shaping trajectories with reward-based adaptive guidance instead of strict imitation.
citing papers explorer
-
Unified Reward Model for Multimodal Understanding and Generation
UnifiedReward is the first unified reward model that jointly assesses multimodal understanding and generation to provide better preference signals for aligning vision models via DPO.
-
Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
The work introduces rCM, a score-regularized continuous-time consistency model that matches DMD2 quality on large models up to 14B parameters while improving diversity and enabling 1-4 step sampling.
-
Improving Video Generation with Human Feedback
A human preference dataset and VideoReward model enable Flow-DPO and Flow-NRG to produce smoother, better-aligned videos from text prompts in flow-based generators.