Pref-GRPO stabilizes T2I RL training by using pairwise win rates from preference models as rewards instead of normalized pointwise scores, while UniGenBench enables finer-grained model evaluation across themes and criteria.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
Pref-GRPO stabilizes T2I RL training by using pairwise win rates from preference models as rewards instead of normalized pointwise scores, while UniGenBench enables finer-grained model evaluation across themes and criteria.