RationalRewards uses PARROT to recover rationales from preference data, producing a critique-first reward model. This model improves visual generators both at training time (through RL) and at test time (through prompt refinement), matching RL fine-tuning performance while using far less data.
-3 (Minor mismatch): Most relevant elements are preserved, but a few aspects (e.g., background details, lighting consistency) are missing or incorrectly handled.
Cited by 1 Pith paper (field: cs.AI; year: 2026; verdict: UNVERDICTED):
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time