ConsistRM improves generative reward models via consistency-aware self-training, outperforming vanilla RFT by 1.5% on average across five benchmarks and four base models.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training
ConsistRM improves generative reward models via consistency-aware self-training, outperforming vanilla RFT by 1.5% on average across five benchmarks and four base models.