DocReward is a Bradley-Terry trained reward model on 117K paired documents across 32 domains that evaluates structural and stylistic professionalism independently of content, outperforming GPT-5 on benchmarks and guiding RL agents to higher-quality outputs.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
DocReward: A Document Reward Model for Structuring and Stylizing
DocReward is a Bradley-Terry trained reward model on 117K paired documents across 32 domains that evaluates structural and stylistic professionalism independently of content, outperforming GPT-5 on benchmarks and guiding RL agents to higher-quality outputs.