Thinking LLMs achieve ~10 percentage points higher accuracy than non-thinking ones on RewardBench with under 2x compute overhead, outperforming augmentation strategies that cost over 8x more while also showing better bias robustness.
Ryan, Danmei Xu, Chris Nivera, and Daniel Campos
1 Pith paper cites this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
fields: cs.AI 1
years: 2025 1
verdicts: UNVERDICTED 1
roles: background 1
polarities: background 1
representative citing papers
- Explicit Reasoning Makes Better Judges: A Systematic Study on Accuracy, Efficiency, and Robustness