Fairer preferences elicit improved human-aligned large language model judgments

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.LG 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short

cs.LG · 2026-06-08 · unverdicted · novelty 6.0

Reasoning Arena converts non-diverse reward groups in RLVR into relative rewards via adaptive trace tournaments and Bradley-Terry fitting on anchor comparisons, claiming 7.6% average gains and 27-41% faster training on math/coding benchmarks.

citing papers explorer

  • Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short cs.LG · 2026-06-08 · unverdicted · none · ref 39
