pith. machine review for the scientific record.

, w_{C_s}) ∈ R^{C_s+1}, encoding the relative value of each category

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.LG 1

years

2026 1

verdicts

ACCEPT 1

representative citing papers

Ranking Reasoning LLMs under Test-Time Scaling

cs.LG · 2026-03-11 · accept · novelty 5.0

Many established statistical ranking techniques produce orderings of reasoning LLMs under test-time scaling that closely match a Bayesian gold standard, with mean Kendall tau_b of 0.93-0.95 at full trials and best methods reaching 0.86 at single trials.

citing papers explorer

Showing 1 of 1 citing paper.

  • Ranking Reasoning LLMs under Test-Time Scaling cs.LG · 2026-03-11 · accept · none · ref 17
