Title resolution pending
One Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Filtered Reasoning Score: Evaluating Reasoning Quality on a Model's Most-Confident Traces
Filtered Reasoning Score evaluates LLM reasoning quality using only the top-K% most confident traces, revealing differences in reasoning capability that accuracy alone misses and showing cross-benchmark transfer.
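
The top-K% filtering described in the summary can be illustrated with a minimal sketch. This is not the paper's implementation: the listing does not say how confidence or reasoning quality are measured, so both are taken here as precomputed numbers, and the names `Trace`, `filtered_score`, and `top_k_percent` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Trace:
    confidence: float  # assumption: any scalar where higher means more confident
    quality: float     # assumption: a precomputed reasoning-quality score


def filtered_score(traces, top_k_percent):
    """Mean quality over the top-K% most confident traces (illustrative only)."""
    if not traces:
        raise ValueError("need at least one trace")
    if not 0 < top_k_percent <= 100:
        raise ValueError("top_k_percent must be in (0, 100]")
    # Round the cutoff up and keep at least one trace so small samples are not emptied.
    keep = max(1, math.ceil(len(traces) * top_k_percent / 100))
    # Rank by confidence, most confident first, then keep the leading slice.
    ranked = sorted(traces, key=lambda t: t.confidence, reverse=True)
    kept = ranked[:keep]
    return sum(t.quality for t in kept) / len(kept)


traces = [Trace(0.9, 1.0), Trace(0.8, 0.5), Trace(0.2, 0.0), Trace(0.1, 0.0)]
print(filtered_score(traces, 50))   # averages the two most confident traces
print(filtered_score(traces, 100))  # no filtering: averages all traces
```

Setting `top_k_percent` to 100 removes the filter, so comparing the two calls shows how much the score depends on restricting evaluation to the most confident traces.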