MTR-Bench is a new automated benchmark for multi-turn reasoning in LLMs covering diverse tasks and difficulty levels with 3600 instances.
Let’s verify step by step
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
PathCal calibrates reasoning paths by type-aware soft rebalancing of reflection-marker logits at uncertain states, yielding better efficiency-performance trade-offs on six benchmarks.
A majorization-minimization framework turns IRT into scalable matrix factorization subproblems for LLM evaluation, delivering orders-of-magnitude speedups with identifiability guarantees.
citing papers explorer
-
PathCal: State-Aware Reflection-Marker Calibration for Efficient Reasoning
PathCal calibrates reasoning paths by type-aware soft rebalancing of reflection-marker logits at uncertain states, yielding better efficiency-performance trade-offs on six benchmarks.