MTR-Bench is a new automated benchmark for multi-turn reasoning in LLMs covering diverse tasks and difficulty levels with 3600 instances.
Navigation requires strategic use of path counts and flags, as connected junctions appear in random order at each visit
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation
MTR-Bench is a new automated benchmark for multi-turn reasoning in LLMs covering diverse tasks and difficulty levels with 3600 instances.