Evaluating large language models with grid-based game competitions: An extensible LLM benchmark and leaderboard

Oguzhan Topsakal, Colby Jacob Edell, Jackson Bailey Harper · 2024 · arXiv 2407.07796

1 Pith paper cites this work. Polarity classification is still being indexed.



citation-role summary

background 1


cs.CL · 2025-05-21 · unverdicted · novelty 7.0

MTR-Bench is a new automated benchmark for multi-turn reasoning in LLMs covering diverse tasks and difficulty levels with 3600 instances.


MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation · cs.CL · 2025-05-21 · unverdicted · none · ref 35