1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1); years: 2025 (1); verdicts: UNVERDICTED (1).
Representative citing papers:
- ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models
ChessArena benchmarks 13 LLMs across more than 800 chess games and finds that none exceeds amateur human level and some lose to an opponent playing random moves, while a fine-tuned Qwen3-8B approaches the performance of larger models.