ChessArena benchmarks 13 LLMs in over 800 chess games, revealing none exceed amateur human level and some lose to random moves, while a fine-tuned Qwen3-8B approaches larger models.
So, House 6 has MusicGenre: classical
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
ChessArena: A Chess Testbed for Evaluating Strategic Reasoning Capabilities of Large Language Models