CivBench trains models on turn-level states in Civilization V to predict victory probabilities, providing a progress-based evaluation of LLM strategic capabilities across 307 games with 7 models.
BALROG : Benchmarking agentic LLM and VLM reasoning on games
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
SOL is a new hierarchical RL algorithm that reaches 35x higher throughput and outperforms flat agents when trained on 30 billion frames in NetHack while showing positive scaling.
citing papers explorer
-
CivBench: Progress-Based Evaluation for LLMs' Strategic Decision-Making in Civilization V
CivBench trains models on turn-level states in Civilization V to predict victory probabilities, providing a progress-based evaluation of LLM strategic capabilities across 307 games with 7 models.
-
Scalable Option Learning in High-Throughput Environments
SOL is a new hierarchical RL algorithm that reaches 35x higher throughput and outperforms flat agents when trained on 30 billion frames in NetHack while showing positive scaling.