Open-Ended Wargames with Large Language Models
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background: 1
citation-polarity summary
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
LLMs in extended security dilemma games show increased conflict with more players, unraveling in finite games, and reduced conflict with communication, providing a new way to probe IR mechanisms.
citing papers explorer
- PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models
  PokeGym is a new benchmark that tests VLMs on long-horizon tasks in a complex 3D game using only visual observations, identifying deadlock recovery as the primary failure mode.
- Multi-Agent Strategic Games with LLMs
  LLMs in extended security dilemma games show increased conflict with more players, unraveling in finite games, and reduced conflict with communication, providing a new way to probe IR mechanisms.