pith. machine review for the scientific record.

arxiv: 2605.09965 · v2 · submitted 2026-05-11 · 💻 cs.CV

Recognition: no theorem link

Towards Generalist Game Players: An Investigation of Foundation Models in the Game Multiverse

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 06:37 UTC · model grok-4.3

classification 💻 cs.CV
keywords foundation models · generalist agents · game AI · multitask learning · reinforcement learning · game environments · AGI

The pith

Foundation models can be trained across games with different rules to become generalist players that master any challenge and eventually create new worlds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper maps how foundation models are being used to build agents that play many different games. It contrasts the single set of physics in the real world with the multiverse of games that have entirely different rules, aesthetics, and goals, arguing this variety makes games the ideal training ground for AGI-level generalization. The work organizes the entire lifecycle of such agents into four pillars—Dataset, Model, Harness, and Benchmark—and shows how advances in these pillars break five key trade-offs. It then lays out a five-level roadmap that starts with single-game mastery and ends at a creator stage where the agent invents new game environments and evolves inside them.

Core claim

When the full range of games is treated as a multiverse of varying rules, foundation models can serve as generalist players whose development follows four interdependent pillars: Dataset, Model, Harness, and Benchmark. Progress consists of breaking five fundamental trade-offs that currently limit the system. This yields a clear five-level path from mastering one game to simultaneously creating and adapting within an expanding theoretical game multiverse, which the authors present as a concrete route to AGI.

What carries the argument

The four pillars—Dataset, Model, Harness, and Benchmark—that together define the lifecycle of a generalist game player and provide the means to break five fundamental trade-offs.

If this is right

  • Agents advance through five defined levels, reaching the point where they both create new game worlds and continually adapt inside them.
  • Any improvement in one of the four pillars directly reduces one of the five binding trade-offs for the whole system.
  • Foundation models become the central component that enables seamless mastery across games governed by unrelated rules and objectives.
  • The same training process that produces generalist players also supplies the data and mechanisms needed for the final creator stage.
  • The game multiverse serves as the complete training and evaluation ground for AGI-level generalization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Skills learned in the game multiverse could transfer to real-world tasks that require rapid adaptation to novel constraints.
  • The creator stage implies agents could generate entirely new environments with custom physics and win conditions for further self-improvement.
  • Future benchmarks would need to measure zero-shot performance on games whose rule sets are generated on the fly rather than drawn from existing libraries (a minimal sketch of such a benchmark follows this list).
  • The roadmap could be tested by checking whether models at higher levels show faster adaptation when the underlying game engine is swapped for a different one.
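
A minimal sketch of what such a benchmark loop could look like, in Python. Every name in it (RuleSpec, sample_rules, StubEnv, RandomAgent, evaluate_zero_shot) is a hypothetical placeholder invented for illustration, not an interface from the paper, and the rule generator is a stub where a real benchmark would sample playable rules, physics, and win conditions:

```python
import random
from dataclasses import dataclass

@dataclass
class RuleSpec:
    seed: int        # determines this game's sampled rules
    n_actions: int   # size of the action set under those rules
    horizon: int     # episode length under those rules

def sample_rules(rng: random.Random) -> RuleSpec:
    # A real generator would draw grammar-based rules, physics parameters,
    # and win conditions; this stub only varies the interface dimensions.
    return RuleSpec(rng.randrange(2**31), rng.randint(2, 8), rng.randint(10, 50))

class StubEnv:
    """Placeholder environment that 'interprets' a RuleSpec."""
    def __init__(self, spec: RuleSpec):
        self.spec, self.rng, self.t = spec, random.Random(spec.seed), 0

    def reset(self):
        self.t = 0
        return {"obs": self.rng.random(), "legal": range(self.spec.n_actions)}

    def step(self, action: int):
        self.t += 1
        reward = self.rng.random()  # stand-in for a rule-derived score
        done = self.t >= self.spec.horizon
        return {"obs": self.rng.random(), "legal": range(self.spec.n_actions)}, reward, done

class RandomAgent:
    """Chance baseline; a foundation-model player would expose the same .act()."""
    def act(self, obs):
        return random.choice(list(obs["legal"]))

def evaluate_zero_shot(agent, n_games: int = 50, seed: int = 0) -> float:
    """Mean return over freshly generated games: no fine-tuning, no per-game data."""
    rng, returns = random.Random(seed), []
    for _ in range(n_games):
        env = StubEnv(sample_rules(rng))
        obs, total, done = env.reset(), 0.0, False
        while not done:
            obs, reward, done = env.step(agent.act(obs))
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)

print(evaluate_zero_shot(RandomAgent()))
```

The design point is that a generator, not a fixed library, supplies the test games, so memorizing any finite game set cannot inflate the score.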

Load-bearing premise

That experiences gathered from games with entirely different rules can produce the same kind of omni-reality adaptability that humans show when they move from one set of real-world physics to many invented game worlds.

What would settle it

Train a foundation model on a broad collection of games and then test whether it can play a completely new game whose rules, physics, and objectives were never encountered during training, without any additional fine-tuning or data.
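
A sketch of how the pass criterion for that experiment might be scored, reusing the reset()/step()/act() interface from the sketch above. The margin over a random policy is an editorial choice, not a threshold the paper specifies:

```python
import statistics

def decisive_test(trained_agent, chance_agent, held_out_env_factory,
                  n_episodes: int = 100, margin: float = 0.10) -> bool:
    """Pass iff the pre-trained agent, with frozen weights and zero data from
    the held-out game, beats a chance policy by at least `margin`."""
    def mean_return(agent) -> float:
        returns = []
        for _ in range(n_episodes):
            env = held_out_env_factory()  # the never-seen game
            obs, total, done = env.reset(), 0.0, False
            while not done:
                obs, reward, done = env.step(agent.act(obs))
                total += reward
            returns.append(total)
        return statistics.mean(returns)

    return mean_return(trained_agent) > (1.0 + margin) * mean_return(chance_agent)
```

Anything short of this, such as gains that appear only after fine-tuning on the new game, would leave the load-bearing premise above untested.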

read the original abstract

The real world unfolds along a single set of physics laws, yet human intelligence demonstrates a remarkable capacity to generalize experiences from this singular physical existence into a multiverse of games, each governed by entirely different rules, aesthetics, physics, and objectives. This omni-reality adaptability is a hallmark of general intelligence. As Artificial Intelligence progresses towards Artificial General Intelligence, the multiverse of games has evolved from mere entertainment into the ultimate ground for training and evaluating AGI. The pursuit of this generality has unfolded across four eras: from environment-specific symbolic and reinforcement learning agents, to current large foundation models acting as generalist players, and toward a future creator stage where the agent both creates new game worlds and continually evolves within them. We trace the full lifecycle of a generalist game player along four interdependent pillars: Dataset, Model, Harness, and Benchmark. Every advance across these pillars can be read as an attempt to break one of five fundamental trade-offs that currently bound the whole system. Building on this end-to-end view, we chart a five-level roadmap, progressing from single-game mastery to the ultimate creator stage in which the agent simultaneously creates and evolves within the theoretical game multiverse. Taken together, our work offers a unified lens onto a rapidly shifting field, and a principled path toward the omnipotent generalist agent capable of seamlessly mastering any challenge within the multiverse of games, thereby paving the way for AGI.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated authors' rebuttal, a circularity check, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript claims that the pursuit of generalist game players has progressed through four eras and can be organized along four interdependent pillars (Dataset, Model, Harness, Benchmark). It argues that every advance in these pillars breaks one of five fundamental trade-offs, and it charts a five-level roadmap ending in a creator stage where agents both create and evolve within game multiverses, thereby supplying a unified lens and principled path toward AGI via foundation models.

Significance. If the proposed organization and roadmap prove useful to the community, the work could help structure ongoing research on foundation-model game agents by clarifying interdependencies and long-term directions. Because it is a high-level conceptual survey with no new empirical results, derivations, or quantitative validation of the trade-off mappings, its significance rests on the clarity and adoption of the synthesis rather than on novel technical contributions.

major comments (1)
  1. Abstract and §1: the central claim that 'every advance across these pillars can be read as an attempt to break one of five fundamental trade-offs' is load-bearing for the unified-lens contribution, yet the five trade-offs are never enumerated and no concrete mapping from cited prior works to specific trade-offs is supplied, leaving the 'principled path' as an author-defined taxonomy rather than a substantiated analysis.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive report. The single major comment identifies a clarity issue in how our central claim is presented; we agree this requires revision and will update the manuscript accordingly to strengthen the substantiation of our framework.

read point-by-point responses
  1. Referee: [—] Abstract and §1: the central claim that 'every advance across these pillars can be read as an attempt to break one of five fundamental trade-offs' is load-bearing for the unified-lens contribution, yet the five trade-offs are never enumerated and no concrete mapping from cited prior works to specific trade-offs is supplied, leaving the 'principled path' as an author-defined taxonomy rather than a substantiated analysis.

    Authors: We agree that an explicit enumeration of the five trade-offs and direct mappings to prior works would make the load-bearing claim more rigorous and less reliant on reader inference. The manuscript discusses the trade-offs implicitly through the pillar interdependencies and roadmap levels, but we acknowledge they are not listed as a numbered set with concrete examples in §1. In the revision we will (1) add a concise enumerated list of the five trade-offs in the abstract and §1, (2) insert a new table that maps representative works from each pillar (e.g., large-scale pre-training datasets to the data-efficiency vs. coverage trade-off, instruction-tuned models to the specialization vs. generality trade-off) to the specific trade-off each advance targets, and (3) briefly reference these mappings when describing the five-level progression. These additions will be confined to the introductory sections and will not change the paper’s scope or conclusions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; conceptual survey with independent organization

full rationale

The manuscript is a high-level survey that organizes prior game-AI literature into four pillars (Dataset, Model, Harness, Benchmark) and maps progress onto five trade-offs plus a five-level roadmap. No equations, fitted parameters, predictions, or deductive derivations appear in the provided text. The central claim—that the framework supplies a unified lens and principled path—is presented explicitly as an author-defined organizational synthesis rather than a result derived from premises that reduce to the paper's own inputs or self-citations. All references to prior work function as external citations without load-bearing uniqueness theorems or ansatzes imported from the same authors. The opening contrast between single-physics reality and multi-rule games is motivational framing, not a premise requiring proof that loops back on itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that games with differing rules can train general intelligence comparable to real-world adaptability, without independent evidence supplied in the abstract.

axioms (2)
  • domain assumption: The real world unfolds along a single set of physics laws while games have entirely different rules, so generalization across them can test AGI.
    Invoked in the first sentence to frame human intelligence and the role of games.
  • ad hoc to paper: Every advance across the Dataset, Model, Harness, and Benchmark pillars breaks one of five fundamental trade-offs.
    Stated as the basis for the end-to-end view and roadmap.

pith-pipeline@v0.9.0 · 5600 in / 1432 out tokens · 121699 ms · 2026-05-13T06:37:32.650515+00:00 · methodology

discussion (0)

