pith. machine review for the scientific record.

arxiv: 2605.09965 · v2 · submitted 2026-05-11 · 💻 cs.CV

Recognition: no theorem link

Towards Generalist Game Players: An Investigation of Foundation Models in the Game Multiverse

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 06:37 UTC · model grok-4.3

classification 💻 cs.CV
keywords foundation models · generalist agents · game AI · multitask learning · reinforcement learning · game environments · AGI

The pith

Foundation models can be trained across games with different rules to become generalist players that master any challenge and eventually create new worlds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper maps how foundation models are being used to build agents that play many different games. It contrasts the single set of physics in the real world with the multiverse of games that have entirely different rules, aesthetics, and goals, arguing this variety makes games the ideal training ground for AGI-level generalization. The work organizes the entire lifecycle of such agents into four pillars—Dataset, Model, Harness, and Benchmark—and shows how advances in these pillars break five key trade-offs. It then lays out a five-level roadmap that starts with single-game mastery and ends at a creator stage where the agent invents new game environments and evolves inside them.

Core claim

When the full range of games is treated as a multiverse of varying rules, foundation models can serve as generalist players whose development follows four interdependent pillars: Dataset, Model, Harness, and Benchmark. Progress consists of breaking five fundamental trade-offs that currently limit the system. This yields a clear five-level path from mastering one game to simultaneously creating and adapting within an expanding theoretical game multiverse, which the authors present as a concrete route to AGI.

What carries the argument

The four pillars—Dataset, Model, Harness, and Benchmark—that together define the lifecycle of a generalist game player and provide the means to break five fundamental trade-offs.

If this is right

  • Agents advance through five defined levels, reaching the point where they both create new game worlds and continually adapt inside them.
  • Any improvement in one of the four pillars directly reduces one of the five binding trade-offs for the whole system.
  • Foundation models become the central component that enables seamless mastery across games governed by unrelated rules and objectives.
  • The same training process that produces generalist players also supplies the data and mechanisms needed for the final creator stage.
  • The game multiverse serves as the complete training and evaluation ground for AGI-level generalization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Skills learned in the game multiverse could transfer to real-world tasks that require rapid adaptation to novel constraints.
  • The creator stage implies agents could generate entirely new environments with custom physics and win conditions for further self-improvement.
  • Future benchmarks would need to measure zero-shot performance on games whose rule sets are generated on the fly rather than drawn from existing libraries (a minimal sketch of such a benchmark follows this list).
  • The roadmap could be tested by checking whether models at higher levels show faster adaptation when the underlying game engine is swapped for a different one.
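
A minimal sketch of what such a benchmark loop could look like, in Python. Every name in it (RuleSpec, sample_rules, StubEnv, RandomAgent, evaluate_zero_shot) is a hypothetical placeholder invented for illustration, not an interface from the paper, and the rule generator is a stub where a real benchmark would sample playable rules, physics, and win conditions:

```python
import random
from dataclasses import dataclass

@dataclass
class RuleSpec:
    seed: int        # determines this game's sampled rules
    n_actions: int   # size of the action set under those rules
    horizon: int     # episode length under those rules

def sample_rules(rng: random.Random) -> RuleSpec:
    # A real generator would draw grammar-based rules, physics parameters,
    # and win conditions; this stub only varies the interface dimensions.
    return RuleSpec(rng.randrange(2**31), rng.randint(2, 8), rng.randint(10, 50))

class StubEnv:
    """Placeholder environment that 'interprets' a RuleSpec."""
    def __init__(self, spec: RuleSpec):
        self.spec, self.rng, self.t = spec, random.Random(spec.seed), 0

    def reset(self):
        self.t = 0
        return {"obs": self.rng.random(), "legal": range(self.spec.n_actions)}

    def step(self, action: int):
        self.t += 1
        reward = self.rng.random()  # stand-in for a rule-derived score
        done = self.t >= self.spec.horizon
        return {"obs": self.rng.random(), "legal": range(self.spec.n_actions)}, reward, done

class RandomAgent:
    """Chance baseline; a foundation-model player would expose the same .act()."""
    def act(self, obs):
        return random.choice(list(obs["legal"]))

def evaluate_zero_shot(agent, n_games: int = 50, seed: int = 0) -> float:
    """Mean return over freshly generated games: no fine-tuning, no per-game data."""
    rng, returns = random.Random(seed), []
    for _ in range(n_games):
        env = StubEnv(sample_rules(rng))
        obs, total, done = env.reset(), 0.0, False
        while not done:
            obs, reward, done = env.step(agent.act(obs))
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)

print(evaluate_zero_shot(RandomAgent()))
```

The design point is that a generator, not a fixed library, supplies the test games, so memorizing any finite game set cannot inflate the score.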

Load-bearing premise

That experiences gathered from games with entirely different rules can produce the same kind of omni-reality adaptability that humans show when they move from one set of real-world physics to many invented game worlds.

What would settle it

Train a foundation model on a broad collection of games and then test whether it can play a completely new game whose rules, physics, and objectives were never encountered during training, without any additional fine-tuning or data.
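
A sketch of how the pass criterion for that experiment might be scored, reusing the reset()/step()/act() interface from the sketch above. The margin over a random policy is an editorial choice, not a threshold the paper specifies:

```python
import statistics

def decisive_test(trained_agent, chance_agent, held_out_env_factory,
                  n_episodes: int = 100, margin: float = 0.10) -> bool:
    """Pass iff the pre-trained agent, with frozen weights and zero data from
    the held-out game, beats a chance policy by at least `margin`."""
    def mean_return(agent) -> float:
        returns = []
        for _ in range(n_episodes):
            env = held_out_env_factory()  # the never-seen game
            obs, total, done = env.reset(), 0.0, False
            while not done:
                obs, reward, done = env.step(agent.act(obs))
                total += reward
            returns.append(total)
        return statistics.mean(returns)

    return mean_return(trained_agent) > (1.0 + margin) * mean_return(chance_agent)
```

Anything short of this, such as gains that appear only after fine-tuning on the new game, would leave the load-bearing premise above untested.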

read the original abstract

The real world unfolds along a single set of physics laws, yet human intelligence demonstrates a remarkable capacity to generalize experiences from this singular physical existence into a multiverse of games, each governed by entirely different rules, aesthetics, physics, and objectives. This omni-reality adaptability is a hallmark of general intelligence. As Artificial Intelligence progresses towards Artificial General Intelligence, the multiverse of games has evolved from mere entertainment into the ultimate ground for training and evaluating AGI. The pursuit of this generality has unfolded across four eras: from environment-specific symbolic and reinforcement learning agents, to current large foundation models acting as generalist players, and toward a future creator stage where the agent both creates new game worlds and continually evolves within them. We trace the full lifecycle of a generalist game player along four interdependent pillars: Dataset, Model, Harness, and Benchmark. Every advance across these pillars can be read as an attempt to break one of five fundamental trade-offs that currently bound the whole system. Building on this end-to-end view, we chart a five-level roadmap, progressing from single-game mastery to the ultimate creator stage in which the agent simultaneously creates and evolves within the theoretical game multiverse. Taken together, our work offers a unified lens onto a rapidly shifting field, and a principled path toward the omnipotent generalist agent capable of seamlessly mastering any challenge within the multiverse of games, thereby paving the way for AGI.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated authors' rebuttal, a circularity check, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript claims that the pursuit of generalist game players has progressed through four eras and can be organized along four interdependent pillars (Dataset, Model, Harness, Benchmark). It argues that every advance in these pillars breaks one of five fundamental trade-offs, and it charts a five-level roadmap ending in a creator stage where agents both create and evolve within game multiverses, thereby supplying a unified lens and principled path toward AGI via foundation models.

Significance. If the proposed organization and roadmap prove useful to the community, the work could help structure ongoing research on foundation-model game agents by clarifying interdependencies and long-term directions. Because it is a high-level conceptual survey with no new empirical results, derivations, or quantitative validation of the trade-off mappings, its significance rests on the clarity and adoption of the synthesis rather than on novel technical contributions.

major comments (1)
  1. Abstract and §1: the central claim that 'every advance across these pillars can be read as an attempt to break one of five fundamental trade-offs' is load-bearing for the unified-lens contribution, yet the five trade-offs are never enumerated and no concrete mapping from cited prior works to specific trade-offs is supplied, leaving the 'principled path' as an author-defined taxonomy rather than a substantiated analysis.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive report. The single major comment identifies a clarity issue in how our central claim is presented; we agree this requires revision and will update the manuscript accordingly to strengthen the substantiation of our framework.

read point-by-point responses
  1. Referee: [—] Abstract and §1: the central claim that 'every advance across these pillars can be read as an attempt to break one of five fundamental trade-offs' is load-bearing for the unified-lens contribution, yet the five trade-offs are never enumerated and no concrete mapping from cited prior works to specific trade-offs is supplied, leaving the 'principled path' as an author-defined taxonomy rather than a substantiated analysis.

    Authors: We agree that an explicit enumeration of the five trade-offs and direct mappings to prior works would make the load-bearing claim more rigorous and less reliant on reader inference. The manuscript discusses the trade-offs implicitly through the pillar interdependencies and roadmap levels, but we acknowledge they are not listed as a numbered set with concrete examples in §1. In the revision we will (1) add a concise enumerated list of the five trade-offs in the abstract and §1, (2) insert a new table that maps representative works from each pillar (e.g., large-scale pre-training datasets to the data-efficiency vs. coverage trade-off, instruction-tuned models to the specialization vs. generality trade-off) to the specific trade-off each advance targets, and (3) briefly reference these mappings when describing the five-level progression. These additions will be confined to the introductory sections and will not change the paper’s scope or conclusions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; conceptual survey with independent organization

full rationale

The manuscript is a high-level survey that organizes prior game-AI literature into four pillars (Dataset, Model, Harness, Benchmark) and maps progress onto five trade-offs plus a five-level roadmap. No equations, fitted parameters, predictions, or deductive derivations appear in the provided text. The central claim—that the framework supplies a unified lens and principled path—is presented explicitly as an author-defined organizational synthesis rather than a result derived from premises that reduce to the paper's own inputs or self-citations. All references to prior work function as external citations without load-bearing uniqueness theorems or ansatzes imported from the same authors. The opening contrast between single-physics reality and multi-rule games is motivational framing, not a premise requiring proof that loops back on itself.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that games with differing rules can train general intelligence comparable to real-world adaptability, without independent evidence supplied in the abstract.

axioms (2)
  • domain assumption: The real world unfolds along a single set of physics laws while games have entirely different rules, so generalization across them can test AGI.
    Invoked in the first sentence to frame human intelligence and the role of games.
  • ad hoc to paper: Every advance across the Dataset, Model, Harness, and Benchmark pillars breaks one of five fundamental trade-offs.
    Stated as the basis for the end-to-end view and roadmap.

pith-pipeline@v0.9.0 · 5600 in / 1432 out tokens · 121699 ms · 2026-05-13T06:37:32.650515+00:00 · methodology

discussion (0)

