Towards Generalist Game Players: An Investigation of Foundation Models in the Game Multiverse
Pith reviewed 2026-05-13 06:37 UTC · model grok-4.3
The pith
Foundation models can be trained across games with different rules to become generalist players that master any challenge and eventually create new worlds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By treating the full range of games as a multiverse with varying rules, foundation models can serve as generalist players whose development follows four interdependent pillars of Dataset, Model, Harness, and Benchmark. Progress consists of breaking five fundamental trade-offs that currently limit the system. This produces a clear five-level path from mastering one game to simultaneously creating and adapting within an expanding theoretical game multiverse, which the authors present as a concrete route to AGI.
What carries the argument
The four pillars—Dataset, Model, Harness, and Benchmark—that together define the lifecycle of a generalist game player and provide the means to break five fundamental trade-offs.
If this is right
- Agents advance through five defined levels, reaching the point where they both create new game worlds and continually adapt inside them.
- Any improvement in one of the four pillars directly reduces one of the five binding trade-offs for the whole system.
- Foundation models become the central component that enables seamless mastery across games governed by unrelated rules and objectives.
- The same training process that produces generalist players also supplies the data and mechanisms needed for the final creator stage.
- The game multiverse serves as the complete training and evaluation ground for AGI-level generalization.
Where Pith is reading between the lines
- Skills learned in the game multiverse could transfer to real-world tasks that require rapid adaptation to novel constraints.
- The creator stage implies agents could generate entirely new environments with custom physics and win conditions for further self-improvement.
- Future benchmarks would need to measure zero-shot performance on games whose rule sets are generated on the fly rather than drawn from existing libraries.
- The roadmap could be tested by checking whether models at higher levels show faster adaptation when the underlying game engine is swapped for a different one.
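The benchmark idea in the bullets above can be made concrete: generate a fresh rule specification per evaluation episode instead of drawing from a fixed game library. A minimal sketch, assuming everything here is hypothetical (`RuleSpec`, `generate_rules`, and the toy agent are illustrative stand-ins, not anything from the paper):

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class RuleSpec:
    """A toy rule set: grid size, win condition, and an action budget."""
    grid: int
    win_sum: int
    max_steps: int

def generate_rules(rng):
    """Draw a rule set on the fly, so no fixed game library is reused."""
    return RuleSpec(grid=rng.randint(3, 8),
                    win_sum=rng.randint(5, 20),
                    max_steps=rng.randint(10, 50))

def zero_shot_episode(agent, spec, rng):
    """One episode under freshly generated rules; returns 1.0 on a win."""
    total = 0
    for _ in range(spec.max_steps):
        total += agent(spec, rng)   # the agent sees only the rule spec
        if total >= spec.win_sum:
            return 1.0
    return 0.0

rng = random.Random(0)
naive_agent = lambda spec, r: r.randint(0, spec.grid)
specs = [generate_rules(rng) for _ in range(20)]   # each spec generated fresh
win_rate = sum(zero_shot_episode(naive_agent, s, rng) for s in specs) / len(specs)
assert 0.0 <= win_rate <= 1.0
```

The point of the sketch is the evaluation loop, not the toy game: the rule distribution, not a library of existing titles, defines the benchmark.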
Load-bearing premise
That experiences gathered from games with entirely different rules can produce the same kind of omni-reality adaptability that humans show when they move from one set of real-world physics to many invented game worlds.
What would settle it
Train a foundation model on a broad collection of games and then test whether it can play a completely new game whose rules, physics, and objectives were never encountered during training, without any additional fine-tuning or data.
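The settling experiment above has a simple protocol shape: a frozen policy, trained elsewhere, is scored on a game whose rules never appeared in training, with no fine-tuning or extra data. A hedged sketch, with the game family and policy as illustrative stand-ins:

```python
import random

# The toy game family and the frozen baseline policy below are stand-ins;
# a real study would substitute actual environments and a trained model.

def make_game(seed):
    """Toy game family: the rules (target and modulus) vary with the seed."""
    rng = random.Random(seed)
    target, modulus = rng.randint(0, 9), rng.choice([7, 11, 13])
    return lambda action: 1.0 if action % modulus == target % modulus else 0.0

def evaluate_frozen(policy, game, episodes=200):
    """Average reward of a fixed policy; the policy is never updated."""
    return sum(game(policy()) for _ in range(episodes)) / episodes

train_games = [make_game(s) for s in range(50)]   # broad training collection
held_out = make_game(10_000)                      # rules never seen in training

baseline = lambda: random.Random(1).randint(0, 9)  # frozen stand-in policy
score = evaluate_frozen(baseline, held_out)
assert 0.0 <= score <= 1.0
```

The key constraint is that `evaluate_frozen` never updates the policy; any positive transfer must come from what was learned across `train_games`.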
read the original abstract
The real world unfolds along a single set of physics laws, yet human intelligence demonstrates a remarkable capacity to generalize experiences from this singular physical existence into a multiverse of games, each governed by entirely different rules, aesthetics, physics, and objectives. This omni-reality adaptability is a hallmark of general intelligence. As Artificial Intelligence progresses towards Artificial General Intelligence, the multiverse of games has evolved from mere entertainment into the ultimate ground for training and evaluating AGI. The pursuit of this generality has unfolded across four eras: from environment-specific symbolic and reinforcement learning agents, to current large foundation models acting as generalist players, and toward a future creator stage where the agent both creates new game worlds and continually evolves within them. We trace the full lifecycle of a generalist game player along four interdependent pillars: Dataset, Model, Harness, and Benchmark. Every advance across these pillars can be read as an attempt to break one of five fundamental trade-offs that currently bound the whole system. Building on this end-to-end view, we chart a five-level roadmap, progressing from single-game mastery to the ultimate creator stage in which the agent simultaneously creates and evolves within a theoretical game multiverse. Taken together, our work offers a unified lens onto a rapidly shifting field, and a principled path toward the omnipotent generalist agent capable of seamlessly mastering any challenge within the multiverse of games, thereby paving the way for AGI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that the pursuit of generalist game players has progressed through four eras and can be organized along four interdependent pillars (Dataset, Model, Harness, Benchmark). It argues that every advance in these pillars breaks one of five fundamental trade-offs, and it charts a five-level roadmap ending in a creator stage where agents both create and evolve within game multiverses, thereby supplying a unified lens and principled path toward AGI via foundation models.
Significance. If the proposed organization and roadmap prove useful to the community, the work could help structure ongoing research on foundation-model game agents by clarifying interdependencies and long-term directions. As a high-level conceptual survey without new empirical results, derivations, or quantitative validation of the trade-off mappings, its significance rests on the clarity and adoption of the synthesis rather than on novel technical contributions.
major comments (1)
- Abstract and §1: the central claim that 'every advance across these pillars can be read as an attempt to break one of five fundamental trade-offs' is load-bearing for the unified-lens contribution, yet the five trade-offs are never enumerated and no concrete mapping from cited prior works to specific trade-offs is supplied, leaving the 'principled path' as an author-defined taxonomy rather than a substantiated analysis.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The single major comment identifies a clarity issue in how our central claim is presented; we agree this requires revision and will update the manuscript accordingly to strengthen the substantiation of our framework.
read point-by-point responses
- Referee: Abstract and §1: the central claim that 'every advance across these pillars can be read as an attempt to break one of five fundamental trade-offs' is load-bearing for the unified-lens contribution, yet the five trade-offs are never enumerated and no concrete mapping from cited prior works to specific trade-offs is supplied, leaving the 'principled path' as an author-defined taxonomy rather than a substantiated analysis.
  Authors: We agree that an explicit enumeration of the five trade-offs and direct mappings to prior works would make the load-bearing claim more rigorous and less reliant on reader inference. The manuscript discusses the trade-offs implicitly through the pillar interdependencies and roadmap levels, but we acknowledge they are not listed as a numbered set with concrete examples in §1. In the revision we will (1) add a concise enumerated list of the five trade-offs in the abstract and §1, (2) insert a new table that maps representative works from each pillar (e.g., large-scale pre-training datasets to the data-efficiency vs. coverage trade-off, instruction-tuned models to the specialization vs. generality trade-off) to the specific trade-off each advance targets, and (3) briefly reference these mappings when describing the five-level progression. These additions will be confined to the introductory sections and will not change the paper's scope or conclusions. Revision: yes.
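The mapping table promised in item (2) of the authors' response could take a shape like the following sketch. Only the first two trade-off names come from the rebuttal's own examples; the Harness and Benchmark entries are placeholders, not claims about the paper's actual taxonomy:

```python
# Illustrative shape for the promised pillar -> trade-off mapping table.
# The Harness and Benchmark rows are placeholders: the paper's five
# trade-offs are not enumerated in the text reviewed here.
tradeoff_map = {
    "Dataset": ("large-scale pre-training datasets",
                "data efficiency vs. coverage"),
    "Model": ("instruction-tuned models",
              "specialization vs. generality"),
    "Harness": ("<representative work>", "<trade-off>"),
    "Benchmark": ("<representative work>", "<trade-off>"),
}

# A well-formed table covers all four pillars, each with one target trade-off.
assert set(tradeoff_map) == {"Dataset", "Model", "Harness", "Benchmark"}
assert all(len(entry) == 2 for entry in tradeoff_map.values())
```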
Circularity Check
No significant circularity; conceptual survey with independent organization
full rationale
The manuscript is a high-level survey that organizes prior game-AI literature into four pillars (Dataset, Model, Harness, Benchmark) and maps progress onto five trade-offs plus a five-level roadmap. No equations, fitted parameters, predictions, or deductive derivations appear in the provided text. The central claim—that the framework supplies a unified lens and principled path—is presented explicitly as an author-defined organizational synthesis rather than a result derived from premises that reduce to the paper's own inputs or self-citations. All references to prior work function as external citations without load-bearing uniqueness theorems or ansatzes imported from the same authors. The opening contrast between single-physics reality and multi-rule games is motivational framing, not a premise requiring proof that loops back on itself.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The real world unfolds along a single set of physics laws while games follow entirely different rules, so generalization across games can serve as a test of AGI.
- ad hoc to paper: Every advance across the Dataset, Model, Harness, and Benchmark pillars breaks one of five fundamental trade-offs.