arxiv: 2604.16592 · v1 · submitted 2026-04-17 · 💻 cs.RO · cs.AI · cs.CV · cs.ET


Human Cognition in Machines: A Unified Perspective of World Models

Timothy Rupprecht, Pu Zhao, Amir Taherin, Arash Akbari, Arman Akbari, Yumei He, Sean Duffy, Juyi Lin, Yixiao Chen, Rahul Chowdhury, Enfu Nan, Yixin Shen, Yifan Cao, Haochen Zeng, Weiwei Chen, Geng Yuan, Jennifer Dy, Sarah Ostadabbas, Silvia Zhang, David Kaeli, Edmund Yeh, Yanzhi Wang


Pith reviewed 2026-05-10 08:08 UTC · model grok-4.3


keywords: world models · cognitive architecture theory · unified framework · meta-cognition · motivation · epistemic world models · AI taxonomy · cognitive functions


The pith

A unified framework based on cognitive architecture theory requires world models to incorporate all human cognitive functions including motivation and meta-cognition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish a conceptual unified framework for world models that draws directly from Cognitive Architecture Theory to include the full set of cognitive functions: memory, perception, language, reasoning, imagining, motivation, and meta-cognition. This matters because many existing AI systems assert near human-like capabilities without a shared standard for what those capabilities entail or how to measure completeness. The framework distinguishes prior work by which functions each model addresses, applies a taxonomy across video, embodied, and a newly defined epistemic category, and flags large gaps in motivation and meta-cognition. By doing so it supplies concrete directions for future models that aim at scientific discovery and self-aware behavior.

Core claim

The paper establishes that world models can be unified and evaluated by mapping them onto the complete set of cognitive functions supplied by Cognitive Architecture Theory. Prior models are shown to be partial, with motivation (especially intrinsic motivation) and meta-cognition remaining drastically under-researched. The work introduces epistemic world models as a distinct category for agent frameworks that operate over structured knowledge for scientific discovery. The resulting taxonomy, when applied to video, embodied, and epistemic models, identifies specific gaps and proposes targeted research directions to close them.

What carries the argument

The unified conceptual framework that maps every world model onto the full list of cognitive functions from Cognitive Architecture Theory, using this mapping both to classify existing systems and to expose missing elements.
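To make the mapping concrete, here is a minimal Python sketch, assuming each world model can be summarized as the set of CAT functions it addresses. The seven function names come from the paper; the helper names (`missing_functions`, `coverage_by_function`) and the three model profiles are hypothetical, invented for illustration, and do not reproduce the paper's actual classification. Under this assumption, gaps fall out as set differences for a single model and as low per-function counts across a surveyed collection.

```python
from collections import Counter

# The seven cognitive functions the paper draws from Cognitive Architecture Theory.
CAT_FUNCTIONS = (
    "memory", "perception", "language", "reasoning",
    "imagining", "motivation", "meta-cognition",
)


def missing_functions(covered):
    """Return the CAT functions a model does not address, in canonical order."""
    unknown = set(covered) - set(CAT_FUNCTIONS)
    if unknown:
        raise ValueError(f"not a CAT function: {sorted(unknown)}")
    return [f for f in CAT_FUNCTIONS if f not in covered]


def coverage_by_function(profiles):
    """Count how many models address each function; a low count marks a gap."""
    counts = Counter()
    for covered in profiles.values():
        counts.update(set(covered))
    return {f: counts[f] for f in CAT_FUNCTIONS}


# Hypothetical profiles for illustration only; not the paper's labels.
profiles = {
    "video_wm_example": {"memory", "perception", "imagining"},
    "embodied_wm_example": {"memory", "perception", "reasoning", "imagining"},
    "epistemic_wm_example": {"memory", "language", "reasoning"},
}

for name, covered in profiles.items():
    print(name, "missing:", missing_functions(covered))
print(coverage_by_function(profiles))
```

A binary set is the simplest possible encoding. Because the paper groups works by the functions they innovate most, a fuller version would likely need graded coverage per function; the binary form just keeps the gap logic visible.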

If this is right

Any world model claiming human-like cognition must be assessed against all seven cognitive functions rather than a subset.
Motivation and meta-cognition constitute the largest and most consequential research gaps that future models must address.
Epistemic world models form a new category that focuses on structured knowledge and scientific discovery tasks.
The taxonomy supplies a classification scheme that can be applied uniformly to video, embodied, and epistemic world models to guide development.
Concrete directions for filling the identified gaps can be pursued to produce more complete agent architectures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Adoption of the framework could create a common evaluation language across different subfields of world-model research.
Models that remain incomplete in motivation and meta-cognition may continue to struggle with sustained autonomous exploration.
The new epistemic category suggests that world-model research could overlap more directly with automated scientific reasoning systems.
Future empirical tests could compare agents built with the full function set against current partial models on discovery-oriented benchmarks.

Load-bearing premise

Cognitive Architecture Theory supplies the complete and correct set of cognitive functions needed to ground and evaluate all world models in AI.

What would settle it

A world model that achieves claimed human-like performance on tasks involving self-reflection, long-term planning, or open-ended discovery while omitting explicit mechanisms for motivation or meta-cognition would falsify the necessity of the full framework.

Figures

Figures reproduced from arXiv: 2604.16592 by Amir Taherin, Arash Akbari, Arman Akbari, David Kaeli, Edmund Yeh, Enfu Nan, Geng Yuan, Haochen Zeng, Jennifer Dy, Juyi Lin, Pu Zhao, Rahul Chowdhury, Sarah Ostadabbas, Sean Duffy, Silvia Zhang, Timothy Rupprecht, Weiwei Chen, Yanzhi Wang, Yifan Cao, Yixiao Chen, Yixin Shen, Yumei He.

**Figure 1.** Our survey studies the convergence of three different but inter-related fields: human cognition, machine cognition, and World Models. 2. We propose a unified World Model as a conceptual road-map for incorporating all the component parts of cognitive architecture for robust world representation and generation. 3. We identify and propose solutions to research gaps in World Model motivation and meta-cogniti…

**Figure 2.** The taxonomy of World Models covered in our survey corresponds to the component parts of cognitive architecture theory [123] they innovate most. simulators that 1) represent current world structure and 2) predict future world dynamics [40]. Recent works also survey advances in video World Models [211], embodiment [101], temporal–spatial modeling [110], and physical realism [109], all highlighting challenge…

**Figure 3.** The component-parts of our Unified World Model built from first principles in cognitive architecture theory [123] and meta-cognition [10]. This serves as a conceptual road-map for World Model research. images in a lower-dimension latent space functionally constituting memory, JEPA is innovative in its encoded latent space in how they novelly train their model to extend representations of patched images to…

**Figure 4.** Above are the typical architectures encountered when reviewing video World Models

**Figure 5.** Above are the typical architectures encountered when reviewing embodied World Models

**Figure 6.** Above are the typical architecture and Global Workspace frameworks encountered when reviewing World Models for scientific discovery used with a human-in-the-loop subject-matter expert

Original abstract

This comprehensive report distinguishes prior works by the cognitive functions they innovate on. Many works claim an almost "human-like" cognitive capability in their world models. Evaluating these claims requires proper grounding in first principles from Cognitive Architecture Theory (CAT). We present a conceptual unified framework for world models that fully incorporates all the cognitive functions associated with CAT (i.e., memory, perception, language, reasoning, imagining, motivation, and meta-cognition) and identify gaps in the research as a guide for future states of the art. In particular, we find that motivation (especially intrinsic motivation) and meta-cognition remain drastically under-researched, and we propose concrete directions informed by active inference and global workspace theory to address them. We further introduce Epistemic World Models, a new category encompassing agent frameworks for scientific discovery that operate over structured knowledge. Our taxonomy, applied across video, embodied, and epistemic world models, suggests research directions that prior taxonomies have not.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

This is a taxonomy paper that maps world models to cognitive functions via CAT and adds an epistemic category while noting gaps in motivation and meta-cognition, but the unified framework claim stays high-level and untested.


The paper surveys recent world model work in video prediction, embodied agents, and a proposed new epistemic category for structured knowledge and discovery. It uses Cognitive Architecture Theory to group models by coverage of memory, perception, language, reasoning, imagining, motivation, and meta-cognition, then argues that the last two areas are especially thin and points to active inference and global workspace ideas as ways forward.

Referee Report

1 major / 1 minor

Summary. The manuscript presents a conceptual unified framework for world models in AI, grounded in Cognitive Architecture Theory (CAT). It claims this framework fully incorporates the cognitive functions of memory, perception, language, reasoning, imagining, motivation, and meta-cognition. The work applies a taxonomy across video, embodied, and epistemic world models, identifies research gaps (particularly in motivation and meta-cognition), proposes directions informed by active inference and global workspace theory, and introduces Epistemic World Models as a new category for structured-knowledge discovery agents.

Significance. If the taxonomy and framework are adopted, the paper could offer a structured lens for evaluating world models against human-like cognitive capabilities and highlight under-explored areas such as intrinsic motivation. The introduction of Epistemic World Models provides a novel categorization that may stimulate targeted research in scientific discovery agents. As a high-level synthesis without new derivations, empirical tests, or formal mappings, its significance lies in guiding future conceptual and experimental work rather than providing immediately actionable technical advances.

major comments (1)

[Abstract] Abstract and framework presentation: The central claim that the proposed framework 'fully incorporates' all listed CAT functions rests on the unexamined selection of one specific enumeration of those functions. The manuscript does not justify this choice against competing cognitive architectures (e.g., ACT-R, SOAR, LIDA, Global Workspace) that differ on whether functions such as attention, emotion, or procedural learning are primitive or emergent, nor does it demonstrate a complete mapping without omissions or unaddressed interactions. This assumption is load-bearing for the 'unified' and 'fully incorporates' assertions.

minor comments (1)

The application of the taxonomy to video, embodied, and epistemic categories would benefit from clearer notation or a summary table distinguishing how each CAT function is realized (or not) in representative prior works.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the major comment regarding the justification of our selected cognitive functions and the strength of the 'unified' and 'fully incorporates' claims below.


Referee: The central claim that the proposed framework 'fully incorporates' all listed CAT functions rests on the unexamined selection of one specific enumeration of those functions. The manuscript does not justify this choice against competing cognitive architectures (e.g., ACT-R, SOAR, LIDA, Global Workspace) that differ on whether functions such as attention, emotion, or procedural learning are primitive or emergent, nor does it demonstrate a complete mapping without omissions or unaddressed interactions. This assumption is load-bearing for the 'unified' and 'fully incorporates' assertions.

Authors: We agree that the manuscript would benefit from an explicit justification of the chosen cognitive functions and a clearer qualification of our claims. Our enumeration (memory, perception, language, reasoning, imagining, motivation, and meta-cognition) was selected as a representative synthesis of functions most directly relevant to world modeling in AI, drawing from common elements across CAT literature. However, we acknowledge that the current text does not compare this selection to specific architectures such as ACT-R, SOAR, LIDA, or Global Workspace, nor does it provide a detailed mapping that addresses potential omissions or interactions (e.g., attention or emotion as modulators). In the revised manuscript, we will add a dedicated paragraph in the introduction that (1) motivates the selection by referencing overlaps with major CATs, (2) notes that functions like attention and emotion can emerge from or modulate the core set, and (3) qualifies 'fully incorporates' to mean that the framework supplies structural mechanisms for these functions while recognizing that complete mappings and interaction details remain open for future work. This revision will make the assumptions explicit and reduce the load-bearing nature of the claim. revision: yes

Circularity Check

0 steps flagged

No circularity: conceptual synthesis of external CAT literature


The paper offers a high-level taxonomy and gap analysis for world models by mapping established cognitive functions from Cognitive Architecture Theory (CAT) onto AI components. No equations, fitted parameters, or derivations appear; the 'fully incorporates' claim is presented as an organizational synthesis of prior external literature rather than a self-referential reduction. No self-citation chains, ansatzes, or renamings of known results are load-bearing for the central assertions. The framework remains self-contained against external benchmarks and does not force its outputs by construction from its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on CAT as the authoritative grounding for cognitive functions and introduces Epistemic World Models without independent empirical support or falsifiable predictions.

axioms (1)

domain assumption: Cognitive Architecture Theory provides the complete and necessary list of cognitive functions for evaluating human-like world models.
The entire unified framework and gap analysis are built directly on this premise.

invented entities (1)

Epistemic World Models (no independent evidence)
purpose: A new category of agent frameworks for scientific discovery that operate over structured knowledge.
Introduced in the paper as an addition to existing video and embodied world model categories.

pith-pipeline@v0.9.0 · 5550 in / 1299 out tokens · 41120 ms · 2026-05-10T08:08:23.281682+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

PhyGround: Benchmarking Physical Reasoning in Generative World Models
cs.CV 2026-05 accept novelty 7.0

PhyGround is a new benchmark with curated prompts, a 13-law taxonomy, large-scale human annotations, and an open physics-specialized VLM judge for evaluating physical reasoning in generative video models.

Reference graph

Works this paper leans on

234 extracted references · 137 canonical work pages · cited by 1 Pith paper · 28 internal anchors

[1] Being-h0.7: A latent world-action model from egocentric videos. Preprint (2026), https://research.beingbeyond.com/projects/being-h07/being-h07.pdf

[2] Agarwal, N., Ali, A., Bala, M., Balaji, Y., Barker, E., Cai, T., Chattopadhyay, P., Chen, Y., Cui, Y., Ding, Y., et al.: Cosmos world foundation model platform for physical ai. arXiv preprint arXiv:2501.03575 (2025)

[3] Ali, A., Bai, J., Bala, M., Balaji, Y., Blakeman, A., Cai, T., Cao, J., Cao, T., Cha, E., Chao, Y.W., et al.: World simulation with video foundation models for physical ai. arXiv preprint arXiv:2511.00062 (2025)

[4] Alonso, E., Jelley, A., Micheli, V., Kanervisto, A., Storkey, A., Pearce, T., Fleuret, F.: Diffusion for world modeling: Visual details matter in atari. Advances in Neural Information Processing Systems 37, 58757–58791 (2024)

[5] Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in ai safety. arXiv preprint arXiv:1606.06565 (2016)

[6] Anderson, J.R., Matessa, M., Lebiere, C.: Act-r: A theory of higher level cognition and its relation to visual attention. Human–Computer Interaction 12(4), 439–462 (1997)

[7] Arshid, K., Krayani, A., Marcenaro, L., Gomez, D.M., Regazzoni, C.: Toward autonomous uav swarm navigation: a review of trajectory design paradigms. Sensors (Basel, Switzerland) 25(18), 5877 (2025)

[8] Assran, M., Duval, Q., Misra, I., Bojanowski, P., Vincent, P., Rabbat, M., LeCun, Y., Ballas, N.: Self-supervised learning from images with a joint-embedding predictive architecture. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 15619–15629 (2023)

[9] Assran, M., Bardes, A., Fan, D., Garrido, Q., Howes, R., Muckley, M., Rizvi, A., Roberts, C., Sinha, K., Zholus, A., et al.: V-jepa 2: Self-supervised video models enable understanding, prediction and planning. arXiv preprint arXiv:2506.09985 (2025)

[10] Baars, B.J.: A cognitive theory of consciousness. Cambridge University Press (1993)

[11] Bagchi, A., Bao, Z., Bharadhwaj, H., Wang, Y.X., Tokmakov, P., Hebert, M.: Walk through paintings: Egocentric world models from internet priors. arXiv preprint arXiv:2601.15284 (2026)

[12] Barcellona, L., Zadaianchuk, A., Allegro, D., Papa, S., Ghidoni, S., Gavves, E.: Dream to manipulate: Compositional world models empowering robot imitation learning with imagination. arXiv preprint arXiv:2412.14957 (2024)

[13] Bardes, A., Garrido, Q., Ponce, J., Chen, X., Rabbat, M., LeCun, Y., Assran, M., Ballas, N.: Revisiting feature prediction for learning visual representations from video. arXiv preprint arXiv:2404.08471 (2024)

[14] Bardes, A., Garrido, Q., Ponce, J., Chen, X., Rabbat, M., LeCun, Y., Assran, M., Ballas, N.: V-jepa: Latent video prediction for visual representation learning (2023)

[15] Berseth, G., Geng, D., Devin, C., Rhinehart, N., Finn, C., Jayaraman, D., Levine, S.: Smirl: Surprise minimizing reinforcement learning in unstable environments. arXiv preprint arXiv:1912.05510 (2019)

[16] Bheemaiah, A., Yang, S.: Knowledge graphs as world models for semantic material-aware obstacle handling in autonomous vehicles. arXiv preprint arXiv:2503.21232 (2025)

[17] Bi, H., Tan, H., Xie, S., Wang, Z., Huang, S., Liu, H., Zhao, R., Feng, Y., Xiang, C., Rong, Y., et al.: Motus: A unified latent action world model. arXiv preprint arXiv:2512.13030 (2025)

[18] Bilal, A., Mohsin, M.A., Umer, M., Bangash, M.A.K., Jamshed, M.A.: Meta-thinking in llms via multi-agent reinforcement learning: A survey. arXiv preprint arXiv:2504.14520 (2025)

[19] Black, K., Nakamoto, M., Atreya, P., Walke, H., Finn, C., Kumar, A., Levine, S.: Zero-shot robotic manipulation with pretrained image-editing diffusion models. arXiv preprint arXiv:2310.10639 (2023)

[20] Boggs, J.: Towards visual-symbolic integration in the soar cognitive architecture. Cognitive Systems Research 91, 101353 (2025)

[21] Bosio, C., Woelki, G., Hendy, N., Roy, N., Kim, B.: Rdar: Reward-driven agent relevance estimation for autonomous driving. arXiv preprint arXiv:2509.19789 (2025)

[22] Bühler, K.: Sprachtheorie, vol. 2. Jena Fischer (1934)

[23] Burchi, M., Timofte, R.: Accurate and efficient world modeling with masked latent transformers. arXiv preprint arXiv:2507.04075 (2025)

[24] Cao, M., Tang, H., Zhao, H., Han, M., Liu, R., Sun, Q., Chang, X., Reid, I., Liang, X.: Order from chaos: Physical world understanding from glitchy gameplay videos. arXiv preprint arXiv:2601.16471 (2026)

[25] Cen, J., Yu, C., Yuan, H., Jiang, Y., Huang, S., Guo, J., Li, X., Song, Y., Luo, H., Wang, F., et al.: Worldvla: Towards autoregressive action world model. arXiv preprint arXiv:2506.21539 (2025)

[26] Chang, J., Uehara, M., Sreenivas, D., Kidambi, R., Sun, W.: Mitigating covariate shift in imitation learning via offline data with partial coverage. Advances in Neural Information Processing Systems 34, 965–979 (2021)

[27] Chatlatanagulchai, W., Thonglek, K., Reid, B., Kashiwa, Y., Leelaprute, P., Rungsawang, A., Manaskasemsak, B., Iida, H.: On the use of agentic coding manifests: An empirical study of claude code. In: International Conference on Product-Focused Software Process Improvement. pp. 543–551. Springer (2025)

[28] Cheang, C.L., Chen, G., Jing, Y., Kong, T., Li, H., Li, Y., Liu, Y., Wu, H., Xu, J., Yang, Y., et al.: Gr-2: A generative video-language-action model with web-scale knowledge for robot manipulation. arXiv preprint arXiv:2410.06158 (2024)

[29] Chen, B., Zhang, T., Geng, H., Song, K., Zhang, C., Li, P., Freeman, W.T., Malik, J., Abbeel, P., Tedrake, R., et al.: Large video planner enables generalizable robot control. arXiv preprint arXiv:2512.15840 (2025)

[30] Chen, S., Ma, S., Yu, S., Zhang, H., Zhao, S., Lu, C.: Exploring consciousness in llms: A systematic survey of theories, implementations, and frontier risks (2025), https://arxiv.org/abs/2505.19806

[31] Chen, Y., et al.: Drivinggpt: Unifying driving world modeling and planning with multi-modal autoregressive transformers. arXiv preprint arXiv:2412.18607 (2024)

[32] Colombatto, C., Fleming, S.M.: Folk psychological attributions of consciousness to large language models. Neuroscience of Consciousness 2024(1), niae013 (2024)

[33] Cornelissen, C., Leroux, S., Simoens, P.: Le mumo jepa: Multi-modal self-supervised representation learning with learnable fusion tokens. arXiv preprint arXiv:2603.24327 (2026)

[34] Dang, C., et al.: Sparseworld: A flexible, adaptive, and efficient 4d occupancy world model powered by sparse and dynamic queries. arXiv preprint arXiv:2510.17482 (2025)

[35] Darwin, C.: The descent of man, and selection in relation to sex, vol. 2. D. Appleton (1872)

[36] Deacon, T.W.: The symbolic species: The co-evolution of language and the brain. WW Norton & Company (1998)

[37] Dentella, V., Günther, F., Murphy, E., Marcus, G., Leivada, E.: Testing ai on language comprehension tasks reveals insensitivity to underlying meaning. Scientific Reports 14(1), 28083 (2024)

[38] Destrade, M., Bounou, O., Lidec, Q.L., Ponce, J., LeCun, Y.: Value-guided action planning with jepa world models. arXiv preprint arXiv:2601.00844 (2025)

[39] Dharmarajan, K., Huang, W., Wu, J., Fei-Fei, L., Zhang, R.: Dream2flow: Bridging video generation and open-world manipulation with 3d object flow. arXiv preprint arXiv:2512.24766 (2025)

[40] Ding, J., Zhang, Y., Shang, Y., Zhang, Y., Zong, Z., Feng, J., Yuan, Y., Su, H., Li, N., Sukiennik, N., et al.: Understanding world or predicting future? a comprehensive survey of world models. ACM Computing Surveys 58(3), 1–38 (2025)

[41] Doerig, A., Kietzmann, T.C., Allen, E., Wu, Y., Naselaris, T., Kay, K., Charest, I.: High-level visual representations in the human brain are aligned with large language models. Nature Machine Intelligence pp. 1–15 (2025)

[42] Donald, M.: Origins of the modern mind: Three stages in the evolution of culture and cognition. Harvard university press (1993)

[43] Dong, J., Lyu, Q., Liu, B., Wang, X., Liang, W., Zhang, D., Tu, J., Li, H., Zhao, H., Ding, H., et al.: Learning to model the world: A survey of world models in artificial intelligence. Authorea Preprints (2026)

[44] Du, Y., Yang, S., Dai, B., Dai, H., Nachum, O., Tenenbaum, J., Schuurmans, D., Abbeel, P.: Learning universal policies via text-guided video generation. Advances in neural information processing systems 36, 9156–9172 (2023)

[45] Dung Nguyen, V., Yang, Z., Buckley, C.L., Ororbia, A.: R-aif: Solving sparse-reward robotic tasks from pixels with active inference and world models. arXiv e-prints pp. arXiv–2409 (2024)

[46] Durante, Z., Singh, S., Khatua, A., Agarwal, S., Tan, R., Lee, Y.J., Gao, J., Adeli, E., Fei-Fei, L.: Videoweave: A data-centric approach for efficient video understanding. arXiv preprint arXiv:2601.06309 (2026)

[47] Friston, K., Da Costa, L., Tschantz, A., Heins, C., Buckley, C., Verbelen, T., Parr, T.: Active inference and artificial reasoning. arXiv preprint arXiv:2512.21129 (2025)

[48] Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., Pezzulo, G., et al.: Active inference and learning. Neuroscience & Biobehavioral Reviews 68, 862–879 (2016)

[49] Fung, P., Bachrach, Y., Celikyilmaz, A., Chaudhuri, K., Chen, D., Chung, W., Dupoux, E., Gong, H., et al.: Embodied ai agents: Modeling the world (2025)

[50] Gao, S., Zhou, S., Du, Y., Zhang, J., Gan, C.: Adaworld: Learning adaptable world models with latent actions. arXiv preprint arXiv:2503.18938 (2025)

[51] Gao, Y., Zhang, Q., Ding, D.W., Zhao, D.: Dream to drive with predictive individual world model. IEEE Transactions on Intelligent Vehicles (2024)

[52] Garrido, Q., Nagarajan, T., Terver, B., Ballas, N., LeCun, Y., Rabbat, M.: Learning latent action world models in the wild. arXiv preprint arXiv:2601.05230 (2026)

[53] Ge, Z., Huang, H., Zhou, M., Li, J., Wang, G., Tang, S., Zhuang, Y.: Worldgpt: Empowering llm as multimodal world model. In: Proceedings of the 32nd ACM International Conference on Multimedia. pp. 7346–7355 (2024)

[54] Gentile, G., Morello, G., La Cognata, V., Guarnaccia, M., Cavallaro, S.: Artificial intelligence in transcriptomics: From human-in-the-loop to agentic ai. Journal of Personalized Medicine 16(4), 181 (2026)

[55] Goff, M., Hogan, G., Hotz, G., du Parc Locmaria, A., Raczy, K., Schäfer, H., Shihadeh, A., Zhang, W., Yousfi, Y.: Learning to drive from a world model. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 1964–1973 (2025)

[56] Gottweis, J., Weng, W.H., Daryin, A., Tu, T., Palepu, A., Sirkovic, P., Myaskovsky, A., Weissenberger, F., Rong, K., Tanno, R., et al.: Towards an ai co-scientist. arXiv preprint arXiv:2502.18864 (2025)

[57] Gu, Y., Zhang, K., Ning, Y., Zheng, B., Gou, B., Xue, T., Chang, C., Srivastava, S., Xie, Y., Qi, P., et al.: Is your llm secretly a world model of the internet? model-based planning for web agents. arXiv preprint arXiv:2411.06559 (2024)

[58] Gumbsch, C., Sajid, N., Martius, G., Butz, M.V.: In: The Twelfth International Conference on Learning Representations (2023)

[59]

IEEE Robotics and Automation Letters11(3), 2466–2473 (2026)

Guo, J., Ma, X., Wang, Y., Yang, M., Liu, H., Li, Q.: Flowdreamer: A rgb- d world model with flow-based motion representations for robot manipulation. IEEE Robotics and Automation Letters11(3), 2466–2473 (2026)

2026
[60]

World Models

Ha, D., Schmidhuber, J.: World models. arXiv preprint arXiv:1803.101222(3) (2018)

work page internal anchor Pith review arXiv 2018
[61]

LTX-Video: Realtime Video Latent Diffusion

HaCohen, Y., Chiprut, N., Brazowski, B., Shalem, D., Moshe, D., Richardson, E., Levin, E., Shiran, G., Zabari, N., Gordon, O., et al.: Ltx-video: Realtime video latent diffusion. arXiv preprint arXiv:2501.00103 (2024)

work page internal anchor Pith review arXiv 2024
[62]

In: AAAI Workshops (2017) 46 Authors Suppressed Due to Excessive Length

Hadfield-Menell, D., Dragan, A.D., Abbeel, P., Russell, S.: The off-switch game. In: AAAI Workshops (2017) 46 Authors Suppressed Due to Excessive Length

2017
[63]

TD-MPC2: Scalable, Robust World Models for Continuous Control

Hansen, N., Su, H., Wang, X.: Td-mpc2: Scalable, robust world models for con- tinuous control. arXiv preprint arXiv:2310.16828 (2023)

work page internal anchor Pith review arXiv 2023
[64]

Hansen, N., SV, J., Sobal, V., LeCun, Y., Wang, X., Su, H.: Hierarchical world models as visual whole-body humanoid controllers. arXiv preprint arXiv:2405.18418 (2024)

[65]

Hao, C., Lu, W., Xu, Y., Chen, Y.: Neural motion simulator: Pushing the limit of world models in reinforcement learning. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 27608–27617 (2025)

[67]

Hu, A., et al.: GAIA-1: A generative world model for autonomous driving. arXiv preprint arXiv:2309.17080 (2023)

[68]

Huang, S., Chen, L., Zhou, P., Chen, S., Jiang, Z., Hu, Y., Liao, Y., Gao, P., Li, H., Yao, M., et al.: EnerVerse: Envisioning embodied future space for robotics manipulation. arXiv preprint arXiv:2501.01895 (2025)

[69]

Huang, W., Ji, J., Xia, C., Zhang, B., Yang, Y.: SafeDreamer: Safe reinforcement learning with world models. arXiv preprint arXiv:2307.07176 (2023)

[70]

Huang, W., Chao, Y.W., Mousavian, A., Liu, M.Y., Fox, D., Mo, K., Fei-Fei, L.: PointWorld: Scaling 3D world models for in-the-wild robotic manipulation. arXiv preprint arXiv:2601.03782 (2026)

[71]

Huang, X., Li, Z., He, G., Zhou, M., Shechtman, E.: Self forcing: Bridging the train-test gap in autoregressive video diffusion. arXiv preprint arXiv:2506.08009 (2025)

[72]

Huang, Y., Zhang, J., Zou, S., Liu, X., Hu, R., Xu, K.: LaDi-WM: A latent diffusion-based world model for predictive manipulation. arXiv preprint arXiv:2505.11528 (2025)

[73]

Huh, M., Cheung, B., Wang, T., Isola, P.: The Platonic representation hypothesis. arXiv preprint arXiv:2405.07987 (2024)

[74]

Ibrahim, L., Cheng, M.: Thinking beyond the anthropomorphic paradigm benefits LLM research. arXiv preprint arXiv:2502.09192 (2025)

[75]

Physical Intelligence, Black, K., Brown, N., Darpinian, J., Dhabalia, K., Driess, D., Esmail, A., Equi, M., Finn, C., Fusai, N., Galliker, M.Y., et al.: π0.5: A vision-language-action model with open-world generalization (2025)

[76]

Jang, J., Yoo, M., Yoon, S., Woo, H.: Test-time mixture of world models for embodied agents in dynamic environments. arXiv preprint arXiv:2601.22647 (2026)

[77]

Jang, J., Ye, S., Lin, Z., Xiang, J., Bjorck, J., Fang, Y., Hu, F., Huang, S., Kundalia, K., Lin, Y.C., et al.: DreamGen: Unlocking generalization in robot learning through video world models. arXiv preprint arXiv:2505.12705 (2025)

[78]

Jang, S., Ki, T., Jo, J., Xie, S., Yoon, J., Hwang, S.J.: Self-refining video sampling. arXiv preprint arXiv:2601.18577 (2026)

[79]

Jaynes, J.: From the origin of consciousness in the breakdown of the bicameral mind. In: Creative Writing, pp. 541–543. Routledge (2013)

[80]

Jiang, A., Gao, Y., Wang, Y., Sun, Z., Wang, S., Heng, Y., Sun, H., Tang, S., Zhu, L., Chai, J., et al.: IRL-VLA: Training an vision-language-action policy via reward world model. arXiv preprint arXiv:2508.06571 (2025)

[81]

Johnson, S.G., Karimi, A.H., Bengio, Y., Chater, N., Gerstenberg, T., Larson, K., Levine, S., Mitchell, M., Rahwan, I., Schölkopf, B., et al.: Imagining and building wise machines: The centrality of AI metacognition. Trends in Cognitive Sciences (2025)
