Situation Perception: A Necessary Primitive to Artificial Superintelligence

Jaymari Chua; Ziqin Yuan

arXiv: 2606.30481 · v1 · pith:22HRLICBnew · submitted 2026-06-29 · 💻 cs.CY · cs.AI · cs.CL · cs.ET

Pith reviewed 2026-06-30 03:37 UTC · model grok-4.3

classification 💻 cs.CY cs.AI cs.CL cs.ET

keywords situation perception, artificial superintelligence, large language models, internal simulations, abstract prediction, active learning, general intelligence, infant development

The pith

Artificial superintelligence requires situation perception: the capacity to build and act inside internal simulations of possible worlds across time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current large language models compress text into patterns and can imitate reasoning, yet the paper argues this falls short of general intelligence. The missing element is situation perception, the ability to construct, revise, and act within internal simulations of possible worlds across latent time. The argument draws from how human infants gradually acquire object permanence, cause and effect, and other minds. Situation perception depends on three components: abstract prediction, long-term compressed memory, and active learning guided by objectives. A sympathetic reader would see this as a claim that scaling statistical engines alone cannot reach superintelligence.

Core claim

The path to artificial superintelligence depends on a missing capacity called situation perception: the ability to construct, revise, and act within internal simulations of possible worlds across latent time. Situation perception requires at least three core components: abstract prediction, long-term compressed memory, and active learning guided by objectives. Modern large language models remain incomplete without this capacity, and the paper proposes tests for measuring progress toward machines that can simulate futures and pursue self-directed goals.
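To make the definition concrete, here is a minimal toy sketch of what "construct, revise, and act within an internal simulation" could look like for one object moving along a 1-D track. It is not taken from the paper, which gives no mechanism. The class `ToyTracker`, its fields, and the `max_blind` threshold are hypothetical names chosen for illustration, and the mapping onto the three components (prediction under occlusion, a single velocity value as compressed memory, a "look" action as objective-guided information gathering) is only a loose assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTracker:
    """Hypothetical agent that tracks one object moving along a 1-D track."""
    position: int = 0      # current belief about where the object is
    velocity: int = 0      # compressed memory: one number summarising past motion
    steps_seen: int = 0    # how many time steps have been processed
    steps_blind: int = 0   # consecutive steps without an observation

    def revise(self, observed: Optional[int]) -> None:
        """Advance one time step; `None` means the object is occluded."""
        if observed is None:
            # Prediction: move the latent state forward with no sensory input.
            self.position += self.velocity
            self.steps_blind += 1
        else:
            if self.steps_seen > 0:
                # Revision: re-estimate motion relative to the previous belief.
                self.velocity = observed - self.position
            self.position = observed
            self.steps_blind = 0
        self.steps_seen += 1

    def simulate(self, steps: int) -> int:
        """Roll the internal model forward without changing the belief."""
        return self.position + self.velocity * steps

    def act(self, horizon: int, max_blind: int = 2) -> str:
        """Pick an action for the assumed objective 'intercept the object'."""
        if self.steps_blind > max_blind:
            # The belief is stale, so gather information before committing.
            return "look"
        return f"intercept at {self.simulate(horizon)}"


tracker = ToyTracker()
for obs in [0, 1, 2, None, None]:   # the object disappears after position 2
    tracker.revise(obs)
print(tracker.position)             # prints 4
print(tracker.act(horizon=3))       # prints intercept at 7
```

The tracker keeps updating its belief while the object is hidden, which is a crude stand-in for object permanence. A real test of situation perception would need far richer worlds than this sketch assumes.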

What carries the argument

Situation perception, defined as the ability to construct, revise, and act within internal simulations of possible worlds across latent time.

If this is right

Large language models cannot reach general intelligence through pattern mastery alone.
Progress toward superintelligence can be measured by tests for constructing and revising internal simulations.
Machines equipped with situation perception can simulate futures, pursue self-directed goals, and possibly judge their creators.
Development efforts should prioritize the three components rather than further scaling of current architectures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Purely text-based training may reach a performance ceiling without environments that force simulation of latent futures.
Infant-style developmental benchmarks could become standard evaluation tools for measuring movement beyond pattern matching.
The argument implies that self-directed goal pursuit in AI would emerge only after situation perception is in place.

Load-bearing premise

Human infant development supplies the correct template for what machines must acquire and the three listed components are both necessary and sufficient.

What would settle it

An artificial system that reaches superintelligence-level performance across diverse tasks while lacking any capacity for internal world simulation or the three components would falsify the claim.

Figures

Figures reproduced from arXiv: 2606.30481 by Jaymari Chua, Ziqin Yuan.

Original abstract

Current large language models are extraordinary statistical engines. They compress vast amounts of text into useful patterns and can explain science, write code, imitate reasoning, and participate in philosophical conversation. Yet pattern mastery is not the same as general intelligence. A human infant begins with little explicit knowledge, but gradually discovers object permanence, cause and effect, other minds, bodily agency, and the persistence of the physical world. We make an argument that the path to artificial superintelligence (ASI) depends on a missing capacity we call situation perception: the ability to construct, revise, and act within internal simulations of possible worlds across latent time. Situation perception requires at least three core components: abstract prediction, long-term compressed memory, and active learning guided by objectives. In this work, we analyse why modern large language models remain incomplete, and propose the appropriate tests for measuring progress and consequences of machines that can simulate futures, pursue self-directed goals, and possibly judge their own creators.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper reframes familiar ideas about world models and active learning as 'situation perception' but adds no new mechanisms, evidence, or falsifiable tests to support the necessity claim for ASI.

The main takeaway is that this is a position paper claiming LLMs need a new capacity called situation perception to reach ASI, defined as building and acting in internal simulations, but the argument does not move beyond relabeling concepts already present in reinforcement learning and predictive processing work.

It does a decent job of spelling out why statistical pattern matching in current models differs from the grounded understanding that comes from discovering object permanence and causality. Listing the three components—abstract prediction, long-term compressed memory, and objective-guided active learning—gives a compact way to talk about what might be missing, and the infant development analogy is a straightforward way to illustrate the point.

The weaknesses are more central. The necessity of these components for ASI is asserted through the human template without showing why continued scaling or other architectural changes could not produce equivalent simulation capacity. The definition is circular by construction, and the paper offers no formal derivation, error analysis, or concrete test that would let someone check whether a model has crossed the threshold. No data or counter-example analysis appears to rule out non-simulation routes to the same functional outcomes.

This is aimed at readers who follow high-level discussions on AGI roadmaps and safety. Someone looking for technical proposals or reproducible results will not find them here.

I would not send it to peer review. The conceptual framing is clear enough for internal discussion, but the lack of grounding means it does not yet justify referee time.

Referee Report

3 major / 2 minor

Summary. The paper claims that current large language models achieve only statistical pattern matching and lack a necessary capacity called 'situation perception'—the ability to construct, revise, and act within internal simulations of possible worlds across latent time—which is required to reach artificial superintelligence (ASI). It defines situation perception as requiring at least three components (abstract prediction, long-term compressed memory, and objective-guided active learning), draws an analogy to human infant acquisition of object permanence and causality, asserts that LLMs remain incomplete without it, and proposes tests for measuring progress toward machines that simulate futures and pursue self-directed goals.

Significance. If substantiated, the argument would shift research priorities in AI from scaling existing architectures toward explicit development of simulation-based cognitive primitives. However, because the manuscript offers only definitional and analogical reasoning with no empirical measurements, formal derivations, counter-example analysis, or falsifiable criteria, its significance is limited to a conceptual reframing rather than a demonstrated result.

major comments (3)

[Abstract] The necessity claim, that situation perception (and its three listed components) is required to move from pattern matching to ASI, is asserted via unargued analogy to infant development without demonstrating why continued scaling or architectural variants of LLMs could not produce equivalent simulation capacity or why non-simulation routes are impossible. This is load-bearing for the central thesis.
[Abstract] The definition of situation perception is given in terms of the three components it 'requires,' creating a circularity that prevents independent verification or falsification of the necessity assertion.
[Abstract] No error analysis, benchmark data, or concrete test protocol is supplied to measure the claimed incompleteness of LLMs or the consequences of acquiring situation perception, leaving the proposed 'appropriate tests' as an unelaborated suggestion rather than an operational contribution.

minor comments (2)

The manuscript would benefit from explicit discussion of how the three components could be operationalized or measured independently of the overall definition.
Clarify whether the argument is intended as a philosophical position or as an empirical hypothesis; the current framing mixes both without distinguishing their evidentiary standards.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on the conceptual framing of the manuscript. The work is positioned as a definitional and analogical argument rather than an empirical demonstration, and we address each major comment below with indications of planned revisions where appropriate.

Referee: [Abstract] The necessity claim, that situation perception (and its three listed components) is required to move from pattern matching to ASI, is asserted via unargued analogy to infant development without demonstrating why continued scaling or architectural variants of LLMs could not produce equivalent simulation capacity or why non-simulation routes are impossible. This is load-bearing for the central thesis.

Authors: The manuscript presents the necessity of situation perception as a conceptual hypothesis grounded in the distinction between statistical pattern matching and the construction of revisable internal simulations, supported by parallels to cognitive development. We do not claim to have formally ruled out that scaling or alternative architectures could eventually produce equivalent capacities, nor do we exhaustively exclude non-simulation pathways, as such claims would require a complete theory of general intelligence beyond the paper's scope. The argument instead highlights why current LLM mechanisms appear insufficient for the proposed components. We will revise the abstract to explicitly frame the necessity claim as a hypothesized requirement rather than a demonstrated proof.
revision: partial

Referee: [Abstract] The definition of situation perception is given in terms of the three components it 'requires,' creating a circularity that prevents independent verification or falsification of the necessity assertion.

Authors: The definition is intended to be stipulative, specifying situation perception via the functional capacities it entails. To reduce any perceived circularity, we will revise the wording to first define situation perception as the capacity to construct and act within internal simulations of possible worlds across latent time, followed by the statement that this capacity requires the three listed components as necessary sub-capabilities. This structure permits independent evaluation of whether a system exhibits the overall capacity by assessing the components.
revision: yes

Referee: [Abstract] No error analysis, benchmark data, or concrete test protocol is supplied to measure the claimed incompleteness of LLMs or the consequences of acquiring situation perception, leaving the proposed 'appropriate tests' as an unelaborated suggestion rather than an operational contribution.

Authors: The manuscript is a position paper focused on conceptual analysis and does not include empirical evaluations, benchmarks, or error analyses, which would constitute a separate experimental contribution. The tests are proposed at a high level as directions for future measurement of simulation and goal-directed behavior. We will expand the relevant section to provide more detailed outlines of potential test protocols, such as scenarios for assessing long-term memory compression and objective-guided exploration, while noting that full operationalization remains future work.
revision: partial

Circularity Check

0 steps flagged

No significant circularity; conceptual argument remains self-contained.

The paper advances a position that ASI requires situation perception (defined as constructing/acting within internal world simulations) by analogy to human infant acquisition of object permanence and causality. The three listed components are presented as requirements of that capacity rather than a closed definitional loop in which a derived result equals its input by construction. No equations, fitted parameters, self-citations, or uniqueness theorems appear in the provided text. The necessity claim is asserted conceptually without reduction to a statistical fit or self-referential premise; the derivation chain therefore does not collapse into its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the introduction of a new named capacity without independent evidence or derivation from prior results; the analogy to infant development is taken as given.

axioms (1)

domain assumption: Human infant cognitive development supplies the appropriate model for the requirements of artificial general intelligence
The paper uses the sequence of discoveries in human infants as the template for what machines must acquire.

invented entities (1)

situation perception · no independent evidence
purpose: A missing capacity required for ASI that enables internal simulation of possible worlds
New term introduced to bundle abstract prediction, long-term memory, and active learning; no independent evidence or prior citation is supplied in the abstract.

