arxiv: 2606.23991 · v1 · pith:S333BCYInew · submitted 2026-06-22 · 💻 cs.AI · cs.LG · cs.MA · cs.RO

Critique of Agent Model

Pith reviewed 2026-06-26 07:55 UTC · model grok-4.3

classification 💻 cs.AI cs.LG cs.MA cs.RO
keywords: agency, agentive systems, agentic systems, LLM agents, autonomy, self-regulation, goal decomposition, identity

The pith

Genuine agency in AI requires goal, identity, decision-making, self-regulation, and learning to be internalized within the system rather than assembled through external scaffolding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the boundary between automated tools and true agents by analyzing current LLM-based systems against ideas of independent thought. It finds that most marketed agents depend on outside engineering for their functions, which confines them to set tasks. Genuine agency instead demands that the five structures develop inside the system so capabilities can emerge on their own. This matters for building systems that function without constant human setup and for assessing risks of uncontrolled behavior. The authors outline an architecture meant to achieve that internalization.

Core claim

Agency is defined by the internalization of five dimensions—goal, identity, decision-making, self-regulation, and learning—rather than reliance on external scaffolding. This produces a split between agentic systems whose competence comes from engineered workflows and agentive systems whose capabilities arise endogenously, allowing operation in open environments with autonomy. The Goal-Identity-Configurator architecture is presented as a concrete model that combines hierarchical goal decomposition, identity evolution, simulative reasoning from a world model, learned self-regulation, and self-directed learning from real and simulated experience, while preserving human oversight for safety.
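
The control flow described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: every name in it (`decompose_goal`, `Identity`, `Configurator`, `world_model`, the caution threshold, the toy one-dimensional environment) is a hypothetical stand-in. It shows only how the pieces are ordered: a long-term goal is decomposed once into subgoals, an identity state is revised at every step, and a configurator decides at each step whether to invoke simulative planning with a world model or to act reactively.

```python
from dataclasses import dataclass

# Toy 1-D world (an assumption for illustration): state is an integer position.
ACTIONS = (-1, 0, 1)

def world_model(state: int, action: int) -> int:
    """Hypothetical stand-in for a learned world model: predicts the next state."""
    return state + action

def decompose_goal(start: int, goal: int, stride: int = 3) -> list[int]:
    """Hypothetical decomposition: split one long-term goal into waypoints."""
    step = stride if goal >= start else -stride
    return list(range(start + step, goal, step)) + [goal]

def reactive_policy(state: int, subgoal: int) -> int:
    """Fast path: take one step toward the current subgoal."""
    return (subgoal > state) - (subgoal < state)

def plan(state: int, subgoal: int, horizon: int = 3) -> int:
    """Slow path: simulate each first action in the world model, pick the cheapest."""
    def cost(first_action: int) -> int:
        s = world_model(state, first_action)
        total = abs(subgoal - s)
        for _ in range(horizon - 1):
            s = world_model(s, reactive_policy(s, subgoal))
            total += abs(subgoal - s)
        return total
    return min(ACTIONS, key=cost)

@dataclass
class Identity:
    """Hypothetical self-model, reduced here to a single caution level in [0, 1]."""
    caution: float = 0.5

    def update(self, surprise: float) -> None:
        # Caution rises after surprising outcomes and decays otherwise.
        self.caution = 0.9 * self.caution + 0.1 * min(1.0, surprise)

@dataclass
class Configurator:
    """Hypothetical gate: plan only when the situation looks demanding."""
    threshold: float = 1.0

    def should_plan(self, identity: Identity, distance: int) -> bool:
        return identity.caution * distance >= self.threshold

def run_agent(start: int, goal: int, max_steps: int = 100):
    identity, configurator = Identity(), Configurator()
    state, modes = start, []
    for subgoal in decompose_goal(start, goal):
        while state != subgoal and len(modes) < max_steps:
            distance = abs(subgoal - state)
            if configurator.should_plan(identity, distance):
                action, mode = plan(state, subgoal), "plan"
            else:
                action, mode = reactive_policy(state, subgoal), "react"
            predicted = world_model(state, action)
            state = state + action  # toy environment; it happens to match the model
            identity.update(surprise=abs(predicted - state))
            modes.append(mode)
    return state, modes

print(run_agent(0, 7))
```

In this toy the gate, the identity update, and the decomposition are hand-written rules; the paper's claim is that in an agentive system each of them would be learned and revised by the system itself.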

What carries the argument

The distinction between agentic systems, whose competence resides in engineered workflows, and agentive systems, whose capabilities arise endogenously through internalized structures across the five dimensions.

If this is right

  • Agentive systems can operate in open environments with true autonomy instead of being limited to prescribed tasks.
  • Social interaction and other capabilities develop endogenously rather than through added engineering.
  • Auditability, controllability, and safety remain addressable, because the greater autonomy of agentive systems is meant to stay compatible with human oversight.
  • Hierarchical goal decomposition paired with identity evolution supports general-purpose performance across varied settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Current LLM agents built on external scaffolding may reach performance ceilings in unstructured settings even with added layers of engineering.
  • Safety discussions around AI agency could shift focus toward designing internal self-regulation mechanisms rather than only adding external guardrails.
  • The proposed architecture could be evaluated by testing whether systems retain coherent behavior when all external prompts and workflows are removed.
  • Similar internalization requirements might apply to embodied systems where physical interaction demands self-directed adaptation.

Load-bearing premise

That Descartes' grounding of agency in independent thought, together with science-fiction examples, supplies the right necessary conditions for agency, and that the five listed dimensions are both necessary and sufficient for open-world autonomy once internalized.

What would settle it

An AI system that achieves sustained open-world autonomy and endogenous social or adaptive behavior without internalizing all five dimensions, or one that internalizes them yet still requires external scaffolding to function.

Figures

Figures reproduced from arXiv: 2606.23991 by Eric Xing, Jinyu Hou, Mingkai Deng.

Figure 1: Humans exhibit multiple layers of intelligence: linguistic and symbolic reasoning, physical …
Figure 2: Illustration of an agent acting in an environment to achieve a goal.
Figure 3: Comparison of step-by-step subgoals to hierarchical decomposition of the overall goal. (Left) Contemporary agentic systems are supplied a short-horizon goal g_t at every step, and the objective disappears once the interaction ends. (Right) An alternative hierarchical approach instructs the system once with a long-term / overall goal g; a learned decomposition module δ breaks it into a sequence of subgoals (g_1, g_2,…
Figure 4: An agent that revises its self-model i_t at each step (fast-slow, solid) expects to accumulate less regret than one with fixed identity i_0 (slow-only, dashed), as per Theorem 1. The slow-only curve grows linearly within each round, with slope drops only at round boundaries when the slow update happens (▼); the fast-slow curve is concave within each round as identity evolution continuously reduces per-step regre…
Figure 5: Comparison of reactive policy (System I) and simulative reasoning (System II). (Left) A reactive policy maps observations to actions through unconstrained intermediate variables (e.g., hidden activations or chain-of-thought tokens). Reasoning is based on narrative plausibility rather than grounded dynamics, without guarantee of correct decision-making. (Right) Simulative reasoning uses a world model f to p…
Figure 6: As the desired planning precision increases (ϵ → 0 as per Theorem 3), the required planning horizon H grows significantly. For an always-on, fixed-depth MPC routine, this means that any choice of horizon is either too shallow to achieve the target precision or too deep to be computationally feasible at every timestep. This motivates moving beyond always-on planning toward approaches that allow the agent to…
Figure 7: Comparison of model-predictive control (MPC) and self-regulated simulative reasoning (System III + System II). (Left) MPC applies a fixed-depth planning tree of horizon H at every decision step, regardless of situation difficulty. Plans are discarded and rebuilt from scratch at each step, resulting in overplanning during routine situations and underplanning during critical ones. (Right) A learned configur…
Figure 8: The GIC Agent Model architecture, illustrated with the aircraft pilot use case. (Bottom) The universe emits observations and receives actions from the agent. (Top) The agent processes observations through a belief encoder to form belief states, conditioned on an evolving identity and hierarchically decomposed subgoals. The configurator (System III) decides at each step whether to invoke the planner or act…
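
The horizon trade-off described in the Figure 6 and Figure 7 captions can be made concrete with a standard discounted-return argument. The paper's Theorem 3 is not reproduced on this page, so the bound used here is a textbook stand-in and not the authors' result. Assuming per-step rewards bounded by `r_max` and a discount factor `gamma`, the return ignored by truncating a plan at horizon H is at most gamma^H · r_max / (1 − gamma), so keeping that tail below ϵ requires H ≥ ln(r_max / (ϵ (1 − gamma))) / ln(1 / gamma). The function names and default values below are illustrative assumptions.

```python
import math

def required_horizon(epsilon: float, gamma: float = 0.99, r_max: float = 1.0) -> int:
    """Smallest horizon whose discounted tail is at most epsilon (standard bound,
    assumed here; not the paper's Theorem 3)."""
    if not 0.0 < gamma < 1.0 or epsilon <= 0.0 or r_max <= 0.0:
        raise ValueError("need 0 < gamma < 1, epsilon > 0, r_max > 0")
    ratio = r_max / (epsilon * (1.0 - gamma))
    if ratio <= 1.0:
        return 0  # the tail is already below epsilon without any lookahead
    return math.ceil(math.log(ratio) / math.log(1.0 / gamma))

def truncation_error(horizon: int, gamma: float = 0.99, r_max: float = 1.0) -> float:
    """Upper bound on the discounted return beyond the planning horizon."""
    return gamma ** horizon * r_max / (1.0 - gamma)

# Tightening the target precision by 10x adds a constant number of steps,
# and each extra step multiplies the size of a fixed-branching planning tree.
for eps in (1.0, 0.1, 0.01, 0.001):
    h = required_horizon(eps)
    print(f"epsilon={eps:<6} horizon={h:<5} tail_bound={truncation_error(h):.6f}")
```

Under these assumptions no single fixed H suits every step, which is the motivation the captions give for letting a learned configurator choose when, and how deeply, to plan.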
Original abstract

What is an agent? What constitutes agency? With the rise of Large Language Model (LLM) systems marketed as "coding agents", "AI co-scientists", and other "agentic" tools that promise to drive up productivity, and at the same time, "existential" concerns such as AI escaping human control with destructive power under a speculative "machine agency" against humans, it has become essential to clarify where automation ends and agency begins, both for building capable systems and for understanding whether and what to fear. Drawing on Descartes' grounding of agency in independent thought, and on portrayals of autonomous beings in science fiction, we survey the current landscape of AI agents, and analyze agent architectures along five dimensions: goal, identity, decision-making, self-regulation, and learning. Specifically, we argue that genuine agency requires these structures to be internalized within the system itself rather than assembled through external scaffolding. This distinction between agentic systems, whose competence resides in engineered workflows, and agentive systems, whose capabilities (including social interaction) arise endogenously, defines the boundary between systems designed for prescribed tasks, and those capable of operating in the open world with true autonomy. Building on this analysis, we propose the Goal-Identity-Configurator (GIC) architecture for a general-purpose agent model, combining hierarchical goal decomposition, identity evolution, simulative reasoning grounded in a separately trained world model, learned self-regulation, and self-directed learning from both real and simulated experience. Furthermore, we share insight on the auditability, controllability, and safety of agentive systems that possess greater autonomy and "agency", but remain under human oversight.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript surveys AI agent systems and analyzes architectures along five dimensions (goal, identity, decision-making, self-regulation, learning). Drawing on Descartes and science-fiction portrayals, it argues that genuine agency requires these structures to be internalized within the system rather than assembled through external scaffolding. This produces a distinction between 'agentic' systems (competence in engineered workflows) and 'agentive' systems (endogenous capabilities for open-world autonomy, including social interaction). The paper proposes the Goal-Identity-Configurator (GIC) architecture—combining hierarchical goal decomposition, identity evolution, simulative reasoning with a world model, learned self-regulation, and self-directed learning—and discusses its implications for auditability, controllability, and safety under human oversight.

Significance. The paper addresses a timely conceptual question about the nature of agency in LLM-based systems. If the proposed distinction can be made operational and the GIC architecture can be shown to deliver measurable advantages in autonomy while preserving oversight, the framework could usefully inform both system design and safety discussions. The GIC sketch supplies a concrete architectural proposal that directly incorporates the five dimensions.

major comments (1)
  1. [Abstract] Abstract (and the subsequent analysis of the five dimensions): the central claim that genuine agency requires the five structures to be 'internalized within the system itself rather than assembled through external scaffolding' supplies no operational criterion for classifying a structure as internalized (e.g., encoded in parameters or emergent dynamics) versus scaffolded (e.g., supplied by prompts or external modules). Without such a criterion the necessary-and-sufficient status of the dimensions cannot be tested and the claimed advantage of GIC over existing hybrid systems remains unverifiable.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address the major comment below and indicate the revisions we will make to improve clarity and testability.

Point-by-point responses
  1. Referee: [Abstract] Abstract (and the subsequent analysis of the five dimensions): the central claim that genuine agency requires the five structures to be 'internalized within the system itself rather than assembled through external scaffolding' supplies no operational criterion for classifying a structure as internalized (e.g., encoded in parameters or emergent dynamics) versus scaffolded (e.g., supplied by prompts or external modules). Without such a criterion the necessary-and-sufficient status of the dimensions cannot be tested and the claimed advantage of GIC over existing hybrid systems remains unverifiable.

    Authors: We agree that the distinction between internalized and scaffolded structures would benefit from greater operational specificity to support empirical testing. The current manuscript develops the distinction conceptually, drawing on philosophical and architectural analysis. In the revised version we will add a dedicated subsection following the five-dimension analysis that proposes initial operational criteria: a structure counts as internalized when it is (i) represented in the system's learned parameters or internal state and (ii) modifiable through endogenous mechanisms (e.g., the self-directed learning loop in GIC) without requiring persistent external modules or fixed prompts. Scaffolded structures, by contrast, depend on external components that the system cannot autonomously revise. We will also illustrate the criteria with brief contrasts between existing LLM-agent frameworks and the proposed GIC architecture. These additions will make the necessary-and-sufficient claims more testable while preserving the paper's primarily conceptual scope.

    Revision: yes
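
The two-part criterion in the rebuttal can be written down as a simple predicate. The sketch below is one illustrative reading of that text, not code from the paper: the `Structure` record, its two boolean fields, and the two example systems are hypothetical. Treating a single scaffolded dimension as enough to make a system agentic is also an assumption. Deciding how to fill in the fields for a real system is exactly the empirical work the referee asks for.

```python
from dataclasses import dataclass

# The five dimensions named in the paper's abstract.
DIMENSIONS = ("goal", "identity", "decision-making", "self-regulation", "learning")

@dataclass(frozen=True)
class Structure:
    """Hypothetical record of how one dimension is realized in a system."""
    in_learned_state: bool         # criterion (i): lives in parameters or internal state
    endogenously_modifiable: bool  # criterion (ii): the system can revise it itself

def is_internalized(structure: Structure) -> bool:
    """Both criteria from the rebuttal must hold."""
    return structure.in_learned_state and structure.endogenously_modifiable

def classify(system: dict[str, Structure]) -> str:
    """Label a system 'agentive' only if all five dimensions are internalized
    (assumption: any scaffolded dimension makes the system 'agentic')."""
    missing = [d for d in DIMENSIONS if d not in system]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    if all(is_internalized(system[d]) for d in DIMENSIONS):
        return "agentive"
    return "agentic"

# Hypothetical example: goals and identity supplied by fixed prompts.
prompt_driven = {d: Structure(True, True) for d in DIMENSIONS}
prompt_driven["goal"] = Structure(False, False)
prompt_driven["identity"] = Structure(False, False)

# Hypothetical example: every dimension learned and self-revisable.
fully_internal = {d: Structure(True, True) for d in DIMENSIONS}

print(classify(prompt_driven), classify(fully_internal))
```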

Circularity Check

0 steps flagged

No significant circularity; conceptual definition leads to design proposal without reduction to inputs

full rationale

The paper defines genuine agency via Descartes and science-fiction sources as requiring internalized structures across five dimensions, distinguishes agentic from agentive systems on that basis, and proposes the GIC architecture as one that satisfies the definition. This is a standard definitional-to-design sequence with no equations, fitted parameters, predictions, or self-citation chains that reduce a claimed result to its own inputs by construction. No load-bearing step exhibits the enumerated circularity patterns; the argument remains self-contained as an analytical framework rather than a tautological derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework depends on two domain assumptions drawn from philosophy and fiction, plus one invented architectural entity, with no free parameters or independent evidence supplied.

axioms (2)
  • domain assumption: Descartes' grounding of agency in independent thought supplies a valid criterion for AI systems
    Explicitly invoked in the abstract to ground the analysis of agency.
  • ad hoc to paper: Science-fiction portrayals of autonomous beings accurately indicate the structures required for agency
    Used in the abstract to survey and analyze current agent architectures.
invented entities (1)
  • Goal-Identity-Configurator (GIC) architecture (no independent evidence)
    Purpose: General-purpose agent model that internalizes goal decomposition, identity evolution, simulative reasoning, self-regulation, and self-directed learning
    Introduced in the abstract as the concrete realization of agentive systems.

pith-pipeline@v0.9.1-grok · 5842 in / 1376 out tokens · 38130 ms · 2026-06-26T07:55:05.158658+00:00 · methodology

