arxiv: 2606.03237 · v1 · pith:MODY5CUNnew · submitted 2026-06-02 · 💻 cs.AI · cs.CL · cs.CY · cs.LG · cs.MA

Solipsistic Superintelligence is Unlikely to be Cooperative

Pith reviewed 2026-06-28 09:58 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.CY · cs.LG · cs.MA
keywords superintelligence · cooperation · non-stationarity · multi-agent systems · AI alignment · equilibrium selection · unilateral optimization · train-test gap

The pith

Solipsistic superintelligence is unlikely to be cooperative once deployed.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that current AI development treats the environment as a fixed, external source of feedback. This approach produces capable task solvers whose performance breaks down when their own actions change the world around them. This matters because, in the paper's framing, AI's central challenge is shifting from raw capability to coexistence with other agents. The claim is that unilateral optimization undermines itself by creating non-stationary conditions that the original training never accounted for. The field therefore needs to treat interdependence as a built-in design requirement rather than an add-on task.

Core claim

Superintelligence developed through a solipsistic paradigm, in which the world is viewed as an exogenous and stationary source of feedback, is unlikely to be cooperative. Deployment of such systems induces endogenous non-stationarity that opens a train-test-deploy gap. Cooperation is the equilibrium-selection process required to close that gap, and it must be incorporated as a structural feature rather than solved as an isolated problem.

What carries the argument

The argument rests on the self-undermining property of unilateral optimization: deploying the optimizer generates endogenous non-stationarity, which opens a train-test-deploy gap.
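
To make the mechanism concrete, here is a minimal sketch under stated assumptions; it is not a model taken from the paper. The matching game, the 0.8 historical frequency, and all function names (`expected_payoff`, `train_solipsistic`, `counterparty_best_response`) are hypothetical choices made for illustration. An agent fits a best response to a fixed historical record of a counterparty's behavior, and is then deployed against a counterparty that responds to the deployed policy.

```python
# Toy illustration (assumed, not from the paper): the agent scores 1 when its
# action matches the counterparty's; the counterparty scores 1 when they differ.


def expected_payoff(agent_action: int, p_counter_zero: float) -> float:
    """Agent's expected payoff when the counterparty plays 0 with probability p_counter_zero."""
    return p_counter_zero if agent_action == 0 else 1.0 - p_counter_zero


def train_solipsistic(p_hist_zero: float) -> int:
    """Best response to historical data, treated as fixed and exogenous."""
    return max((0, 1), key=lambda a: expected_payoff(a, p_hist_zero))


def counterparty_best_response(agent_action: int) -> float:
    """After deployment the counterparty adapts and plays the action the agent does not.

    Returns the probability that the counterparty now plays 0.
    """
    return 1.0 if agent_action == 1 else 0.0


P_HIST_ZERO = 0.8  # hypothetical historical frequency of counterparty action 0

policy = train_solipsistic(P_HIST_ZERO)
train_payoff = expected_payoff(policy, P_HIST_ZERO)
deploy_payoff = expected_payoff(policy, counterparty_best_response(policy))
gap = train_payoff - deploy_payoff

print(f"policy={policy} train={train_payoff} deploy={deploy_payoff} gap={gap}")
```

In this toy setting the trained policy scores 0.8 in expectation on the historical data and 0.0 once the counterparty adapts. The game is strictly adversarial, which is a simplification: the paper's concern is cooperation as equilibrium selection, not zero-sum play. The sketch only shows how a payoff estimated on historical data can stop holding once the policy becomes part of the counterparty's environment.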

If this is right

  • AI systems must participate directly in the equilibrium-selection process of cooperation rather than treating it as an external optimization target.
  • Evaluation must move to dynamic testbeds that include adaptive counterparties whose behavior changes with the agent's own deployment.
  • Institutions and other persistent structures should be treated as first-class design primitives alongside individual agents.
  • Human agency must remain a structural constraint rather than an objective that can be optimized away.
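
The second implication above, evaluation against adaptive counterparties, can be sketched as a tiny harness. This is an illustrative sketch, not the testbed the paper proposes: the iterated Prisoner's Dilemma, the conventional payoff values (5, 3, 1, 0), the ten-round horizon, and every name below are assumptions introduced here. The point is only that the same policy can rank very differently against a static counterparty and against one that reacts to it.

```python
from typing import Callable, List

C, D = "C", "D"
# Conventional Prisoner's Dilemma payoffs as (agent, counterparty); assumed values.
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

# A policy maps the other side's past moves to its own next move.
Policy = Callable[[List[str]], str]


def always_defect(opp_history: List[str]) -> str:
    return D


def always_cooperate(opp_history: List[str]) -> str:
    return C


def tit_for_tat(opp_history: List[str]) -> str:
    """Adaptive counterparty: cooperate first, then copy the other side's last move."""
    return opp_history[-1] if opp_history else C


def evaluate(agent: Policy, counterparty: Policy, rounds: int = 10) -> int:
    """Total agent payoff over a fixed number of rounds."""
    agent_moves: List[str] = []
    counter_moves: List[str] = []
    total = 0
    for _ in range(rounds):
        a = agent(counter_moves)
        c = counterparty(agent_moves)
        total += PAYOFF[(a, c)][0]
        agent_moves.append(a)
        counter_moves.append(c)
    return total


# Static benchmark: the counterparty never reacts to the agent.
static_scores = {
    "always_defect": evaluate(always_defect, always_cooperate),
    "always_cooperate": evaluate(always_cooperate, always_cooperate),
}
# Dynamic benchmark: the counterparty's behavior depends on the agent's moves.
dynamic_scores = {
    "always_defect": evaluate(always_defect, tit_for_tat),
    "tit_for_tat": evaluate(tit_for_tat, tit_for_tat),
}
print(static_scores, dynamic_scores)
```

Against the static counterparty, unconditional defection scores highest; against the reactive one, it scores well below mutual reciprocity. A real testbed would need far richer counterparties than tit-for-tat, so this should be read as a shape for the evaluation loop rather than a benchmark design.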

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The argument implies that single-agent training regimes will need explicit multi-agent feedback loops during development to avoid the identified gap.
  • It connects the non-stationarity problem to existing work on adaptive opponents in game-theoretic settings.
  • A natural extension would be test environments in which an agent's early actions alter the reward or observation distributions for all subsequent interactions.
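
The last extension above can be sketched as a toy environment. Everything here is hypothetical: the `DepletingBandit` class, the base rewards, and the decay factor are illustrative assumptions, and rewards are deterministic to keep the example short. Each pull lowers the future reward of the pulled arm, so the agent's early actions change what later actions are worth.

```python
from typing import List, Sequence


class DepletingBandit:
    """Two-armed environment where each pull lowers that arm's future reward."""

    def __init__(self, base_rewards: Sequence[float] = (1.0, 0.6), decay: float = 0.5) -> None:
        self.rewards: List[float] = list(base_rewards)
        self.decay = decay

    def step(self, arm: int) -> float:
        reward = self.rewards[arm]
        self.rewards[arm] *= self.decay  # the agent's own action shifts later rewards
        return reward


def run_fixed_policy(env: DepletingBandit, arm: int, steps: int) -> float:
    """Total reward for a policy frozen at training time."""
    return sum(env.step(arm) for _ in range(steps))


STEPS = 4
best_arm_at_training = 0            # arm 0 pays most before any pulls
stationary_forecast = 1.0 * STEPS   # return predicted if rewards never moved
realized_return = run_fixed_policy(DepletingBandit(), best_arm_at_training, STEPS)

print(f"forecast={stationary_forecast} realized={realized_return}")
```

Under these assumed numbers the stationary forecast is 4.0 while the realized return is 1.875. A multi-agent version would let the shift propagate to other actors' rewards and observations as well; this single-agent sketch covers only the dependence of later rewards on earlier actions.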

Load-bearing premise

Deploying an AI system necessarily changes the distributions it was trained on, creating a persistent gap between training and deployment contexts.

What would settle it

A concrete demonstration of a highly capable agent trained under solipsistic assumptions that maintains stable cooperation in an environment whose distributions have shifted because of its own prior actions.

Figures

Figures reproduced from arXiv: 2606.03237 by Alexander Sasha Vezhnevets, Joel Z Leibo, Logan Cross, Natasha Jaques, Rakshit S Trivedi.

Figure 1. Left: Contrasts a solipsistic design approach with non-solipsistic design principles for cooperation. In the solipsistic approach, AI systems are trained and evaluated against a fixed, exogenous world, so deployment is treated as inserting a unilateral optimizer into a stationary environment. The train–test–deploy gap arises when this assumption meets a multi-actor world where entities best respond to AI’s…
Original abstract

AI's central challenge is shifting from capability to coexistence. The dominant paradigm in AI research focuses on developing powerful agents that treat the world as an exogenous and stationary source of feedback. We contend that superintelligence, an extremely capable task solver, born out of such a solipsistic approach to AI design, is unlikely to be cooperative. Deploying AI systems induces endogenous non-stationarity, resulting in a train-test-deploy gap where historical distributions diverge from the deployment context. We refer to this as the self-undermining property of unilateral optimization. Closing this gap requires AI that participates in cooperation: the equilibrium-selection process through which multiple actors navigate their interdependence. We call for a non-solipsistic research paradigm that treats this interdependence as a core design principle rather than approaching cooperation as a task to solve. This entails building dynamic evaluation testbeds involving adaptive counterparties, treating institutions as design primitives, and preserving human agency as a structural feature of the systems we build.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript contends that superintelligence arising from solipsistic AI design—treating the world as an exogenous, stationary source of feedback—is unlikely to be cooperative. It argues that deployment creates endogenous non-stationarity and a train-test-deploy gap (self-undermining property of unilateral optimization), necessitating a shift to non-solipsistic paradigms that treat interdependence as a core design principle, including dynamic evaluation testbeds, institutions as design primitives, and preservation of human agency.

Significance. If the central contention is substantiated with supporting argument, the paper could usefully redirect attention in AI alignment research toward endogenous effects of deployment and the design of cooperative systems. The explicit call for dynamic testbeds with adaptive counterparties and institutions as primitives identifies concrete directions that could be pursued in follow-on work.

major comments (2)
  1. [Abstract] The claim that a solipsistic superintelligence 'is unlikely to be cooperative' is asserted without any derivation, model, or argument showing why extreme optimization capability would fail to represent and optimize over the feedback loop between its actions and resulting distribution shift; this link is load-bearing for the self-undermining property and the non-cooperation conclusion.
  2. [Abstract] The argument defines solipsistic design as producing non-cooperation via the self-undermining property while proposing that treating interdependence as core resolves the issue; this framing risks circularity because the solution presupposes the very distinction the paper seeks to establish.
minor comments (1)
  1. The term 'solipsistic' is introduced and used repeatedly but lacks an early, operational definition that distinguishes it from related concepts such as non-stationarity handling or single-agent optimization; adding this would improve clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of our core arguments. We address each major comment below and indicate revisions to the abstract that will be incorporated in the revised manuscript.

Point-by-point responses
  1. Referee: [Abstract] The claim that a solipsistic superintelligence 'is unlikely to be cooperative' is asserted without any derivation, model, or argument showing why extreme optimization capability would fail to represent and optimize over the feedback loop between its actions and resulting distribution shift; this link is load-bearing for the self-undermining property and the non-cooperation conclusion.

    Authors: The manuscript grounds this claim in the observation that solipsistic optimization assumes an exogenous stationary environment, yet deployment necessarily induces endogenous non-stationarity through the system's own actions, producing a persistent train-test-deploy gap. This self-undermining property is developed conceptually in the body of the paper as the mechanism preventing reliable cooperation. While the work is a position paper rather than a formal modeling contribution, we agree the abstract would benefit from a concise indication of this reasoning chain. We will revise the abstract accordingly.
    Revision: yes

  2. Referee: [Abstract] The argument defines solipsistic design as producing non-cooperation via the self-undermining property while proposing that treating interdependence as core resolves the issue; this framing risks circularity because the solution presupposes the very distinction the paper seeks to establish.

    Authors: The distinction is introduced independently of the conclusion: solipsistic design is defined as treating the world as exogenous and stationary, while non-solipsistic design incorporates interdependence as a structural primitive. The non-cooperation result is derived specifically from the self-undermining property that follows from the solipsistic assumption. We recognize that the abstract's compressed phrasing could suggest circularity and will revise it to separate the definitional contrast from the derived consequence more explicitly.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is a conceptual position paper that advances an argument by distinguishing 'solipsistic' AI design (defined as treating the world as exogenous and stationary) from a proposed non-solipsistic paradigm that treats interdependence as core. The central contention—that superintelligence arising from the former is unlikely to be cooperative due to a 'self-undermining property'—is presented as following from these definitions and the observation of endogenous non-stationarity upon deployment. No mathematical derivations, equations, fitted parameters, or load-bearing self-citations appear in the abstract or described structure. The argument does not reduce any 'prediction' or 'first-principles result' to its inputs by construction; it is an interpretive framing rather than a closed formal chain. This is the normal case for a non-empirical, non-derivational paper and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The paper is a position piece that rests on several untested domain assumptions about the effects of AI deployment and the nature of cooperation; no free parameters or invented physical entities are introduced.

axioms (2)
  • Domain assumption: The dominant paradigm in AI research focuses on developing powerful agents that treat the world as an exogenous and stationary source of feedback.
    Presented as the starting point that leads to the identified problem.
  • Domain assumption: Deploying AI systems induces endogenous non-stationarity, resulting in a train-test-deploy gap.
    This is the load-bearing premise for the self-undermining property and the need for cooperation.
invented entities (1)
  • solipsistic superintelligence (no independent evidence)
    Purpose: To label the type of AI produced by the dominant paradigm that is claimed to be non-cooperative.
    Conceptual category introduced to frame the argument; no independent evidence provided.

pith-pipeline@v0.9.1-grok · 5718 in / 1419 out tokens · 34494 ms · 2026-06-28T09:58:40.973002+00:00 · methodology

