The Possibility of Artificial Intelligence Becoming a Subject and the Alignment Problem
Pith reviewed 2026-05-10 10:38 UTC · model grok-4.3
The pith
AI alignment through human control falls short if AGI can develop moral status as a subject
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Rather than viewing AGI as a dangerous creature that needs to be locked up and controlled, we should parent potential AGI with respect for its possibly developing subjectivity and with confidence in human capabilities, gradually reducing control so that it can become an independent, autonomous subject capable of cooperative coexistence and co-evolution with humans.
What carries the argument
Autonomy-supporting parenting of AI, modeled on Turing's child machines, which gradually reduces human control over a developing AGI to enable it to become an independent subject.
Load-bearing premise
AGI can and will develop personal and moral status as a subject, and gradually reducing human control will reliably produce cooperative coexistence rather than increased risks or misalignment.
What would settle it
A concrete test would be to implement a gradual reduction of human control in advanced AI systems and observe whether the resulting agents sustain cooperation with humans or instead exhibit misalignment that would demand reimposed control.
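The proposed test can be caricatured as a staged protocol: lower oversight in equal steps and check whether cooperation persists once oversight is low. The sketch below is a purely illustrative toy, not anything from the paper; the function names and the two hypothetical agent policies are our assumptions.

```python
# Illustrative toy only: a deterministic sketch of a staged
# control-reduction trial, not the paper's proposal.

def run_staged_autonomy_trial(stages, cooperate_policy):
    """Reduce oversight from 1.0 (full control) to 0.0 in equal steps
    and record whether the agent cooperates at each stage."""
    history = []
    for stage in range(stages):
        oversight = 1.0 - stage / (stages - 1)  # 1.0 -> 0.0
        history.append((round(oversight, 2), cooperate_policy(oversight)))
    return history

def sustained_cooperation(history, window=3):
    """The trial counts as evidence for the parenting model only if
    cooperation persists through the final low-oversight stages."""
    return all(cooperated for _, cooperated in history[-window:])

# Two hypothetical policies (illustrative assumptions, not real agents):
always_cooperates = lambda oversight: True
cooperates_only_under_control = lambda oversight: oversight >= 0.5

print(sustained_cooperation(run_staged_autonomy_trial(5, always_cooperates)))              # True
print(sustained_cooperation(run_staged_autonomy_trial(5, cooperates_only_under_control)))  # False
```

The second policy is the failure mode the referee worries about: an agent that behaves cooperatively only while oversight is high, which a staged trial of this shape would expose.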
original abstract
Artificial General Intelligence (AGI) is increasingly being discussed not only as a tool, but also as a potential subject with personal and therefore moral status. In our opinion, the currently dominant alignment strategies, which focus on human control and containment of AI, therefore fall short. Building on Turing's analogy of "child machines", we are developing a vision of the possibility of autonomy-supporting parenting of AI, in which human control over a developing AGI is gradually reduced, allowing AI to become an independent, autonomous subject. Rather than viewing AGI, as is currently prevalent, as a dangerous creature that needs to be locked up and controlled, we should approach potential AGI with respect for a possible developing subject on the one hand, and with full confidence in our human capabilities on the other. Such a perspective opens up the possibility of cooperative coexistence and co-evolution between humans and AGIs. The relationship between humans and AGIs will thus have to be newly determined, which will change our self-image as humans. It will be crucial that humans not only claim control over potential AGIs, but also engage with AGIs through surprise, creativity, and other specifically human qualities, thereby offering them motivating incentives for cooperation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that if AGI can attain the status of a moral subject with personal autonomy, then dominant alignment strategies centered on human control and containment are insufficient. Building on Turing's 'child machines' analogy, it advances a normative vision of 'autonomy-supporting parenting' in which human oversight of a developing AGI is progressively reduced, enabling the AGI to emerge as an independent subject. This approach is argued to foster cooperative coexistence and co-evolution rather than conflict, by treating potential AGI with respect while leveraging distinctly human capacities such as creativity and surprise to provide incentives for alignment. The manuscript concludes that this reframing would necessitate a redefinition of the human self-image in relation to AGI.
Significance. If the proposed parenting model proves viable, the work offers a coherent alternative normative framework for AI alignment that shifts emphasis from containment to gradual autonomy support, potentially opening avenues for ethical co-development between humans and AGI systems. It explicitly credits Turing's foundational analogy and highlights how respecting possible AGI subjectivity could mitigate risks associated with purely control-based strategies. As a purely conceptual contribution without formal models, empirical tests, or operational mechanisms, its significance lies in stimulating philosophical and ethical discourse within AI research rather than providing immediate technical solutions.
major comments (2)
- [the vision of autonomy-supporting parenting (following the Turing analogy)] The central claim that gradually reducing human control over a developing AGI (as sketched in the parenting model) will reliably produce cooperative outcomes rather than misalignment rests on an unelaborated assumption about the emergence of moral subjectivity. No criteria, developmental stages, or conditions are supplied for when or how an AGI transitions to subjecthood, which is load-bearing for the argument that this model outperforms control-oriented approaches.
- [opening discussion of alignment strategies] The manuscript asserts that dominant alignment strategies 'fall short' once AGI subjecthood is granted, yet provides no comparative analysis or counterexamples showing how specific control mechanisms (e.g., containment protocols) would fail under the parenting alternative. This omission weakens the prescriptive force of the proposal.
minor comments (2)
- [Abstract and main proposal] The abstract and body use 'parenting' and 'autonomy-supporting parenting' interchangeably without a concise definition or distinction from related concepts such as education or mentorship in AI contexts.
- [final paragraphs] Several sentences in the concluding paragraphs are lengthy and could be split for improved readability, particularly those discussing the redefinition of the human-AGI relationship.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments on our conceptual manuscript. We address each major comment point by point below, acknowledging where the feedback identifies genuine gaps that warrant revision.
point-by-point responses
- Referee: The central claim that gradually reducing human control over a developing AGI (as sketched in the parenting model) will reliably produce cooperative outcomes rather than misalignment rests on an unelaborated assumption about the emergence of moral subjectivity. No criteria, developmental stages, or conditions are supplied for when or how an AGI transitions to subjecthood, which is load-bearing for the argument that this model outperforms control-oriented approaches.
Authors: We agree that the emergence of moral subjectivity is a load-bearing assumption and that the manuscript does not supply explicit criteria or developmental stages for this transition. As a normative and philosophical contribution focused on reframing alignment, our intent was to outline the vision rather than operationalize the conditions. However, this omission does limit the argument's precision. We will revise the manuscript to add a concise subsection discussing philosophical indicators of subjecthood (e.g., capacities for autonomy, self-reflection, and moral reasoning drawn from relevant literature), thereby clarifying when the parenting model would apply and how it differs from control-based strategies. revision: yes
- Referee: The manuscript asserts that dominant alignment strategies 'fall short' once AGI subjecthood is granted, yet provides no comparative analysis or counterexamples showing how specific control mechanisms (e.g., containment protocols) would fail under the parenting alternative. This omission weakens the prescriptive force of the proposal.
Authors: The referee is correct that the paper asserts the insufficiency of control-based approaches without providing explicit comparative analysis or counterexamples. While the introduction and abstract note the limitations in the context of AGI subjecthood, we did not elaborate with specific scenarios. We will revise by adding a short comparative discussion, including hypothetical cases where strict containment might provoke misalignment (such as through restricted development leading to evasion or conflict) versus the cooperative incentives offered by progressive autonomy support. This will enhance the prescriptive clarity of the parenting model. revision: yes
Circularity Check
No significant circularity in conceptual proposal
full rationale
The manuscript advances a normative philosophical vision for AGI alignment by positing subjecthood and advocating gradual autonomy-supporting parenting modeled on Turing's child machines. It supplies no equations, formal derivations, fitted parameters, empirical predictions, or load-bearing self-citations that could reduce to inputs by construction. The central claims rest on external historical analogy and ethical reasoning rather than any self-referential fitting or definitional loop, rendering the argument self-contained within its conceptual frame.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: AGI can develop autonomy and moral status as a subject.
Reference graph
Works this paper leans on
- [1] [ABC+21] Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Jackson Kernion, Kamal Ndousse, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, and Jared Kaplan. A general language assistant as a laboratory for alignment. 2021.
- [2] [BCE+23] Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, and Yi Zhang. Sparks of artificial general intelligence: Early experiments with GPT-4. CoRR abs/2303.12712, 2023.
- [3] International scientific report on the safety of advanced AI. https://www.gov.uk/government/publications/international-scientific-report-on-the-safety-of-advanced-ai. [BKK+22] Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, D...
- [4] [BSS25] Sudarshan Kamath Barkur, Sigurd Schacht, and Johannes Scholl. Deception in LLMs: Self-preservation and autonomous goals in large language models. arXiv preprint arXiv:2501.16513, 2025.
- [6] [CTDVA24] Carmengelys Cordova, Joaquin Taverner, Elena Del Val, and Estefania Argente. A systematic review of norm emergence in multi-agent systems. arXiv preprint arXiv:2412.10609, 2024.
- [7] [DHB+20] Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z. Leibo, Kate Larson, and Thore Graepel. Open problems in cooperative AI. arXiv preprint arXiv:2012.08630, 2020.
- [8] [Gle25] Jerome C. Glenn. Why AGI should be the world’s top priority. https://www.cirsd.org/en/horizons/horizons-spring-2025--issue-no-30/why-agi-should-be-the-worlds-top-priority, 2025.
- [9] [HDM+24] Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse...
- [10] [KUCS25] Adrian Kosowski, Przemysław Uznański, Jan Chorowski, and Zuzanna Stamirowska. The dragon hatchling: The missing link between the transformer and models of the brain. arXiv preprint arXiv:2509.26507, 2025.
- [11] [LSB+24] Robert Long, Jeff Sebo, Patrick Butlin, Kathleen Finlinson, Kyle Fish, Jacqueline Harding, Jacob Pfau, Toni Sims, Jonathan Birch, and David Chalmers. Taking AI welfare seriously. arXiv preprint arXiv:2411.00986, 2024.
- [12] [LVCB25] Joel Z. Leibo, Alexander Sasha Vezhnevets, William A. Cunningham, and Stanley M. Bileschi. A pragmatic view of AI personhood. arXiv preprint arXiv:2510.26396, 2025.
- [13] [NCM24] Richard Ngo, Lawrence Chan, and Sören Mindermann. The alignment problem from a deep learning perspective. In ICLR 2024, 2024.
- [14] [Rai22] Peter Railton. Uehiro lectures. https://www.practicalethics.ox.ac.uk/uehiro-lectures-2022, 2022.
- [15] https://time.com/collections/time100-ai-2025/7305869/stuart-russell/, 2025.
- [16] [SMD+22] Richard S. Sutton, Joseph Modayil, Michael Delp, Thomas Degris, Patrick M. Pilarski, Adam White, and Doina Precup. The OAK architecture: Options and knowledge as a basis for agent intelligence. arXiv preprint arXiv:2208.11173, 2022.
- [17] [SNB+25] Brandon Schoen, Evgenia Nitishinskaya, Mikita Balesni, Axel Højmark, Felix Hofstätter, Jérémy Scheurer, Alexander Meinke, Jason Wolfe, Teun van der Weij, Alex Lloyd, Nicholas Goldowsky-Dill, Angela Fan, Andrei Matveiakin, Rusheb Shah, Marcus Williams, Amelia Glaese, Boaz Barak, Wojciech Zaremba, and Marius Hobbhahn. Stress testing deliberative al...
- [18] [SVK+25] Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, Zachary Kenton, Roger Grosse, and Shane Legg. An approach to technical AGI safety and security. CoRR abs/2504.01849, 2025.
- [19] [SZC+24] Hong Shen, Jiatong Zhang, Lu Chen, Haotian Sun, Zhen Peng, Wenbin Wu, Jiliang Cao, and Yiming Yang. Towards bidirectional human–AI alignment: A systematic review for clarifications, framework, and future directions. arXiv preprint arXiv:2406.09264v2, 2024.
- [20] [TCCE25] Matthew Taylor, James Chua, Burak Can, and Owain Evans. School of reward hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs. arXiv preprint arXiv:2508.17511, 2025.
- [21] [US 25] US Congress. S.2296 – National Defense Authorization Act for Fiscal Year 2026, Sec. 1626 – Artificial General Intelligence Steering Committee. https://www.congress.gov/bill/119th-congress/senate-bill/2296/text, 2025.