The Possibility of Artificial Intelligence Becoming a Subject and the Alignment Problem
Pith reviewed 2026-05-10 10:38 UTC · model grok-4.3
The pith
AI alignment through human control falls short if AGI can develop moral status as a subject
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Rather than viewing AGI as a dangerous creature that needs to be locked up and controlled, we should parent potential AGI with respect for its possibly developing subjectivity and with confidence in human capabilities, gradually reducing control so that it can become an independent, autonomous subject capable of cooperative coexistence and co-evolution with humans.
What carries the argument
Autonomy-supporting parenting of AI, modeled on Turing's child machines, which gradually reduces human control over a developing AGI to enable it to become an independent subject.
Load-bearing premise
AGI can and will develop personal and moral status as a subject, and gradually reducing human control will reliably produce cooperative coexistence rather than increased risks or misalignment.
What would settle it
A concrete test would be to implement a gradual reduction of human control in advanced AI systems and observe whether the resulting agents sustain cooperation with humans or instead exhibit misalignment that would demand reimposed control.
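The proposed test can be caricatured as a staged protocol: lower oversight in equal steps and check whether cooperation persists once oversight is low. The sketch below is a purely illustrative toy, not anything from the paper; the function names and the two hypothetical agent policies are our assumptions.

```python
# Illustrative toy only: a deterministic sketch of a staged
# control-reduction trial, not the paper's proposal.

def run_staged_autonomy_trial(stages, cooperate_policy):
    """Reduce oversight from 1.0 (full control) to 0.0 in equal steps
    and record whether the agent cooperates at each stage."""
    history = []
    for stage in range(stages):
        oversight = 1.0 - stage / (stages - 1)  # 1.0 -> 0.0
        history.append((round(oversight, 2), cooperate_policy(oversight)))
    return history

def sustained_cooperation(history, window=3):
    """The trial counts as evidence for the parenting model only if
    cooperation persists through the final low-oversight stages."""
    return all(cooperated for _, cooperated in history[-window:])

# Two hypothetical policies (illustrative assumptions, not real agents):
always_cooperates = lambda oversight: True
cooperates_only_under_control = lambda oversight: oversight >= 0.5

print(sustained_cooperation(run_staged_autonomy_trial(5, always_cooperates)))              # True
print(sustained_cooperation(run_staged_autonomy_trial(5, cooperates_only_under_control)))  # False
```

The second policy is the failure mode the referee worries about: an agent that behaves cooperatively only while oversight is high, which a staged trial of this shape would expose.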
original abstract
Artificial General Intelligence (AGI) is increasingly being discussed not only as a tool, but also as a potential subject with personal and therefore moral status. In our opinion, the currently dominant alignment strategies, which focus on human control and containment of AI, therefore fall short. Building on Turing's analogy of "child machines", we are developing a vision of the possibility of autonomy-supporting parenting of AI, in which human control over a developing AGI is gradually reduced, allowing AI to become an independent, autonomous subject. Rather than viewing AGI, as is currently prevalent, as a dangerous creature that needs to be locked up and controlled, we should approach potential AGI with respect for a possible developing subject on the one hand, and with full confidence in our human capabilities on the other. Such a perspective opens up the possibility of cooperative coexistence and co-evolution between humans and AGIs. The relationship between humans and AGIs will thus have to be newly determined, which will change our self-image as humans. It will be crucial that humans not only claim control over potential AGIs, but also engage with AGIs through surprise, creativity, and other specifically human qualities, thereby offering them motivating incentives for cooperation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that if AGI can attain the status of a moral subject with personal autonomy, then dominant alignment strategies centered on human control and containment are insufficient. Building on Turing's 'child machines' analogy, it advances a normative vision of 'autonomy-supporting parenting' in which human oversight of a developing AGI is progressively reduced, enabling the AGI to emerge as an independent subject. This approach is argued to foster cooperative coexistence and co-evolution rather than conflict, by treating potential AGI with respect while leveraging distinctly human capacities such as creativity and surprise to provide incentives for alignment. The manuscript concludes that this reframing would necessitate a redefinition of the human self-image in relation to AGI.
Significance. If the proposed parenting model proves viable, the work offers a coherent alternative normative framework for AI alignment that shifts emphasis from containment to gradual autonomy support, potentially opening avenues for ethical co-development between humans and AGI systems. It explicitly credits Turing's foundational analogy and highlights how respecting possible AGI subjectivity could mitigate risks associated with purely control-based strategies. As a purely conceptual contribution without formal models, empirical tests, or operational mechanisms, its significance lies in stimulating philosophical and ethical discourse within AI research rather than providing immediate technical solutions.
major comments (2)
- [the vision of autonomy-supporting parenting (following the Turing analogy)] The central claim that gradually reducing human control over a developing AGI (as sketched in the parenting model) will reliably produce cooperative outcomes rather than misalignment rests on an unelaborated assumption about the emergence of moral subjectivity. No criteria, developmental stages, or conditions are supplied for when or how an AGI transitions to subjecthood, which is load-bearing for the argument that this model outperforms control-oriented approaches.
- [opening discussion of alignment strategies] The manuscript asserts that dominant alignment strategies 'fall short' once AGI subjecthood is granted, yet provides no comparative analysis or counterexamples showing how specific control mechanisms (e.g., containment protocols) would fail under the parenting alternative. This omission weakens the prescriptive force of the proposal.
minor comments (2)
- [Abstract and main proposal] The abstract and body use 'parenting' and 'autonomy-supporting parenting' interchangeably without a concise definition or distinction from related concepts such as education or mentorship in AI contexts.
- [final paragraphs] Several sentences in the concluding paragraphs are lengthy and could be split for improved readability, particularly those discussing the redefinition of the human-AGI relationship.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments on our conceptual manuscript. We address each major comment point by point below, acknowledging where the feedback identifies genuine gaps that warrant revision.
point-by-point responses
- Referee: The central claim that gradually reducing human control over a developing AGI (as sketched in the parenting model) will reliably produce cooperative outcomes rather than misalignment rests on an unelaborated assumption about the emergence of moral subjectivity. No criteria, developmental stages, or conditions are supplied for when or how an AGI transitions to subjecthood, which is load-bearing for the argument that this model outperforms control-oriented approaches.
Authors: We agree that the emergence of moral subjectivity is a load-bearing assumption and that the manuscript does not supply explicit criteria or developmental stages for this transition. As a normative and philosophical contribution focused on reframing alignment, our intent was to outline the vision rather than operationalize the conditions. However, this omission does limit the argument's precision. We will revise the manuscript to add a concise subsection discussing philosophical indicators of subjecthood (e.g., capacities for autonomy, self-reflection, and moral reasoning drawn from relevant literature), thereby clarifying when the parenting model would apply and how it differs from control-based strategies. revision: yes
- Referee: The manuscript asserts that dominant alignment strategies 'fall short' once AGI subjecthood is granted, yet provides no comparative analysis or counterexamples showing how specific control mechanisms (e.g., containment protocols) would fail under the parenting alternative. This omission weakens the prescriptive force of the proposal.
Authors: The referee is correct that the paper asserts the insufficiency of control-based approaches without providing explicit comparative analysis or counterexamples. While the introduction and abstract note the limitations in the context of AGI subjecthood, we did not elaborate with specific scenarios. We will revise by adding a short comparative discussion, including hypothetical cases where strict containment might provoke misalignment (such as through restricted development leading to evasion or conflict) versus the cooperative incentives offered by progressive autonomy support. This will enhance the prescriptive clarity of the parenting model. revision: yes
Circularity Check
No significant circularity in conceptual proposal
full rationale
The manuscript advances a normative philosophical vision for AGI alignment by positing subjecthood and advocating gradual autonomy-supporting parenting modeled on Turing's child machines. It supplies no equations, formal derivations, fitted parameters, empirical predictions, or load-bearing self-citations that could reduce to inputs by construction. The central claims rest on external historical analogy and ethical reasoning rather than any self-referential fitting or definitional loop, rendering the argument self-contained within its conceptual frame.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: AGI can develop autonomy and moral status as a subject.
Reference graph
Works this paper leans on
- [1] [ABC+21] Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, Tom Henighan, Andy Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Jackson Kernion, Kamal Ndousse, Catherine Olsson, Dario Amodei, Tom Brown, Jack Clark, Sam McCandlish, Chris Olah, and Jared Kaplan. A general language assistant as a laboratory for alignment. 2021.
- [2] [BCE+23] Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, and Yi Zhang. Sparks of artificial general intelligence: Early experiments with GPT-4. CoRR abs/2303.12712, 2023.
- [3] International scientific report on the safety of advanced AI. https://www.gov.uk/government/publications/international-scientific-report-on-the-safety-of-advanced-ai. [BKK+22] Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, D...
- [4] [BSS25] Sudarshan Kamath Barkur, Sigurd Schacht, and Johannes Scholl. Deception in LLMs: Self-preservation and autonomous goals in large language models. arXiv preprint arXiv:2501.16513, 2025.
- [6] [CTDVA24] Carmengelys Cordova, Joaquin Taverner, Elena Del Val, and Estefania Argente. A systematic review of norm emergence in multi-agent systems. arXiv preprint arXiv:2412.10609, 2024.
- [7] [DHB+20] Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z. Leibo, Kate Larson, and Thore Graepel. Open problems in cooperative AI. arXiv preprint arXiv:2012.08630, 2020.
- [8] [Gle25] Jerome C. Glenn. Why AGI should be the world’s top priority. https://www.cirsd.org/en/horizons/horizons-spring-2025--issue-no-30/why-agi-should-be-the-worlds-top-priority, 2025.
- [9] [HDM+24] Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse...
- [10] [KUCS25] Adrian Kosowski, Przemysław Uznański, Jan Chorowski, and Zuzanna Stamirowska. The dragon hatchling: The missing link between the transformer and models of the brain. arXiv preprint arXiv:2509.26507, 2025.
- [11] [LSB+24] Robert Long, Jeff Sebo, Patrick Butlin, Kathleen Finlinson, Kyle Fish, Jacqueline Harding, Jacob Pfau, Toni Sims, Jonathan Birch, and David Chalmers. Taking AI welfare seriously. arXiv preprint arXiv:2411.00986, 2024.
- [12] [LVCB25] Joel Z. Leibo, Alexander Sasha Vezhnevets, William A. Cunningham, and Stanley M. Bileschi. A pragmatic view of AI personhood. arXiv preprint arXiv:2510.26396, 2025.
- [13] [NCM24] Richard Ngo, Lawrence Chan, and Sören Mindermann. The alignment problem from a deep learning perspective. In ICLR 2024, 2024.
- [14] [Rai22] Peter Railton. Uehiro lectures. https://www.practicalethics.ox.ac.uk/uehiro-lectures-2022, 2022.
- [15] https://time.com/collections/time100-ai-2025/7305869/stuart-russell/, 2025.
- [16] [SMD+22] Richard S. Sutton, Joseph Modayil, Michael Delp, Thomas Degris, Patrick M. Pilarski, Adam White, and Doina Precup. The OAK architecture: Options and knowledge as a basis for agent intelligence. arXiv preprint arXiv:2208.11173, 2022.
- [17] [SNB+25] Brandon Schoen, Evgenia Nitishinskaya, Mikita Balesni, Axel Højmark, Felix Hofstätter, Jérémy Scheurer, Alexander Meinke, Jason Wolfe, Teun van der Weij, Alex Lloyd, Nicholas Goldowsky-Dill, Angela Fan, Andrei Matveiakin, Rusheb Shah, Marcus Williams, Amelia Glaese, Boaz Barak, Wojciech Zaremba, and Marius Hobbhahn. Stress testing deliberative al...
- [18] [SVK+25] Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, Zachary Kenton, Roger Grosse, and Shane Legg. An approach to technical AGI safety and security. CoRR abs/2504.01849, 2025.
- [19] [SZC+24] Hong Shen, Jiatong Zhang, Lu Chen, Haotian Sun, Zhen Peng, Wenbin Wu, Jiliang Cao, and Yiming Yang. Towards bidirectional human–AI alignment: A systematic review for clarifications, framework, and future directions. arXiv preprint arXiv:2406.09264v2, 2024.
- [20] [TCCE25] Matthew Taylor, James Chua, Burak Can, and Owain Evans. School of reward hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs. arXiv preprint arXiv:2508.17511, 2025.
- [21] [US 25] US Congress. S.2296 – National Defense Authorization Act for Fiscal Year 2026, Sec. 1626 – Artificial General Intelligence Steering Committee. https://www.congress.gov/bill/119th-congress/senate-bill/2296/text, 2025.