pith. machine review for the scientific record.

arxiv: 2603.24350 · v2 · submitted 2026-03-25 · 💻 cs.RO · cs.AI · cs.LG

Recognition: unknown

Evidence of an Emergent "Self" in Continual Robot Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 00:29 UTC · model grok-4.3

classification 💻 cs.RO · cs.AI · cs.LG
keywords continual learning · emergent self · invariant subnetwork · robot adaptation · neural network stability · self-awareness · cognitive structure

The pith

Continual learning causes robots to develop a stable invariant subnetwork that functions as an emergent self.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes isolating the self as the most invariant part of a robot's cognitive processes, on the grounds that the self is the aspect that persists amid changing experiences. By comparing a robot that learns a single constant task with one subjected to continual learning on variable tasks, it finds that continual learning produces a significantly more stable subnetwork. This subnetwork is functionally important: keeping it intact helps adaptation to new tasks, while disrupting it harms performance. A sympathetic reader would care because the approach offers an objective, measurable proxy for detecting self-awareness in artificial systems rather than relying on introspection.

Core claim

Robots subjected to continual learning under variable tasks develop an invariant subnetwork that remains significantly more stable than in robots learning a constant task, with statistical significance at p < 0.001. This subnetwork is functionally important because preserving it aids adaptation to new tasks while damaging it impairs performance. The authors interpret this invariant structure as evidence of an emergent self, defined as the portion of cognition that changes least compared to acquired skills.

What carries the argument

The invariant subnetwork, identified as the portion of the neural network whose weights change least during learning and proposed as the representation of the self.
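
The abstract does not spell out how this subnetwork is extracted. Below is a minimal sketch of one plausible operationalization, assuming access to flattened actor weights saved at each learning phase; the relative-drift metric and the `invariant_frac` cutoff are illustrative assumptions, not the paper's documented procedure.

```python
# Sketch: rank weights by relative drift across checkpoints and keep the
# least-changing fraction as a candidate invariant ("self") subnetwork.
# The drift metric and cutoff are assumptions, not the paper's method.
import numpy as np

def invariant_mask(checkpoints, invariant_frac=0.10):
    """checkpoints: list of equally shaped 1-D weight vectors, one per phase."""
    W = np.stack(checkpoints)                                 # (n_ckpt, n_weights)
    drift = W.std(axis=0) / (np.abs(W).mean(axis=0) + 1e-8)   # relative change
    cutoff = np.quantile(drift, invariant_frac)
    return drift <= cutoff                                    # True = candidate self weight
```

A mask of this kind could then feed the freeze and lesion tests described below.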

If this is right

  • Preserving the invariant subnetwork improves a robot's ability to adapt to new tasks.
  • Damaging the invariant subnetwork reduces performance in continual learning scenarios.
  • Continual learning environments produce more stable invariant subnetworks than static task environments.
  • The stability difference is statistically significant with p less than 0.001.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This approach could be extended to non-robotic AI systems like language models to detect similar invariant structures.
  • If the invariant subnetwork truly represents self, then systems without continual learning pressure might lack a developed self.
  • Future experiments could test whether this subnetwork correlates with consistent behavior across different environments.

Load-bearing premise

That the most invariant portion of the network can be identified with the self.

What would settle it

Experiments showing no significant difference in stability between continual-learning and constant-task robots, or showing that damaging the identified subnetwork does not impair adaptation more than matched random damage.
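
A minimal sketch of such a control, assuming a flattened weight vector and a boolean mask for the candidate subnetwork: zero the targeted weights, zero an equally sized random set, and compare the two damaged policies' adaptation on a new task. The zeroing-style lesion and the size-matched random control are assumptions about how the comparison could be run, not the paper's exact protocol.

```python
# Sketch: targeted vs. size-matched random lesion of the candidate subnetwork.
import numpy as np

def lesion(weights, mask):
    """Return a copy of the weight vector with the masked entries zeroed."""
    damaged = weights.copy()
    damaged[mask] = 0.0
    return damaged

def random_mask_like(mask, rng):
    """Boolean mask with the same number of True entries, placed uniformly at random."""
    rand = np.zeros_like(mask)
    rand[rng.choice(mask.size, size=int(mask.sum()), replace=False)] = True
    return rand

# rng = np.random.default_rng(0)
# self_damaged = lesion(actor_weights, self_mask)
# rand_damaged = lesion(actor_weights, random_mask_like(self_mask, rng))
# If adaptation after self_damaged is no worse than after rand_damaged,
# the functional-importance claim weakens.
```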

Figures

Figures reproduced from arXiv: 2603.24350 by Adidev Jhunjhunwala, Hod Lipson, Judah Goldfeder.

Figure 1. Continual learning produces a stable self-like core. Compared to a single-task baseline, multi-behavior training yields a subnetwork that remains stable across behaviors ("persistent self"), while other components vary. Representative results from the first hidden layer of each network shown.
Figure 2. Visualization of the persistent self in static and variable conditions. Each policy is shown on its own plane, with hidden-layer units ordered by co-activation-based subnetworks, showing how the size and structure of the subnetworks change across learning. Alluvial flows connect matched neuron families across successive policies, grouped by source subnetwork → target subnetwork; dark purple denotes the self subnetwork.
Figure 3. Quantitative evidence for a persistent self-like subnetwork. Shown here is one trained policy from each condition (constant-task and continual). Although both policies successfully perform the same behavior, they exhibit markedly different internal structure. The top panel shows the reordered neuron–neuron co-activation matrix with inferred subnetwork boundaries, and the bottom panel shows per-neuron persistence scores.
Figure 4. Subnetwork persistence and size: constant-task baseline vs. continual learning. Mean persistence score (top) and self subnetwork size (bottom) across 50 cycles for both hidden layers. The continual-learning (multi-behavior) agent shows a clear separation between the self subnetwork (largest subnetwork) and the pooled task subnetwork (all remaining units), while the walk-only baseline exhibits weaker separation.
Figure 5. Reorganization concentrates in task-like regions at behavior switches. Overlay heatmaps show how much each subnetwork changes when learning a new behavior. The self-like subnetwork exhibits consistently smaller change than the pooled task-like region, validating the stable core alongside components that relearn more aggressively to acquire new skills.
Figure 6. Overview of the training and comparison pipeline in the multi-behavior experiment. A single agent morphology was trained sequentially on walk, wiggle, and bob using SAC, with actor weights transferred between phases and plateau-based phase switching. After training, the resulting policies were compared on a shared set of reference states to identify neural groups that remained stable across behaviors.
Figure 7. Sensitivity to τ in subnetwork sizes. Late-cycle mean subnetwork sizes (alive units) for the self and task subnetworks in Layers 1–2. τ = 0.70 was selected as a representative operating point.
Figure 8. Sensitivity to τ in self–task separation. Late-cycle mean separation ∆ (self minus task persistence score, in percentage points) for Layers 1–2 with confidence bands. τ = 0.70 was selected as a representative operating point; the key qualitative conclusions were stable across a sensible mid-range of thresholds.
Figure 9. Pooled transfer overlays for freeze and lesion interventions. Left: per-transition normalized reward curves aggregated across matched self-freeze and task-freeze transfer runs; the self-freeze condition remains slightly above task-freeze across much of the evaluation window. Right: per-transition normalized reward curves aggregated across matched self-lesion, task-lesion, and original transfer runs.
Figure 10. Manipulation tasks used in the Continual World replication. The six Meta-World manipulation tasks used across the four post-validation task sets, spanning tool use, pushing, pressing, sliding, closing, and alignment-sensitive extraction.
Figure 11. triA. Manipulation results for hammer-v1, faucet-close-v1, and peg-unplug-side-v1. Figures 11–14 summarize the results for the four task sets, each averaged across three seeds. In every case, the largest inferred subnetwork remained more persistent than the pooled task-like remainder, and this separation was sustained across training rather than appearing only at isolated checkpoints.
Figure 12. triC. Manipulation results for hammer-v1, push-wall-v1, and window-close-v1.
Figure 13. quadA. Manipulation results for faucet-close-v1, handle-press-side-v1, window-close-v1, and peg-unplug-side-v1.
Figure 14. pentaA. Manipulation results for hammer-v1, push-wall-v1, faucet-close-v1, stick-pull-v1, and peg-unplug-side-v1.
Figure 15. Overlay view of activation matrices and persistence scores. Each composite is formed by alpha-blending activation-matrix visualizations from all successful plateaued snapshots within a continual-learning run (cycles 16–50), so consistent structure accumulates while inconsistent correlations fade. Top: single-behavior (walk-only) controls remain diffuse and "spaghetti-like" with no equally clean core. Bottom: …
Figure 16. Examples of …
Figure 17. Examples of …
Figure 18. Examples of …
Figure 19. Bob-only baseline (all-states reference set). Left: persistence score of the self and task subnetworks across phases. Right: size of the self and task subnetworks across phases.
Figure 20. Wiggle-only baseline. Left: persistence score of the self and task subnetworks across phases. Right: size of the self and task subnetworks across phases.
Figure 21. Example rollout of the training schema. An example training trace illustrating phase switching across two full walk→wiggle→bob cycles with the plateau-stop logic.
Figure 22. Quadruped in simulation and hardware: (left) CAD model showing the disk-shaped torso …
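
The captions above refer to a per-neuron persistence score, a threshold τ = 0.70, and a self–task separation ∆, but the exact definitions are not reproduced on this page. A minimal sketch under one plausible reading, in which persistence is the fraction of snapshot transitions at which a unit keeps its subnetwork assignment and τ splits self-like from task-like units; both the score and the split rule are assumptions, not the paper's definitions.

```python
# Illustrative reading of "persistence score" and the τ split (an assumption).
import numpy as np

def persistence_scores(labels):
    """labels: (n_snapshots, n_units) integer subnetwork assignments per unit."""
    labels = np.asarray(labels)
    kept = labels[1:] == labels[:-1]      # did the unit keep its assignment?
    return kept.mean(axis=0)              # per-unit persistence in [0, 1]

def self_task_split(labels, tau=0.70):
    """Split units at τ and report the separation Δ in percentage points."""
    p = persistence_scores(labels)
    self_mask = p >= tau
    delta = 100.0 * (p[self_mask].mean() - p[~self_mask].mean())
    return self_mask, delta
```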
Original abstract

A key challenge to understanding self-awareness has been a principled way of quantifying whether an intelligent system has a concept of a "self", and if so how to differentiate the "self" from other cognitive structures. We propose that the "self" can be isolated by seeking the invariant portion of cognitive process that changes relatively little compared to more rapidly acquired cognitive knowledge and skills, because our self is the most persistent aspect of our experiences. We used this principle to analyze the cognitive structure of robots under two conditions: One robot learns a constant task, while a second robot is subjected to continual learning under variable tasks. We find that robots subjected to continual learning develop an invariant subnetwork that is significantly more stable (p < 0.001) compared to the control, and that this subnetwork is also functionally important: preserving it aids adaptation while damaging it impairs performance. We suggest that this principle can offer a window into exploring selfhood in other cognitive AI systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes isolating the 'self' in intelligent systems as the most invariant portion of cognitive processes, which changes little relative to acquired skills. It compares a robot learning a constant task against one under continual learning with variable tasks, reporting that the continual-learning robot develops a significantly more stable invariant subnetwork (p < 0.001). This subnetwork is functionally relevant: preserving it supports adaptation while ablating it impairs performance. The authors argue the approach provides a window into selfhood in other AI systems.

Significance. If the empirical results hold after fuller documentation, the work supplies a concrete, measurable criterion for detecting emergent invariant structures in continual-learning agents and demonstrates their functional role via preservation/ablation tests. The statistical threshold and performance impact are positive features. The interpretive step equating invariance with selfhood, however, remains an open claim that would benefit from external validation.

major comments (2)
  1. Abstract: the reported statistical result (p < 0.001) and functional test are presented without any description of network architecture, task definitions, the precise procedure used to extract the invariant subnetwork, or controls for total training time and network size. These omissions prevent evaluation of whether the stability difference is attributable to the continual-learning condition rather than confounding variables.
  2. Methods (assumed section describing subnetwork isolation): the invariant subnetwork is isolated by the same invariance criterion that is then used to label it the 'self'. No independent behavioral, representational, or self-other distinction test is supplied to decouple the measurement from the interpretation, leaving the central claim vulnerable to circularity.
minor comments (1)
  1. Provide the exact stability metric, threshold for invariance, and any statistical test details (e.g., sample size, correction for multiple comparisons) used to obtain p < 0.001.
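
One way the requested details could be reported is a paired, one-sided signed-rank test on matched stability scores from the two conditions. A minimal sketch follows; pairing by seed or cycle is an assumption rather than the paper's stated procedure.

```python
# Sketch: one-sided paired test that continual-learning stability exceeds
# the constant-task baseline. Pairing by seed/cycle is an assumption.
from scipy.stats import wilcoxon

def stability_test(continual_scores, constant_scores):
    """Matched 1-D arrays of per-seed (or per-cycle) stability scores."""
    stat, p = wilcoxon(continual_scores, constant_scores, alternative="greater")
    return stat, p

# A full report would include the number of pairs, the statistic, the exact
# p-value, and any multiple-comparison correction across layers or task sets.
```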

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us improve the clarity and robustness of our manuscript. We address each major comment below and have made corresponding revisions.

Point-by-point responses
  1. Referee: [—] Abstract: the reported statistical result (p < 0.001) and functional test are presented without any description of network architecture, task definitions, the precise procedure used to extract the invariant subnetwork, or controls for total training time and network size. These omissions prevent evaluation of whether the stability difference is attributable to the continual-learning condition rather than confounding variables.

    Authors: We agree that the abstract should have included summaries of these key elements to facilitate immediate evaluation. Although the full details are provided in the Methods section, we have revised the abstract to briefly describe the network architecture, task definitions, the procedure used to extract the invariant subnetwork, and the controls for total training time and network size. This revision ensures that readers can assess whether the stability difference is due to the continual-learning condition. revision: yes

  2. Referee: [—] Methods (assumed section describing subnetwork isolation): the invariant subnetwork is isolated by the same invariance criterion that is then used to label it the 'self'. No independent behavioral, representational, or self-other distinction test is supplied to decouple the measurement from the interpretation, leaving the central claim vulnerable to circularity.

    Authors: We acknowledge the potential for circularity in the interpretation. The subnetwork is identified via the invariance criterion, and we interpret it as the 'self' based on the persistence principle outlined in the introduction. However, the functional relevance is demonstrated through independent preservation and ablation experiments, which show performance impacts without relying on the 'self' label. We have added a dedicated subsection in the Discussion to explicitly address this concern, including suggestions for future independent tests such as representational analyses and behavioral self-distinction tasks. This provides a stronger separation between measurement and interpretation while preserving the core contribution. revision: partial

Circularity Check

1 step flagged

Definition of 'self' as the invariant subnetwork renders the identification tautological

specific steps
  1. self definitional [Abstract]
    "We propose that the 'self' can be isolated by seeking the invariant portion of cognitive process that changes relatively little compared to more rapidly acquired cognitive knowledge and skills, because our self is the most persistent aspect of our experiences. We used this principle to analyze the cognitive structure of robots under two conditions... We find that robots subjected to continual learning develop an invariant subnetwork that is significantly more stable (p < 0.001) compared to the control, and that this subnetwork is also functionally important"

    The paper first defines 'self' as the invariant portion, then applies the invariance criterion to extract a subnetwork and declares it the emergent 'self'. The identification is therefore true by the definitional premise rather than by any separate validation that the subnetwork implements self-concept or self-other distinction.

full rationale

The paper's central interpretive step defines the 'self' explicitly as the most invariant portion of the cognitive process and then isolates the most stable subnetwork under continual learning to label it the emergent 'self'. This matches the self-definitional pattern: the label follows directly from the isolation criterion rather than from independent evidence of selfhood (e.g., self-other distinction or behavioral tests). The empirical measurements of stability (p<0.001) and functional ablation effects are direct and non-circular, but the load-bearing claim that this subnetwork constitutes a 'self' reduces to the premise used to select it. No mathematical derivation or fitted parameter is involved, keeping the overall circularity moderate.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The claim rests on one domain assumption (self equals the slowest-changing cognitive component) and introduces one invented entity (the invariant subnetwork as a candidate self). No free parameters are described in the abstract.

axioms (1)
  • domain assumption: The self is the most persistent aspect of experiences and can therefore be isolated as the invariant portion of cognitive processes
    This premise directly motivates the choice to compare stability between continual and constant-task robots.
invented entities (1)
  • invariant subnetwork: no independent evidence
    purpose: To serve as a measurable proxy for the emergent self in the robot's cognitive structure
    Defined operationally as the portion of the network that changes least under continual learning; no independent evidence outside the stability metric is given.

pith-pipeline@v0.9.0 · 5471 in / 1270 out tokens · 30498 ms · 2026-05-15T00:29:03.809767+00:00 · methodology

