Evidence of an Emergent "Self" in Continual Robot Learning
Pith reviewed 2026-05-15 00:29 UTC · model grok-4.3
The pith
Continual learning causes robots to develop a stable invariant subnetwork that functions as an emergent self.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Robots subjected to continual learning under variable tasks develop an invariant subnetwork that remains significantly more stable than in robots learning a constant task, with statistical significance at p < 0.001. This subnetwork is functionally important because preserving it aids adaptation to new tasks while damaging it impairs performance. The authors interpret this invariant structure as evidence of an emergent self, defined as the portion of cognition that changes least compared to acquired skills.
What carries the argument
The invariant subnetwork: the portion of the neural network showing the least weight change during learning, proposed as the representation of the self.
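The paper's exact extraction procedure is not documented here, so the sketch below only illustrates the invariance criterion as stated; the checkpoint format, the 10% cutoff, and the name `invariant_mask` are assumptions, not the authors' code:

```python
import numpy as np

def invariant_mask(checkpoints, fraction=0.10):
    """Flag the weights that move least across training checkpoints.

    checkpoints : list of 1-D weight vectors saved during learning
    fraction    : share of weights treated as the invariant subnetwork
                  (the paper's actual threshold is not stated; 10% is a guess)
    """
    stack = np.stack(checkpoints)                  # (n_checkpoints, n_weights)
    # Total absolute movement of each weight over training.
    drift = np.abs(np.diff(stack, axis=0)).sum(axis=0)
    cutoff = np.quantile(drift, fraction)
    return drift <= cutoff                         # True = invariant member
```

Under this sketch, the overlap between masks computed at successive stages of training would give one concrete stability measure to compare across the two conditions.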
If this is right
- Preserving the invariant subnetwork improves a robot's ability to adapt to new tasks.
- Damaging the invariant subnetwork reduces performance in continual learning scenarios.
- Continual learning environments produce more stable invariant subnetworks than static task environments.
- The stability difference is statistically significant (p < 0.001).
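"Preserving" the subnetwork during adaptation could be realized in more than one way; the paper does not specify its mechanism. One plausible option is gradient masking, sketched below with a hypothetical name (`masked_update`); the mask itself would come from whatever isolation step precedes it:

```python
import numpy as np

def masked_update(weights, grads, invariant_mask, lr=1e-3):
    """One SGD step that leaves the invariant subnetwork untouched.

    Gradient masking is only one plausible preservation mechanism;
    regularizing the masked weights toward their old values is another.
    """
    step = lr * np.asarray(grads)
    step = np.where(invariant_mask, 0.0, step)   # freeze invariant weights
    return weights - step
```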
Where Pith is reading between the lines
- This approach could be extended to non-robotic AI systems like language models to detect similar invariant structures.
- If the invariant subnetwork truly represents self, then systems without continual learning pressure might lack a developed self.
- Future experiments could test whether this subnetwork correlates with consistent behavior across different environments.
Load-bearing premise
That the most invariant portion of the network can be identified with the self.
What would settle it
A replication in which continual-learning and constant-task robots show no significant difference in subnetwork stability, or in which damaging the identified subnetwork impairs adaptation no more than matched random damage, would undercut the claim.
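The random-damage control can be made concrete: damage the identified subnetwork and an equally sized random set of weights, then compare post-damage adaptation. The evaluation step is hypothetical and omitted; only the masking logic is sketched:

```python
import numpy as np

def ablate(weights, mask):
    """Return a copy of `weights` with the masked entries zeroed out."""
    damaged = weights.copy()
    damaged[mask] = 0.0
    return damaged

def random_mask_like(mask, rng):
    """A control mask damaging the same number of weights, chosen at random."""
    out = np.zeros(mask.shape, dtype=bool)
    idx = rng.choice(mask.size, size=int(mask.sum()), replace=False)
    out[idx] = True
    return out
```

A fair comparison would then evaluate `ablate(weights, invariant_mask)` and `ablate(weights, random_mask_like(invariant_mask, rng))` on the same adaptation tasks.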
Original abstract
A key challenge to understanding self-awareness has been a principled way of quantifying whether an intelligent system has a concept of a "self", and if so how to differentiate the "self" from other cognitive structures. We propose that the "self" can be isolated by seeking the invariant portion of cognitive process that changes relatively little compared to more rapidly acquired cognitive knowledge and skills, because our self is the most persistent aspect of our experiences. We used this principle to analyze the cognitive structure of robots under two conditions: One robot learns a constant task, while a second robot is subjected to continual learning under variable tasks. We find that robots subjected to continual learning develop an invariant subnetwork that is significantly more stable (p < 0.001) compared to the control, and that this subnetwork is also functionally important: preserving it aids adaptation while damaging it impairs performance. We suggest that this principle can offer a window into exploring selfhood in other cognitive AI systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes isolating the 'self' in intelligent systems as the most invariant portion of cognitive processes, which changes little relative to acquired skills. It compares a robot learning a constant task against one under continual learning with variable tasks, reporting that the continual-learning robot develops a significantly more stable invariant subnetwork (p < 0.001). This subnetwork is functionally relevant: preserving it supports adaptation while ablating it impairs performance. The authors argue the approach provides a window into selfhood in other AI systems.
Significance. If the empirical results hold after fuller documentation, the work supplies a concrete, measurable criterion for detecting emergent invariant structures in continual-learning agents and demonstrates their functional role via preservation/ablation tests. The statistical threshold and performance impact are positive features. The interpretive step equating invariance with selfhood, however, remains an open claim that would benefit from external validation.
major comments (2)
- Abstract: the reported statistical result (p < 0.001) and functional test are presented without any description of network architecture, task definitions, the precise procedure used to extract the invariant subnetwork, or controls for total training time and network size. These omissions prevent evaluation of whether the stability difference is attributable to the continual-learning condition rather than confounding variables.
- Methods (assumed section describing subnetwork isolation): the invariant subnetwork is isolated by the same invariance criterion that is then used to label it the 'self'. No independent behavioral, representational, or self-other distinction test is supplied to decouple the measurement from the interpretation, leaving the central claim vulnerable to circularity.
minor comments (1)
- Provide the exact stability metric, threshold for invariance, and any statistical test details (e.g., sample size, correction for multiple comparisons) used to obtain p < 0.001.
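The statistical details behind p < 0.001 are not given, so the following shows only one plausible way such a number could be obtained: a two-sided permutation test on per-run stability scores, with illustrative inputs and an assumed test statistic (difference of group means):

```python
import numpy as np

def permutation_pvalue(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of group means.

    a, b : per-run stability scores for the continual-learning and
           constant-task conditions (which test the paper used is unknown).
    """
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:a.size].mean() - pooled[a.size:].mean())
        hits += diff >= observed
    return (hits + 1) / (n_perm + 1)   # add-one smoothing avoids p = 0
```

Note that the smallest attainable p is bounded below by 1/(n_perm + 1), and with few runs per condition an exact rank test is similarly bounded, which is one reason the review asks for sample sizes.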
Simulated Author's Rebuttal
We thank the referee for their insightful comments, which have helped us improve the clarity and robustness of our manuscript. We address each major comment below and have made corresponding revisions.
Point-by-point responses
Referee: Abstract: the reported statistical result (p < 0.001) and functional test are presented without any description of network architecture, task definitions, the precise procedure used to extract the invariant subnetwork, or controls for total training time and network size. These omissions prevent evaluation of whether the stability difference is attributable to the continual-learning condition rather than confounding variables.
Authors: We agree that the abstract should have included summaries of these key elements to facilitate immediate evaluation. Although the full details are provided in the Methods section, we have revised the abstract to briefly describe the network architecture, task definitions, the procedure used to extract the invariant subnetwork, and the controls for total training time and network size. This revision ensures that readers can assess whether the stability difference is due to the continual-learning condition.
Revision: yes
Referee: Methods (assumed section describing subnetwork isolation): the invariant subnetwork is isolated by the same invariance criterion that is then used to label it the 'self'. No independent behavioral, representational, or self-other distinction test is supplied to decouple the measurement from the interpretation, leaving the central claim vulnerable to circularity.
Authors: We acknowledge the potential for circularity in the interpretation. The subnetwork is identified via the invariance criterion, and we interpret it as the 'self' based on the persistence principle outlined in the introduction. However, the functional relevance is demonstrated through independent preservation and ablation experiments, which show performance impacts without relying on the 'self' label. We have added a dedicated subsection in the Discussion to explicitly address this concern, including suggestions for future independent tests such as representational analyses and behavioral self-distinction tasks. This provides a stronger separation between measurement and interpretation while preserving the core contribution.
Revision: partial
Circularity Check
Definition of 'self' as the invariant subnetwork renders the identification tautological
specific steps
- self-definitional [Abstract]
"We propose that the 'self' can be isolated by seeking the invariant portion of cognitive process that changes relatively little compared to more rapidly acquired cognitive knowledge and skills, because our self is the most persistent aspect of our experiences. We used this principle to analyze the cognitive structure of robots under two conditions... We find that robots subjected to continual learning develop an invariant subnetwork that is significantly more stable (p < 0.001) compared to the control, and that this subnetwork is also functionally important"
The paper first defines 'self' as the invariant portion, then applies the invariance criterion to extract a subnetwork and declares it the emergent 'self'. The identification is therefore true by the definitional premise rather than by any separate validation that the subnetwork implements self-concept or self-other distinction.
full rationale
The paper's central interpretive step defines the 'self' explicitly as the most invariant portion of the cognitive process and then isolates the most stable subnetwork under continual learning to label it the emergent 'self'. This matches the self-definitional pattern: the label follows directly from the isolation criterion rather than from independent evidence of selfhood (e.g., self-other distinction or behavioral tests). The empirical measurements of stability (p < 0.001) and functional ablation effects are direct and non-circular, but the load-bearing claim that this subnetwork constitutes a 'self' reduces to the premise used to select it. No mathematical derivation or fitted parameter is involved, keeping the overall circularity moderate.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The self is the most persistent aspect of experiences and can therefore be isolated as the invariant portion of cognitive processes.
invented entities (1)
- invariant subnetwork: no independent evidence
Code Availability
All code is available at https://github.com/adidevj7/EmergentRobotSelf. The repository includes the Isaac-based training pipeline, the full analysis toolkit used in this study, experiment configuration files, and valida...