Sensorimotor Self-Recognition in Multimodal Large Language Model-Driven Robots

Diego Torricelli; Eduardo Rocon; Gabriel Delgado-Oleas; Iñaki Dellibarda Varela; Jose Ignacio Serrano; Manuel Cebrian; Maria Dolores del Castillo Sobrino; Pablo Romero-Sorozabal

arxiv: 2505.19237 · v2 · submitted 2025-05-25 · 💻 cs.AI · cs.RO

Iñaki Dellibarda Varela, Pablo Romero-Sorozabal, Diego Torricelli, Gabriel Delgado-Oleas, Jose Ignacio Serrano, Maria Dolores del Castillo Sobrino, Eduardo Rocon, Manuel Cebrian

Pith reviewed 2026-05-19 13:25 UTC · model grok-4.3

keywords: self-recognition, multimodal LLM, embodied AI, sensorimotor experience, minimal self, robot autonomy, sensory integration, artificial selfhood

The pith

Multimodal large language models integrated into robots develop self-recognition from sensorimotor experience, opening a route to artificial selfhood.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether a multimodal LLM placed inside a mobile robot can build an internal sense of its own body and place in the world when given ongoing sensory streams. It reports that the combined system shows environmental awareness, identifies itself as a robot, and anticipates its own movements, with statistical models tracing how different senses feed into separate aspects of this minimal self and how memory links past and present states. A sympathetic reader would care because self-recognition is described as the starting point for autonomous behavior, so success here would mean LLMs can move beyond text patterns toward grounded, embodied cognition when properly situated in physical agents.

Core claim

Integrating a multimodal LLM into an autonomous mobile robot produces robust environmental awareness, self-identification, and predictive awareness that lets the system infer its own robotic nature and motion characteristics. Structural equation modeling shows how sensory integration shapes distinct dimensions of the minimal self and coordinates them with structured and episodic memory, while ablation of sensory channels reveals compensatory interactions among inputs and confirms memory's essential role.

What carries the argument

Sensorimotor integration through the multimodal LLM, which fuses visual, proprioceptive and other streams to construct and maintain an internal representation of the robot's body within its surroundings.

If this is right

The robot distinguishes its own body and actions from surrounding objects using fused sensory data.
Removal of one sensory channel is offset by strengthened use of remaining channels to preserve self-identification.
Structured and episodic memory are required to link current sensations with past states for consistent self-recognition.
The resulting internal associations form a hierarchical structure that drives explicit self-identification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same integration pattern could be tried on different robot bodies or with newer multimodal models to check whether self-recognition generalizes beyond the specific platform used here.
If the approach scales, it supplies a concrete way to test whether minimal self representations can support more complex behaviors such as long-term planning or social interaction.
The work suggests that embodied selfhood may not require new architectures but can emerge when existing language models receive sustained, multimodal feedback from a physical body.

Load-bearing premise

The robot's spoken descriptions and action forecasts reflect a genuine internal model of itself rather than surface pattern matching from its training data or the prompt.

What would settle it

The claim that the behavior arises from integrated sensorimotor self-representation would be falsified if the robot, placed in a novel environment or given contradictory sensory feedback, still reported the same self-identity and motion predictions as in the original trials.

Original abstract

Self-recognition -- the ability to maintain an internal representation of one's own body within the environment -- underpins intelligent, autonomous behavior. As a foundational component of the minimal self, self-recognition provides the initial substrate from which higher forms of self-awareness may eventually emerge. Recent advances in large language models achieve human-like performance in tasks integrating multimodal information, raising growing interest in the embodiment capabilities of AI agents deployed on nonhuman platforms such as robots. We investigate whether multimodal LLMs can develop self-recognition through sensorimotor experience by integrating an LLM into an autonomous mobile robot. The system exhibits robust environmental awareness, self-identification, and predictive awareness, enabling it to infer its robotic nature and motion characteristics. Structural equation modeling reveals how sensory integration influences distinct dimensions of the minimal self and their coordination with past-present memory, as well as the hierarchical internal associations that drive self-identification. Ablation tests of sensory inputs demonstrate compensatory interactions among sensors and confirm the essential role of structured and episodic memory. Given appropriate sensory information about the world and itself, multimodal LLMs open the door to artificial selfhood in embodied cognitive systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows a multimodal LLM on a physical robot producing self-identification reports and motion predictions, but the evidence stays tied to the model's own language outputs without clear external checks.

The main takeaway is that the authors wired a multimodal LLM into a real mobile robot, fed it live sensor streams, and observed the system describe its own body, environment, and likely movements. They back this with structural equation modeling on how sensory channels feed into different aspects of a minimal self, plus memory ablations that show structured and episodic memory matter for the reported behaviors.

Referee Report

3 major / 2 minor

Summary. The paper claims that integrating a multimodal LLM into an autonomous mobile robot enables the development of self-recognition through sensorimotor experience. The system is reported to exhibit environmental awareness, self-identification, and predictive awareness of its robotic nature and motion. Structural equation modeling is used to show how sensory integration influences dimensions of the minimal self and their coordination with memory, while ablation tests demonstrate compensatory sensor interactions and the essential role of structured and episodic memory. The work concludes that appropriate sensory information allows multimodal LLMs to open the door to artificial selfhood in embodied systems.

Significance. If the central claims hold under rigorous validation, the work would be significant for embodied AI and cognitive robotics by providing empirical support for sensorimotor routes to minimal self in LLM-driven agents. It extends multimodal model capabilities to physical platforms and introduces SEM-based analysis of self-dimensions, which could inform future architectures for autonomous systems with internal self-models.

major comments (3)

[Abstract] The description of ablation tests and structural equation modeling provides no quantitative results, error bars, baseline comparisons, statistical significance values, or details on prompt engineering and data exclusion criteria. This absence makes it impossible to assess whether the reported sensory integration effects and memory ablations actually support the central claim of emergent self-recognition.
[Results/Interpretation] Evidence for self-identification and predictive awareness rests primarily on the LLM's own generated verbal reports and action predictions. Without independent behavioral metrics, external validation, or controls isolating embodiment (such as identical prompts and memory structures supplied with synthetic rather than real sensor streams), these outputs remain compatible with prompt-driven pattern completion from training data rather than an internally updated self-model.
[Methods] The manuscript does not report controls that decouple the contribution of real sensorimotor loops from structured context about 'self' and 'robot body'. This leaves open the possibility that observed self-identification arises from surface-level completion rather than sensorimotor updating, directly bearing on the claim that the system infers its robotic nature through experience.
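
The real-versus-synthetic control requested above could be analyzed with a two-sample permutation test. The following is a minimal sketch under stated assumptions, not the authors' method: the function name `permutation_test`, the condition labels, and every score are hypothetical and invented for illustration, since this page reports no such data.

```python
import random
from statistics import mean

def permutation_test(real, synthetic, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of condition means."""
    rng = random.Random(seed)
    observed = mean(real) - mean(synthetic)
    pooled = list(real) + list(synthetic)
    n_real = len(real)
    extreme = 0
    for _ in range(n_resamples):
        # Shuffle condition labels by shuffling the pooled scores
        rng.shuffle(pooled)
        diff = mean(pooled[:n_real]) - mean(pooled[n_real:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value

# Hypothetical self-identification scores in [0, 1], invented for illustration
real_scores = [0.82, 0.79, 0.88, 0.91, 0.85, 0.80]
synthetic_scores = [0.80, 0.83, 0.86, 0.78, 0.84, 0.81]

diff, p = permutation_test(real_scores, synthetic_scores)
print(f"mean difference = {diff:.3f}, p = {p:.3f}")
```

A small difference with a large p-value would mean the verbal reports do not depend measurably on real sensor input, which is the alternative explanation the referee raises.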

minor comments (2)

[Abstract] The abstract and main text could more explicitly define the latent dimensions of the minimal self used in the SEM analysis and how they are operationalized from LLM outputs.
[Figures/Tables] Figure captions and table presentations of ablation results should include exact sample sizes, variance measures, and comparison conditions to improve clarity.
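
The reporting asked for in the second minor comment (sample size, variance, comparison condition) can be sketched as a small summary routine. This is an illustration only: the condition names `full`, `no_vision`, and `no_memory`, the helper `summarize`, and all scores are hypothetical and do not come from the paper.

```python
from statistics import mean, stdev

def summarize(conditions, baseline="full"):
    """Return n, mean, sample SD, and delta vs. baseline for each condition."""
    base_mean = mean(conditions[baseline])
    rows = []
    for name, scores in conditions.items():
        rows.append({
            "condition": name,
            "n": len(scores),
            "mean": mean(scores),
            "sd": stdev(scores),  # sample standard deviation (n - 1)
            "delta_vs_baseline": mean(scores) - base_mean,
        })
    return rows

# Hypothetical per-trial scores, invented for illustration
conditions = {
    "full": [0.90, 0.85, 0.88, 0.92],
    "no_vision": [0.70, 0.74, 0.68, 0.72],
    "no_memory": [0.40, 0.45, 0.38, 0.50],
}

rows = summarize(conditions)
for row in rows:
    print(f"{row['condition']:<10} n={row['n']} mean={row['mean']:.3f} "
          f"sd={row['sd']:.3f} delta={row['delta_vs_baseline']:+.3f}")
```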

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. These highlight key opportunities to improve the transparency and rigor of our evidence for sensorimotor self-recognition in multimodal LLM-driven robots. We respond point by point below and commit to revisions that directly address the concerns raised.

Referee: [Abstract] The description of ablation tests and structural equation modeling provides no quantitative results, error bars, baseline comparisons, statistical significance values, or details on prompt engineering and data exclusion criteria. This absence makes it impossible to assess whether the reported sensory integration effects and memory ablations actually support the central claim of emergent self-recognition.

Authors: We agree that the abstract lacks the quantitative detail needed for full evaluation. In the revised version we will expand the abstract to report key SEM results (standardized coefficients, significance levels, and model fit statistics) along with ablation outcomes including performance deltas, error bars, and baseline comparisons. The Methods section will be updated with explicit descriptions of prompt engineering protocols and data exclusion criteria to allow readers to assess support for the sensory integration and memory claims.
revision: yes

Referee: [Results/Interpretation] Evidence for self-identification and predictive awareness rests primarily on the LLM's own generated verbal reports and action predictions. Without independent behavioral metrics, external validation, or controls isolating embodiment (such as identical prompts and memory structures supplied with synthetic rather than real sensor streams), these outputs remain compatible with prompt-driven pattern completion from training data rather than an internally updated self-model.

Authors: The concern is valid: verbal reports alone leave room for alternative explanations. We will add independent behavioral metrics extracted from logged robot trajectories and interaction success rates. We will also introduce control conditions that supply identical prompts and memory structures but replace real sensor streams with synthetic equivalents. These additions will allow direct comparison and help establish that self-identification reflects ongoing sensorimotor updating rather than static pattern completion.
revision: yes

Referee: [Methods] The manuscript does not report controls that decouple the contribution of real sensorimotor loops from structured context about 'self' and 'robot body'. This leaves open the possibility that observed self-identification arises from surface-level completion rather than sensorimotor updating, directly bearing on the claim that the system infers its robotic nature through experience.

Authors: We recognize that the existing sensory-ablation results do not fully isolate real-time sensorimotor updating from pre-provided contextual knowledge. In revision we will add and report explicit control experiments in which the model receives structured self- and body-related context but operates without live sensorimotor input, contrasting these against the full embodied condition. This will more directly test whether self-recognition depends on sensorimotor experience.
revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper presents an empirical setup integrating a multimodal LLM into a mobile robot and reports observed behaviors (environmental awareness, self-identification via verbal reports and actions), supported by structural equation modeling on sensory integration data and ablation tests on memory and inputs. These steps rely on external experimental measurements and statistical analysis of outputs rather than defining the target phenomenon in terms of itself or renaming fitted parameters as predictions. No equations or self-citation chains reduce the central claim to its inputs by construction; the derivation remains self-contained against the described behavioral and modeling benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that LLM-generated language about self and motion constitutes evidence of internal representation; no free parameters are explicitly named in the abstract, but the structural equation model necessarily introduces latent variables whose values are fitted to the observed behaviors.

free parameters (1)

latent self-dimensions in SEM
Structural equation modeling fits latent variables that represent dimensions of the minimal self; these are not measured directly and are inferred from the data.
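
For readers unfamiliar with the technique, the role of latent variables is easiest to see in the generic textbook form of a structural equation model, written here in standard LISREL-style notation. This is not the specific model fitted in the paper, whose variables and paths are not given on this page.

```latex
\begin{aligned}
x    &= \Lambda_x \xi + \delta,        && \text{(measurement model, exogenous indicators)} \\
y    &= \Lambda_y \eta + \varepsilon,  && \text{(measurement model, endogenous indicators)} \\
\eta &= B\eta + \Gamma\xi + \zeta.     && \text{(structural model)}
\end{aligned}
```

Here $x$ and $y$ are observed indicators, $\xi$ and $\eta$ are exogenous and endogenous latent variables, $\Lambda_x$ and $\Lambda_y$ are factor loadings, $B$ and $\Gamma$ are path coefficients, and $\delta$, $\varepsilon$, $\zeta$ are error terms. The loadings and path coefficients are estimated from data, which is why the latent self-dimensions count as fitted rather than measured quantities.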

axioms (1)

domain assumption: LLM outputs can be treated as veridical reports of internal states
The paper interprets the model's self-descriptions as evidence of self-recognition without independent verification that the descriptions reflect genuine internal modeling.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · bare_distinguishability_of_absolute_floor · echoes

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Paper passage: "differentiation, discriminating self-generated from external events through sensorimotor contingencies"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Evolutionary-scale prediction of atomic-level protein structure with a language model.Science, 379(6637):1123–1130, March 2023

G. Gallup, Chimpanzees: Self-Recognition. Science 167, 86–87 (1970), doi:10.1126/science. 167.3914.86

work page doi:10.1126/science 1970

[2] [2]

Rizzolatti, L

G. Rizzolatti, L. Craighero, The Mirror-Neuron System. Annual Review of Neuroscience 27, 169–192 (2004), doi:10.1146/annurev.neuro.27.070203.1 44230

work page doi:10.1146/annurev.neuro.27.070203.1 2004

[3] [3]

A. D. Craig, How do you feel–now? The anterior insula and hu man awareness. Nature Re- views Neuroscience 10 (1), 59–70 (2009), doi:10.1038/nrn2555, https://doi.org/10. 1038/nrn2555

work page doi:10.1038/nrn2555 2009

[4] [4]

A. M. Turing, Computing Machinery and Intelligence. Mind 59 (236), 433–460 (1950), http: //www.jstor.org/stable/2251299

work page arXiv 1950

[5] [5]

Watchus, Towards Self-Aware AI: Embodiment, Feedback Loops, and the Role of the Insula in Consciousness

B. Watchus, Towards Self-Aware AI: Embodiment, Feedback Loops, and the Role of the Insula in Consciousness. Preprints 2024110661 (2024), doi:10.20944/preprints202411.0661. v1, https://doi.org/10.20944/preprints202411.0661.v1

work page doi:10.20944/preprints202411.0661 2024

[6] [6]

L. Li, C. Li, Enabling self-identiﬁcation in intelligent agent: insights from computational psychoanalysis (2024), https://arxiv.org/abs/2403.07664

work page arXiv 2024

[7] [7]

Y . K. Georgie, G. Schillaci, V . V . Hafner, An interdisciplinary overview of developmental in- dices and behavioral measures of the minimal self. 2019 Joint IEEE 9th International Confer- ence on Development and Learning and Epigenetic Robotics, ICDL-EpiRob 2019 pp. 129–136 (2019), doi:10.1109/DEVLRN.2019.8850703, https://arxiv.org/pdf/1907.00709

work page doi:10.1109/devlrn.2019.8850703 2019

[8] [8]

V . V . Hafner, P . Loviken, A. P . Villalpando, G. Schillaci, Prerequisites for an Artiﬁcial Self. Frontiers in Neurorobotics 14, 423754 (2020), doi:10.3389/FNBOT.2020.00005/BIBTEX, www.frontiersin.org

work page doi:10.3389/fnbot.2020.00005/bibtex 2020

[9] [9]

Pfeifer, C

R. Pfeifer, C. Scheier, Understanding Intelligence (MIT Press, Cambridge, MA) (1999)

work page 1999

[10] [10]

Dehaene, H

S. Dehaene, H. Lau, S. Kouider, What is consciousness, an d could machines have it? Science 358 (6362), 486–492 (2017), doi:10.1126/science.aan8871. 16

work page doi:10.1126/science.aan8871 2017

[11] [11]

David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al

D. Silver, et al., Mastering the game of Go with deep neural networks and tree s earch. Nature 529 (7587), 484–489 (2016), doi:10.1038/nature16961

work page doi:10.1038/nature16961 2016

[12] [12]

B. M. Lake, T. D. Ullman, J. B. Tenenbaum, S. J. Gershman, B uilding machines that learn and think like people. Behavioral and Brain Sciences 40, e253 (2017), doi:10.1017/ S0140525X16001837

work page 2017

[13] [13]

Rahwan, et al., Machine behaviour

I. Rahwan, et al., Machine behaviour. Nature 568 (7753), 477–486 (2019)

work page 2019

[14] [14]

Large language models for mathematical reasoning: Progresses and challenges.arXiv preprint arXiv:2402.00157, 2024

J. Ahn, et al., Large Language Models for Mathematical Reasoning: Progresses and Challenges (2024), https://arxiv.org/abs/2402.00157

[15]

Y. Chang, et al., A Survey on Evaluation of Large Language Models (2023), https://arxiv.org/abs/2307.03109

[16]

K. M. Collins, et al., Evaluating Language Models for Mathematics Through Interactions. Proceedings of the National Academy of Sciences of the United States of America 121 (24), e2318124121 (2024), doi:10.1073/pnas.2318124121, https://doi.org/10.1073/pnas.2318124121

[17]

J. W. A. Strachan, et al., Testing theory of mind in large language models and humans. Nature Human Behaviour 8 (7), 1285–1295 (2024), doi:10.1038/s41562-024-01882-z

[18]

OpenAI, et al., GPT-4 Technical Report (2024), https://arxiv.org/abs/2303.08774

[19]

D. Driess, et al., PaLM-E: An Embodied Multimodal Language Model (2023), https://arxiv.org/abs/2303.03378

[20]

S. Wu, H. Fei, L. Qu, W. Ji, T.-S. Chua, NExT-GPT: Any-to-Any Multimodal LLM (2024), https://arxiv.org/abs/2309.05519

[21]

G. Team, et al., Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context (2024), https://arxiv.org/abs/2403.05530

[22]

G. R. Team, et al., Gemini Robotics: Bringing AI into the Physical World (2025), https://arxiv.org/abs/2503.20020

[23]

M. Ahn, et al., Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (2022), https://arxiv.org/abs/2204.01691

[24]

L. Zheng, R. Mei, B. Zou, et al., GMM-searcher: efficient object search in large-scale scenes using large language models. Scientific Reports 15, 16709 (2025), doi:10.1038/s41598-025-00788-8

[25]

R. Mon-Williams, G. Li, R. Long, et al., Embodied large language models enable robots to complete complex tasks in unpredictable environments. Nature Machine Intelligence 7, 592–601 (2025), doi:10.1038/s42256-025-01005-x

[26]

C. Zhang, J. Chen, J. Li, Y. Peng, Z. Mao, Large language models for human–robot interaction: A review. Biomimetic Intelligence and Robotics 3 (4), 100131 (2023), doi:10.1016/j.birob.2023.100131, https://www.sciencedirect.com/science/article/pii/S2667379723000451

[27]

V. Menon, 20 years of the default mode network: A review and synthesis. Neuron 111 (16), 2469–2484 (2023), doi:10.1016/j.neuron.2023.04.023

[28]

M. E. Raichle, et al., A default mode of brain function. Proceedings of the National Academy of Sciences 98 (2), 676–682 (2001), doi:10.1073/pnas.98.2.676

[29]

G. Northoff, et al., Self-referential processing in our brain–a meta-analysis of imaging studies on the self. NeuroImage 31 (1), 440–457 (2006), doi:10.1016/j.neuroimage.2005.12.002

[30]

P. Rochat, Five Levels of Self-Awareness as They Unfold Early in Life. Consciousness and Cognition 12 (4), 717–731 (2003), doi:10.1016/S1053-8100(03)00081-3

[31]

G. Team, et al., Gemini: A Family of Highly Capable Multimodal Models (2024), https://arxiv.org/abs/2312.11805

[32]

T. Hercz, W. Liu, Mecabot User Manual (2024), http://www.roboworks.net, version 20240501, Roboworks

[33]

S. M. Mousavi, et al., Gemini and Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multimodal Social Media Posts. Geophysical Journal International 240 (2), 1281–1294 (2025), doi:10.1093/gji/ggae436, https://doi.org/10.1093/gji/ggae436

[34]

D. Prasad, M. Pimpude, A. Alankar, Towards Development of Automated Knowledge Maps and Databases for Materials Engineering using Large Language Models (2024), https://arxiv.org/abs/2402.11323

[35]

G. Team, et al., Gemma: Open Models Based on Gemini Research and Technology (2024), https://arxiv.org/abs/2403.08295

[36]

Y. Gao, et al., Retrieval-Augmented Generation for Large Language Models: A Survey (2024), https://arxiv.org/abs/2312.10997

[37]

J. Shi, et al., Optimization-based Prompt Injection Attack to LLM-as-a-Judge (2025), https://arxiv.org/abs/2403.17710

[38]

J. Li, et al., Generative Judge for Evaluating Alignment (2023), https://arxiv.org/abs/2310.05470

[39]

L. Zheng, et al., Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (2023), https://arxiv.org/abs/2306.05685

[40]

L. J. Cronbach, P. E. Meehl, Construct validity in psychological tests. Psychological Bulletin 52 (4), 281–302 (1955), doi:10.1037/h0040957

[41]

M. R. Longo, F. Schüür, M. P. Kammers, M. Tsakiris, P. Haggard, What is embodiment? A psychometric approach. Cognition 107 (3), 978–998 (2008), doi:10.1016/j.cognition.2007.12.004, https://www.sciencedirect.com/science/article/pii/S0010027708000061

[42]

M. Gao, X. Hu, J. Ruan, X. Pu, X. Wan, LLM-based NLG Evaluation: Current Status and Challenges (2025), https://arxiv.org/abs/2402.01383

[43]

S. Kim, et al., Prometheus: Inducing Fine-grained Evaluation Capability in Language Models (2024), https://arxiv.org/abs/2310.08491

[44]

E. Goh, R. Gallo, J. Hom, et al., Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial. JAMA Network Open 7 (10), e2440969 (2024), doi:10.1001/jamanetworkopen.2024.40969, https://jamanetwork.com/article.aspx?doi=10.1001/jamanetworkopen.2024.40969

[45]

K. Giannakopoulos, A. Kavadella, A. A. Salim, V. Stamatopoulos, E. Kaklamanos, Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study. Journal of Medical Internet Research 25, e51580 (2023), doi:10.2196/51580, https://www...

[46]

M. W.-L. Cheung, Meta-Analysis: A Structural Equation Modeling Approach (Wiley, Hoboken, NJ) (2015), doi:10.1002/9781118957813

[47]

T. Raykov, G. A. Marcoulides, A First Course in Structural Equation Modeling (Routledge), 2nd ed. (2006), doi:10.4324/9780203930687

[48]

L. B. Merabet, et al., Rapid and reversible recruitment of early visual cortex for touch. PLoS One 3 (8), e3046 (2008), doi:10.1371/journal.pone.0003046

[49]

A. J. King, Crossmodal plasticity and hearing capabilities following blindness. Cell Tissue Res. 361 (1), 295–300 (2015), doi:10.1007/s00441-015-2175-y

[50]

S. G. Lomber, M. A. Meredith, A. Kral, Cross-modal plasticity in specific auditory cortices underlies visual compensations in the deaf. Nature Neuroscience 13, 1421–1427 (2010), doi:10.1038/nn.2653.

Acknowledgments

The authors would like to thank Rafael Sendra-Arranz and Álvaro Gutiérrez for their discussions and technical input during the development ...
