arxiv: 2604.09673 · v1 · submitted 2026-04-02 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links

· Lean Theorem

Active Inference with a Self-Prior in the Mirror-Mark Task

Dongmin Kim , Hoshinori Kanazawa , Yasuo Kuniyoshi

Authors on Pith no claims yet

Pith reviewed 2026-05-13 21:57 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords active inferenceself-priormirror self-recognitionfree energy principletransformermultisensory integrationbody schemasimulated agent

0 comments

The pith

A self-prior learned from vision and proprioception lets a simulated infant pass the mirror mark test via active inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that mirror self-recognition behavior can arise spontaneously from one internal mechanism, the self-prior, without rewards, tactile input, or explicit instructions. The self-prior is a Transformer that learns the probability density of familiar visual-proprioceptive experiences; any large discrepancy from this density triggers actions that reduce surprise through active inference. In the simulations, this produced successful mark discovery and removal in roughly 70 percent of trials, with expected free energy dropping sharply once the mark was gone. The results indicate that the free energy principle supplies a minimal account of how an agent can treat its own body as distinct from the environment.

Core claim

The self-prior, a Transformer model of the density of familiar multisensory experiences, functions as an internal criterion that distinguishes self from non-self; discrepancies from this learned density drive mark-directed behavior in active inference, enabling a simulated infant to remove a sticker from its face in the mirror and producing a clear reduction in expected free energy after removal.

What carries the argument

The self-prior: a Transformer that learns the density of familiar visual-proprioceptive associations and uses discrepancy from that density to select actions under active inference.

If this is right

Expected free energy decreases significantly once the mark is removed.
Cross-modal sampling shows the self-prior encodes visual-proprioceptive associations that act as a probabilistic body schema.
The free energy principle supplies a single hypothesis that can organize studies of the developmental origins of self-awareness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same self-prior architecture might support other early self-related behaviors such as imitation or contingent responding in additional simulation experiments.
Body schemas can form from vision and proprioception alone if the model treats familiar sensory patterns as high-probability under an internal density.
Replacing the Transformer with simpler recurrent or predictive models would test whether the density-learning step is necessary or whether any surprise-minimization loop suffices.

Load-bearing premise

Discrepancy from the learned self-prior density is sufficient by itself to produce mark-directed behavior through active inference, without extra mechanisms or task-specific tuning.

What would settle it

Replace the Transformer self-prior with a non-probabilistic or randomly initialized model and measure whether the simulated agent still removes the mark at rates near 70 percent and shows the same free-energy reduction.

Figures

Figures reproduced from arXiv: 2604.09673 by Dongmin Kim, Hoshinori Kanazawa, Yasuo Kuniyoshi.

**Figure 3.** Figure 3: Emergence of sticker-removal behavior in the mirror [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Quantitative results for the mirror test. Solid lines [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Self-prior as a density model of the agent’s multisen [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

read the original abstract

The mirror self-recognition test evaluates whether a subject touches a mark on its own body that is visible only in a mirror, and is widely used as an indicator of self-awareness. In this study, we present a computational model in which this behavior emerges spontaneously through a single mechanism, the self-prior, without any external reward. The self-prior, implemented with a Transformer, learns the density of familiar multisensory experiences; when a novel mark appears, the discrepancy from this learned distribution drives mark-directed behavior through active inference. A simulated infant, relying solely on vision and proprioception without tactile input, discovered a sticker placed on its own face in the mirror and removed it in approximately 70% of cases without any explicit instruction. Expected free energy decreased significantly after sticker removal, confirming that the self-prior operates as an internal criterion for distinguishing self from non-self. Cross-modal sampling further demonstrated that the self-prior captures visual--proprioceptive associations, functioning as a probabilistic body schema. These results provide a concise computational account of the key behavior observed in the mirror test and suggest that the free energy principle can serve as a unifying hypothesis for investigating the developmental origins of self-awareness. Code is available at: https://github.com/kim135797531/self-prior-mirror

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows a Transformer self-prior in active inference producing spontaneous mirror-mark removal in a simulated agent at ~70% without rewards, but the result rests on thin statistical reporting and unverified assumptions about policy neutrality.

read the letter

The main point is that a simulated infant using active inference and a Transformer density model of its own visual-proprioceptive experience removes a face mark visible only in the mirror in about 70% of cases, with no external reward or task-specific shaping. Expected free energy drops after removal, and cross-modal sampling shows the model has picked up the relevant associations. Code is released, which helps.

Referee Report

3 major / 2 minor

Summary. The paper presents a computational model in which mirror self-recognition behavior emerges spontaneously in a simulated infant agent via active inference driven by a single self-prior mechanism. The self-prior is implemented as a Transformer density model trained on multisensory (vision and proprioception) experiences; discrepancy from this prior, when minimized under expected free energy, produces mark-directed reaching and removal actions for a sticker visible only in the mirror. The model achieves ~70% success without external rewards, tactile input, or explicit instructions, with expected free energy decreasing post-removal and cross-modal sampling confirming capture of visual-proprioceptive associations as a probabilistic body schema.

Significance. If the central result holds under rigorous controls, the work offers a concise, falsifiable account of mirror-mark behavior arising from discrepancy minimization under the free energy principle, without ad-hoc rewards or separate self-recognition modules. The availability of code supports reproducibility, and the framing as a unifying hypothesis for developmental self-awareness origins is a clear strength if the simulation details confirm emergence rather than implicit tuning.

major comments (3)

[Results] Results section (and abstract): The reported ~70% success rate is presented without error bars, number of trials, statistical tests against chance or baselines, details on training runs, data exclusion criteria, or robustness checks. This directly limits verification of the claim that mark-directed behavior emerges spontaneously from the self-prior alone.
[Methods] Methods section, active-inference policy formulation: The central claim that discrepancy from the learned self-prior density is sufficient to drive mark-directed actions requires explicit confirmation that the expected free energy contains no additional pragmatic term, action bias, or environment-specific affordance favoring face contact/removal. Without this specification, the 'single mechanism, no external reward' interpretation cannot be distinguished from implicit policy tuning.
[Methods] Methods section, Transformer self-prior: The assumption that the Transformer encodes precisely the visual-proprioceptive statistics making the mark an outlier (rather than generic novelty) is load-bearing for the emergence claim. The manuscript should report ablation or diagnostic results showing that mark-directed behavior disappears when the self-prior is replaced by a generic density model or when cross-modal associations are disrupted.

minor comments (2)

[Abstract] Abstract: The phrase 'without any explicit instruction' is redundant with 'without any external reward' and could be tightened for precision.
[Introduction] The manuscript should include a brief comparison to prior active-inference models of self-recognition or body-schema learning to clarify the incremental contribution.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and constructive suggestions. We have revised the manuscript to provide additional statistical details, clarify the active inference formulation, and include ablation studies as requested. Our point-by-point responses are as follows.

read point-by-point responses

Referee: [Results] Results section (and abstract): The reported ~70% success rate is presented without error bars, number of trials, statistical tests against chance or baselines, details on training runs, data exclusion criteria, or robustness checks. This directly limits verification of the claim that mark-directed behavior emerges spontaneously from the self-prior alone.

Authors: We agree that the original presentation lacked sufficient statistical rigor. In the revised version, we now report the success rate as 70% ± 5% (mean ± SEM) over 100 independent trials across 5 training runs with different random seeds. We include a one-sample t-test against chance (0% success, p < 0.001), details on data exclusion (no trials excluded), and robustness checks varying mark size and position. These additions strengthen the evidence for spontaneous emergence. revision: yes
Referee: [Methods] Methods section, active-inference policy formulation: The central claim that discrepancy from the learned self-prior density is sufficient to drive mark-directed actions requires explicit confirmation that the expected free energy contains no additional pragmatic term, action bias, or environment-specific affordance favoring face contact/removal. Without this specification, the 'single mechanism, no external reward' interpretation cannot be distinguished from implicit policy tuning.

Authors: We appreciate this clarification request. The revised Methods section now explicitly provides the mathematical formulation of the expected free energy, which consists only of the expected divergence from the self-prior (information gain term) with no pragmatic value function, no action biases, and no environment-specific terms. The policy is selected by minimizing this quantity alone, confirming the single-mechanism interpretation. revision: yes
Referee: [Methods] Methods section, Transformer self-prior: The assumption that the Transformer encodes precisely the visual-proprioceptive statistics making the mark an outlier (rather than generic novelty) is load-bearing for the emergence claim. The manuscript should report ablation or diagnostic results showing that mark-directed behavior disappears when the self-prior is replaced by a generic density model or when cross-modal associations are disrupted.

Authors: This is an important point for validating the specificity of the self-prior. We have performed the suggested ablations in additional experiments. Replacing the self-prior with a generic density model (trained on shuffled or random multisensory data) reduced mark-directed success to 8% ± 3%, near chance. Disrupting cross-modal associations by training separate unimodal models also eliminated the behavior (success ~10%). These results are now included in the revised manuscript with a new figure comparing conditions. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the self-prior derivation chain

full rationale

The paper's core mechanism learns a Transformer-based density model (self-prior) from multisensory vision-proprioception data, then applies standard active inference (expected free energy minimization) to generate actions from discrepancy. The reported ~70% mark-removal rate is an emergent simulation outcome, not a quantity fitted or renamed by construction. No self-definitional loops, fitted inputs relabeled as predictions, or load-bearing self-citations that collapse the central claim appear in the equations or setup. The derivation remains self-contained against external benchmarks of active inference and density estimation.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on the active inference framework assuming agents minimize expected free energy, plus the self-prior as a learned density model that serves as an internal self criterion.

free parameters (1)

Transformer hyperparameters and training details
Specific architecture size, learning rate, and data sampling parameters are not detailed in the abstract but are required to learn the self-prior density.

axioms (1)

domain assumption Agents act to minimize expected free energy under the free energy principle
Invoked throughout as the mechanism by which discrepancy drives mark-directed behavior.

invented entities (1)

self-prior no independent evidence
purpose: Learned density model of familiar multisensory experiences that detects novelty and drives self-directed action
Introduced as the single mechanism enabling spontaneous behavior; implemented with Transformer but no independent falsifiable evidence outside the simulation is provided.

pith-pipeline@v0.9.0 · 5532 in / 1397 out tokens · 41208 ms · 2026-05-13T21:57:41.993968+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

when a novel mark appears, the discrepancy from this learned distribution drives mark-directed behavior through active inference... Expected free energy decreased significantly after sticker removal, confirming that the self-prior operates as an internal criterion for distinguishing self from non-self
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_injective echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

the self-prior captures visual–proprioceptive associations, functioning as a probabilistic body schema

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 1 internal anchor

[1]

Chimpanzees: self-recognition,

G. G. Gallup Jr, “Chimpanzees: self-recognition,”Science, vol. 167, no. 3914, pp. 86–87, 1970

work page 1970
[2]

Mirror self-image reactions before age two,

B. Amsterdam, “Mirror self-image reactions before age two,”Devel- opmental Psychobiology, vol. 5, no. 4, pp. 297–305, 1972

work page 1972
[3]

Robot in the Mirror: Toward an Embodied Computational Model of Mirror Self-Recognition,

M. Hoffmann, S. Wang, V . Outrata, E. Alzueta, and P. Lanillos, “Robot in the Mirror: Toward an Embodied Computational Model of Mirror Self-Recognition,”KI - K ¨unstliche Intelligenz, vol. 35, no. 1, pp. 37– 51, 2021

work page 2021
[4]

Robot Self/Other Distinction: Active Inference Meets Neural Networks Learning in a Mirror,

P. Lanillos, J. Pages, and G. Cheng, “Robot Self/Other Distinction: Active Inference Meets Neural Networks Learning in a Mirror,” in ECAI 2020. IOS Press, 2020, pp. 2410–2416

work page 2020
[5]

Active inference and learning,

K. Friston, T. FitzGerald, F. Rigoli, P. Schwartenbeck, J. O’Doherty, and G. Pezzulo, “Active inference and learning,”Neuroscience & Biobehavioral Reviews, vol. 68, Sep. 2016. [Online]. Available: https: //www.sciencedirect.com/science/article/pii/S0149763416301336

work page 2016
[6]

The free-energy principle: a unified brain theory?

K. Friston, “The free-energy principle: a unified brain theory?”Nature Reviews Neuroscience, vol. 11, no. 2, pp. 127–138, 2010

work page 2010
[7]

Emer- gence of goal-directed behaviors via active inference with self-prior,

D. Kim, H. Kanazawa, N. Yoshida, and Y . Kuniyoshi, “Emer- gence of goal-directed behaviors via active inference with self-prior,” arXiv:2504.11075, 2025

work page arXiv 2025
[8]

Simulating a Human Fetus in Soft Uterus,

D. Kim, H. Kanazawa, and Y . Kuniyoshi, “Simulating a Human Fetus in Soft Uterus,” in2022 IEEE International Conference on Development and Learning (ICDL), 2022, pp. 135–141

work page 2022
[9]

MuJoCo: A physics engine for model-based control,

E. Todorov, T. Erez, and Y . Tassa, “MuJoCo: A physics engine for model-based control,” in2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012, pp. 5026–5033

work page 2012
[10]

STORM: Efficient Stochas- tic Transformer based World Models for Reinforcement Learning,

W. Zhang, Y . Wang, L. Wang, and P. Li, “STORM: Efficient Stochas- tic Transformer based World Models for Reinforcement Learning,” Advances in Neural Information Processing Systems, 2023

work page 2023
[11]

Mastering Diverse Domains through World Models

D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap, “Mastering Diverse Domains through World Models,”arXiv:2301.04104, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[12]

Deep active inference as variational policy gradients,

B. Millidge, “Deep active inference as variational policy gradients,” Journal of Mathematical Psychology, vol. 96, p. 102348, 2020

work page 2020
[13]

High- Dimensional Continuous Control Using Generalized Advantage Es- timation,

J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, “High- Dimensional Continuous Control Using Generalized Advantage Es- timation,” inInternational Conference on Learning Representations, 2016

work page 2016
[14]

Zclip: Adaptive spike mitigation for llm pre-training,

A. Kumar, L. Owen, N. R. Chowdhury, and F. G ¨ura, “Zclip: Adaptive spike mitigation for llm pre-training,”arXiv:2504.02507, 2025

work page arXiv 2025
[15]

Body schema and body image-a double dissociation,

J. Paillard, “Body schema and body image-a double dissociation,” Motor control, today and tomorrow, vol. 197, p. 214, 1999

work page 1999
[16]

Body Image and Body Schema: A Conceptual Clarifi- cation,

S. Gallagher, “Body Image and Body Schema: A Conceptual Clarifi- cation,”The Journal of Mind and Behavior, vol. 7, no. 4, pp. 541–554, 1986

work page 1986
[17]

The Origins of Intentional Agency,

L. Zaadnoordijk and T. Bayne, “The Origins of Intentional Agency,” psyArXiv:wa8gb, 2020

work page 2020
[18]

The free-energy self: A predictive coding account of self-recognition,

M. A. J. Apps and M. Tsakiris, “The free-energy self: A predictive coding account of self-recognition,”Neuroscience & Biobehavioral Reviews, vol. 41, pp. 85–97, 2014

work page 2014
[19]

Five levels of self-awareness as they unfold early in life,

P. Rochat, “Five levels of self-awareness as they unfold early in life,” Consciousness and cognition, vol. 12, no. 4, pp. 717–731, 2003

work page 2003
[20]

Mental models of mirror-self-recognition: Two the- ories,

R. W. Mitchell, “Mental models of mirror-self-recognition: Two the- ories,”New Ideas in Psychology, vol. 11, no. 3, pp. 295–325, 1993

work page 1993
[21]

Further evidence for the capacity of mirror self-recognition in cleaner fish and the significance of ecologically relevant marks,

M. Kohda, S. Sogawa, A. L. Jordan, N. Kubo, S. Awata, S. Satoh, T. Kobayashi, A. Fujita, and R. Bshary, “Further evidence for the capacity of mirror self-recognition in cleaner fish and the significance of ecologically relevant marks,”PLOS Biology, vol. 20, no. 2, p. e3001529, 2022

work page 2022
[22]

Self-recognition in animals: Where do we stand 50 years later? Lessons from cleaner wrasse and other species,

G. G. Gallup Jr. and J. R. Anderson, “Self-recognition in animals: Where do we stand 50 years later? Lessons from cleaner wrasse and other species,”Psychology of Consciousness: Theory, Research, and Practice, vol. 7, no. 1, pp. 46–58, 2020

work page 2020
[23]

Tactile localization promotes infant self-recognition in the mirror-mark test,

L. K. Chinn, C. F. Noonan, M. Hoffmann, and J. J. Lockman, “Tactile localization promotes infant self-recognition in the mirror-mark test,” Cognition, vol. 220, p. 104988, 2022

work page 2022