Developmental Trajectories of Situation Modeling and Mentalizing in Transformer Language Models

Cameron Jones; Pamela D. Rivière; Sean Trott

arxiv: 2606.28524 · v1 · pith:CC7ZJT3Onew · submitted 2026-06-26 · 💻 cs.CL

Pith reviewed 2026-06-30 01:29 UTC · model grok-4.3

keywords language models · false belief task · mentalizing · situation modeling · pretraining trajectories · transformer models · developmental stages

The pith

Larger language models acquire situation modeling before false-belief reasoning, which emerges only late in pretraining, and both abilities remain fragile.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tracks false belief task performance and situation modeling accuracy across training stages in the Olmo2 and Pythia model families. Above-chance mentalizing appears only with sufficient model size and training volume, emerges late in pretraining, and gains most from supervised fine-tuning (SFT) or direct preference optimization (DPO) in the implicit false belief condition. Situation modeling reaches higher accuracy earlier, yet Olmo2 13b gives inconsistent reports about the Antagonist agent's knowledge state: those reports shift with the Target agent's knowledge state and with the presence of non-factive verbs. These patterns suggest that models assemble partially coherent scene representations before mentalizing appears, while remaining sensitive to surface cues.

Core claim

Above-chance false belief task performance depends on both model size and sufficient training volume, emerges relatively late in pretraining, and improves most from post-training interventions in the implicit false belief condition. Situation modeling accuracy precedes and exceeds false belief task accuracy, yet situational representations remain partly incoherent: even when reporting on the Antagonist, who always knows the true location, Olmo2 13b is influenced by the Target agent's knowledge state and by non-factive verbs.

What carries the argument

Developmental trajectories of false belief task accuracy and situation modeling probes tracked across pretraining checkpoints and post-training stages in Olmo2 and Pythia suites.

If this is right

False belief task performance scales with model size and cumulative training tokens.
Situation modeling reaches usable accuracy before false belief task performance does.
Supervised fine-tuning and direct preference optimization produce the largest gains on implicit false belief items.
Non-factive verbs increase false belief attributions even in true belief conditions.
Situation representations stay partially incoherent when tracking multiple agents' knowledge states.
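
The ordering claim in the list above (situation modeling becoming usable before false belief performance) can be phrased as a comparison of onsets: the earliest checkpoint at which each task is reliably above chance. The sketch below is illustrative only and is not the authors' analysis. It swaps in an exact one-sided binomial test, assumes a chance level of 0.5, and uses invented checkpoint steps and counts; `binom_tail` and `first_above_chance` are hypothetical helper names.

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """Exact one-sided p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def first_above_chance(trajectory, chance=0.5, alpha=0.05):
    """Return the earliest step whose accuracy is significantly above chance.

    `trajectory` maps a training step to a (correct, total) pair.
    Returns None if no checkpoint passes the test.
    """
    for step in sorted(trajectory):
        correct, total = trajectory[step]
        if binom_tail(correct, total, chance) < alpha:
            return step
    return None

# Hypothetical counts, invented for illustration (not taken from the paper).
situation = {1000: (52, 100), 10000: (70, 100), 100000: (85, 100)}
false_belief = {1000: (49, 100), 10000: (53, 100), 100000: (66, 100)}

sm_onset = first_above_chance(situation)
fb_onset = first_above_chance(false_belief)
print("situation modeling onset:", sm_onset)
print("false belief onset:", fb_onset)
```

With these made-up counts the situation modeling onset falls at an earlier step than the false belief onset, which is the pattern the summary describes. A per-checkpoint test like this ignores item-level and model-level grouping, which a mixed effects model would handle.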

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The observed sequence suggests situation modeling may serve as a prerequisite for later mentalizing.
Fragility to verb choice could limit reliable use of these models in tasks requiring stable belief tracking.
Targeted post-training on varied verb and agent conditions might reduce incoherence in situation reports.
Replicating the trajectories on additional model families would test whether the size-and-volume dependence generalizes.

Load-bearing premise

The false belief task and situation modeling probes measure internal mentalizing and scene representations rather than surface statistical patterns in the training text.

What would settle it

An experiment showing that models achieve above-chance false belief task scores solely through sensitivity to non-factive verbs or target-agent phrasing, without corresponding improvement when those cues are removed.
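
One way to read the test above is as a cue-dependence check: is accuracy above chance with the surface cue present but no longer distinguishable from chance once the cue is removed? The following is a minimal sketch under stated assumptions, not a procedure from the paper. It uses a Wilson score interval at a 95% level, assumes chance is 0.5, and the counts are invented; `wilson_interval`, `above_chance`, and `cue_dependent` are hypothetical names.

```python
from math import sqrt

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

def above_chance(correct, total, chance=0.5):
    """True when the lower Wilson bound lies strictly above the chance level."""
    lower, _ = wilson_interval(correct, total)
    return lower > chance

def cue_dependent(with_cue, without_cue, chance=0.5):
    """True when accuracy is above chance only while the cue is present.

    Each argument is a (correct, total) pair for one stimulus set.
    """
    return above_chance(*with_cue, chance) and not above_chance(*without_cue, chance)

# Hypothetical counts, invented for illustration (not taken from the paper).
print(cue_dependent(with_cue=(80, 100), without_cue=(54, 100)))
print(cue_dependent(with_cue=(80, 100), without_cue=(75, 100)))
```

A result of True for a given model would be consistent with the cue-driven account. A result of False means only that this particular check did not detect cue dependence.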

Figures

Figures reproduced from arXiv: 2606.28524 by Cameron Jones, Pamela D. Rivière, Sean Trott.

**Figure 3.** Coefficients of a linear mixed effects model

**Figure 5.** Olmo2 13b post-training interventions improve performance under stimulus properties that are typically used to index mentalizing (False Belief, Implicit), but remains relatively uniform for all others, including low-accuracy ones that should only reflect the ability to report on factual properties of a scene (True Belief, Explicit). Subpanels depict Olmo2 13b FB task accuracy across various training stag…

**Figure 7.** Olmo 2 13b situation model accuracy generally increases over the course of pretraining, its improvements precede and generally exceed False Belief task accuracies, but biases also emerge, particularly when the model is asked to track agents. Performance trajectories are colored according to Task Type, with False Belief accuracies in teal, and Situation Model accuracies in grey. Dashed horizontal line indic…

**Figure 8.** Companion to Main Text

**Figure 9.** Companion to

**Figure 10.** Companion to Main Text

**Figure 11.** Control task (Antagonist Belief Tracking) accuracies for all LMs tested, for the Olmo

Original abstract

Recent work suggests that Large Language Models (LLMs) are sensitive to the belief states of agents described by text, as measured by the false belief task (FBT), yet persistent concerns of construct validity remain. We adopt a **developmental perspective**, tracing the pattern of mental state reasoning behavior -- and likely **preconditions** for this behavior -- across multiple training stages in the Olmo2 and Pythia language model suites. We find that above-chance FBT performance depends both on model size and sufficient training volume, emerges relatively late in pretraining, and is most improved by post-training interventions (SFT, DPO) in the condition most diagnostic of mentalizing (False Belief, Implicit). However, FBT performance is fragile: consistent with past work, the use of non-factive verbs (e.g., thinks) increases false belief attributions even in the True Belief condition. To contextualize these findings, we track the emergence of **situation modeling**: the ability to report on basic factual properties of a described scene. Situation modeling accuracy generally precedes and exceeds FBT accuracy, yet situational representations also prove surprisingly incoherent in certain respects: when asked about the knowledge states of the Antagonist agent -- who always knows the item's true location -- Olmo2 13b is consistently influenced both by the Target agent's knowledge state and the presence of non-factive verbs. Together, these results suggest that larger, sufficiently trained models build partially coherent situation models in a developmentally appropriate sequence, yet display surprising fragility -- highlighting the value of developmental and stress-testing approaches for evaluating LLM capabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper tracks FBT and situation modeling emergence across checkpoints in two model families and flags some fragility, but the tasks may mostly reflect surface cues rather than internal mentalizing.

Two things stand out right away. The work follows false-belief task performance and basic situation modeling through multiple pretraining checkpoints in Olmo2 and Pythia, plus post-training stages. Above-chance FBT appears late, scales with size and data volume, and improves most from SFT and DPO in the implicit false-belief condition. Situation modeling shows up earlier but still looks incoherent when the model is asked about the antagonist agent's knowledge.

The multi-checkpoint design across two families is the clearest addition. It moves past single-snapshot FBT studies and gives a timeline for when these behaviors appear and what training steps affect them. The stress tests on non-factive verbs and antagonist knowledge are also straightforward and useful for showing where the measures break.

The main weakness is construct validity, and the abstract already flags it. Non-factive verbs boost false-belief attributions even in true-belief conditions, and Olmo2 13b lets the target agent's knowledge state and the choice of verb leak into its antagonist reports. If the probes are mostly picking up local lexical patterns instead of structured situation models or belief tracking, then the size, volume, and post-training effects can no longer be read as evidence of mentalizing. The abstract also reports no accuracies, error bars, or confound controls, which makes it hard to judge how reliable the directional patterns are.

This is for readers in model interpretability and cognitive modeling who care about training dynamics. It gives them a developmental map to build on, even if the mentalizing framing needs more scrutiny.

Send it to peer review. The checkpoint tracking is a step beyond prior work, and referees can press on the validity issues and demand tighter reporting.

Referee Report

2 major / 1 minor

Summary. The paper claims that in the Olmo2 and Pythia model suites, above-chance performance on the false-belief task (FBT), used as a measure of mentalizing, depends on both model size and sufficient pretraining volume, emerges relatively late during pretraining, and is most improved by post-training interventions (SFT, DPO) in the implicit false-belief condition. Situation-modeling accuracy generally precedes and exceeds FBT accuracy, yet situational representations remain incoherent in specific respects (e.g., antagonist knowledge reports are influenced by target-agent state and non-factive verbs). The work adopts a developmental perspective to trace these abilities and their preconditions while explicitly noting construct-validity concerns with the probes.

Significance. If the empirical patterns hold after addressing the noted limitations, the results would contribute to understanding the training dynamics and developmental sequence of higher-order reasoning capabilities in LLMs, underscoring the value of checkpoint-level analysis, the role of scale and alignment methods, and the need for stress-testing. The multi-model, multi-stage design and the paper's own reporting of fragilities (non-factive verb effects, knowledge leakage) are strengths that could inform future evaluation methodologies.

major comments (2)

[Abstract] The abstract reports directional findings on the dependence of FBT performance on model size, training volume, and post-training interventions but supplies no statistical details, exact accuracies, error bars, or controls for confounds, limiting assessment of whether the data support the stated claims about emergence and improvement.
[Abstract] The central claim that larger, sufficiently trained models build 'partially coherent situation models' and mentalizing abilities rests on FBT and situation probes reflecting internal representations rather than surface lexical associations; however, the abstract itself documents that non-factive verbs increase false-belief attributions even in True Belief conditions and that antagonist knowledge reports are contaminated by target-agent state and verb choice, which directly undermines interpretability of the reported size/training-volume dependence, late emergence, and SFT/DPO gains as evidence of mentalizing.

minor comments (1)

[Abstract] The specific reference to 'Olmo2 13b' for the antagonist-influence finding should clarify whether this pattern holds across the full range of model sizes examined or is size-specific.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address each major comment below and indicate planned revisions to strengthen the presentation of results and qualifications.

Referee: [Abstract] The abstract reports directional findings on the dependence of FBT performance on model size, training volume, and post-training interventions but supplies no statistical details, exact accuracies, error bars, or controls for confounds, limiting assessment of whether the data support the stated claims about emergence and improvement.

Authors: We agree that the abstract would be improved by including select quantitative details. In revision we will incorporate key accuracy figures (e.g., peak FBT performance ranges by model size), a brief note on statistical thresholds used, and reference to the confound controls and error-bar reporting already present in the main results and appendix. Abstract length constraints will limit the number of numbers, but the additions will directly address the concern.

revision: yes

Referee: [Abstract] The central claim that larger, sufficiently trained models build 'partially coherent situation models' and mentalizing abilities rests on FBT and situation probes reflecting internal representations rather than surface lexical associations; however, the abstract itself documents that non-factive verbs increase false-belief attributions even in True Belief conditions and that antagonist knowledge reports are contaminated by target-agent state and verb choice, which directly undermines interpretability of the reported size/training-volume dependence, late emergence, and SFT/DPO gains as evidence of mentalizing.

Authors: The abstract already foregrounds these fragilities precisely to signal the construct-validity limits of the probes, and the core claims are qualified with the phrases 'partially coherent' and 'surprising fragility.' The reported dependencies on scale, pretraining volume, and post-training are measured across multiple conditions and persist after the documented verb-type and knowledge-leakage effects are accounted for in the full analyses. We will revise the abstract to add an explicit clause stating that the emergence patterns hold after controlling for these factors, thereby clarifying that the limitations do not negate the observed developmental trajectories.

revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical measurement study

The paper reports direct experimental results on FBT accuracy and situation-modeling probes across model sizes, training checkpoints, and post-training interventions in Olmo2 and Pythia suites. No derivations, equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described methods. Performance is measured against external task benchmarks with explicit construct-validity caveats noted by the authors themselves. The central claims (late emergence, size/training dependence, SFT/DPO gains) rest on observed accuracies rather than any reduction to prior definitions or self-referential fits.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper is purely empirical and relies on standard assumptions from cognitive psychology about what the false-belief task measures; no free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

domain assumption: The false belief task is a valid probe of mentalizing in language models
Central to interpreting FBT results as evidence of mentalizing despite noted construct-validity concerns.

Reference graph

Works this paper leans on

94 extracted references · 57 canonical work pages · 3 internal anchors

[1]

Minds and Machines , author =

Explanation as. Minds and Machines , author =. 1998 , keywords =. doi:10.1023/A:1008290415597 , language =

work page doi:10.1023/a:1008290415597 1998
[2]

Perspectives on Psychological Science , author =

Repositioning. Perspectives on Psychological Science , author =. 2025 , note =. doi:10.1177/17456916231195852 , abstract =

work page doi:10.1177/17456916231195852 2025
[3]

Inventing

Chang, Hasok , month = aug, year =. Inventing
[4]

Nature Human Behaviour , author =

How to evaluate the cognitive abilities of. Nature Human Behaviour , author =. 2025 , pmid =. doi:10.1038/s41562-024-02096-z , language =

work page doi:10.1038/s41562-024-02096-z 2025
[5]

Trends in Cognitive Sciences , author =

Identifying indicators of consciousness in. Trends in Cognitive Sciences , author =. 2025 , pmid =. doi:10.1016/j.tics.2025.10.011 , language =

work page doi:10.1016/j.tics.2025.10.011 2025
[6]

Philosophy of the Social Sciences , author =

Valid for. Philosophy of the Social Sciences , author =. 2021 , note =. doi:10.1177/0048393120971169 , abstract =

work page doi:10.1177/0048393120971169 2021
[7]

Finding Interpretable Prompt-Specific Circuits in Language Models

Franco, Gabriel and Tassis, Lucas M. and Rohr, Azalea and Crovella, Mark , month = feb, year =. Finding. doi:10.48550/arXiv.2602.13483 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.13483
[8]

The Journal of Philosophy , author =

Real. The Journal of Philosophy , author =. 1991 , note =. doi:10.2307/2027085 , number =

work page doi:10.2307/2027085 1991
[9]

Psychological Review , author =

The. Psychological Review , author =. 2004 , keywords =. doi:10.1037/0033-295X.111.4.1061 , language =

work page doi:10.1037/0033-295x.111.4.1061 2004
[10]

Benchmarks as

Saxon, Michael and Holtzman, Ari and West, Peter and Wang, William Yang and Saphra, Naomi , month = jul, year =. Benchmarks as. doi:10.48550/arXiv.2407.16711 , abstract =

work page doi:10.48550/arxiv.2407.16711
[11]

Psychological Bulletin , author =

Construct validity in psychological tests , volume =. Psychological Bulletin , author =. 1955 , note =. doi:10.1037/h0040957 , abstract =

work page doi:10.1037/h0040957 1955
[12]

Philosophy of Science , author =

A. Philosophy of Science , author =. 2019 , keywords =. doi:10.1086/705567 , abstract =

work page doi:10.1086/705567 2019
[13]

Philosophy of Science , author =

Is. Philosophy of Science , author =. 2016 , keywords =. doi:10.1086/687941 , abstract =

work page doi:10.1086/687941 2016
[14]

Auxiliary task demands mask the capabilities of smaller language models , url =

Hu, Jennifer and Frank, Michael , month = aug, year =. Auxiliary task demands mask the capabilities of smaller language models , url =
[15]

Nature Human Behaviour , author =

Testing theory of mind in large language models and humans , volume =. Nature Human Behaviour , author =. 2024 , note =. doi:10.1038/s41562-024-01882-z , abstract =

work page doi:10.1038/s41562-024-01882-z 2024
[16]

Advances in Methods and Practices in Psychological Science , author =

Measurement. Advances in Methods and Practices in Psychological Science , author =. 2020 , note =. doi:10.1177/2515245920952393 , abstract =

work page doi:10.1177/2515245920952393 2020
[17]

Current Directions in Psychological Science , author =

Credibility. Current Directions in Psychological Science , author =. 2022 , note =. doi:10.1177/09637214211067779 , abstract =

work page doi:10.1177/09637214211067779 2022
[18]

arXiv preprint arXiv:2111.15366 , year=

Raji, Inioluwa Deborah and Bender, Emily M. and Paullada, Amandalynne and Denton, Emily and Hanna, Alex , month = nov, year =. doi:10.48550/arXiv.2111.15366 , abstract =

work page doi:10.48550/arxiv.2111.15366
[19]

Bean, Andrew M. and Kearns, Ryan Othniel and Romanou, Angelika and Hafner, Franziska Sofia and Mayne, Harry and Batzner, Jan and Foroutan, Negar and Schmitz, Chris and Korgul, Karolina and Batra, Hunar and Deb, Oishi and Beharry, Emma and Emde, Cornelius and Foster, Thomas and Gausen, Anna and Grandury, María and Han, Simeng and Hofmann, Valentin and Ibra...

work page doi:10.48550/arxiv.2511.04703
[20]

The British Journal for the Philosophy of Science , author =

Model. The British Journal for the Philosophy of Science , author =. 2015 , pages =. doi:10.1093/bjps/axt055 , abstract =

work page doi:10.1093/bjps/axt055 2015
[21]

NeuroImage , author =

Six problems for causal inference from. NeuroImage , author =. 2010 , pages =. doi:10.1016/j.neuroimage.2009.08.065 , abstract =

work page doi:10.1016/j.neuroimage.2009.08.065 2010
[22]

Transactions of the Association for Computational Linguistics , author =

Are. Transactions of the Association for Computational Linguistics , author =. 2024 , pages =. doi:10.1162/tacl_a_00690 , abstract =

work page doi:10.1162/tacl_a_00690 2024
[23]

Goldstein, Simon and Lederman, Harvey , file =. What
[24]

Convergence and

Fehlauer, Finlay and Mahowald, Kyle and Pimentel, Tiago , editor =. Convergence and. Proceedings of the 2025. 2025 , pages =. doi:10.18653/v1/2025.emnlp-main.1675 , abstract =

work page doi:10.18653/v1/2025.emnlp-main.1675 2025
[25]

arXiv.org , author =

Predicting the. arXiv.org , author =. 2025 , file =

2025
[26]

Tigges, Curt and Hanna, Michael and Biderman, Stella and Yu, Qinan , keywords =
[27]

2024 , pages =

Advances in Neural Information Processing Systems , author =. 2024 , pages =. doi:10.52202/079017-1287 , language =

work page doi:10.52202/079017-1287 2024
[28]

Tsvilodub, Polina and Klumpp, Jan-Felix and Mohammadpour, Amir and Hu, Jennifer and Franke, Michael , month = feb, year =. On. doi:10.48550/arXiv.2602.10298 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.10298
[29]

2025 , pmid =

Trends in Cognitive Sciences , author =. 2025 , pmid =. doi:10.1016/j.tics.2025.09.012 , language =

work page doi:10.1016/j.tics.2025.09.012 2025
[30]

ArXiv , author =

Measuring and. ArXiv , author =. 2025 , pmid =

2025
[31]

, file =

Nefdt, Ryan M. , file =. What it's like to be an
[32]

and McGraw, A

Bauman, Christopher W. and McGraw, A. Peter and Bartels, Daniel M. and Warren, Caleb , file =. Revisiting. doi:10.1111/spc3.12131 , abstract =

work page doi:10.1111/spc3.12131
[33]

bioRxiv , author =

Human-like individual differences emerge from random weight initializations in neural networks , issn =. bioRxiv , author =. 2025 , pmid =. doi:10.1101/2025.10.25.684448 , abstract =

work page doi:10.1101/2025.10.25.684448 2025
[34]

Transactions on Machine Learning Research , author =

Latent. Transactions on Machine Learning Research , author =. 2023 , file =

2023
[35]

and Saphra, Naomi , month = oct, year =

Chen, Angelica and Shwartz-Ziv, Ravid and Cho, Kyunghyun and Leavitt, Matthew L. and Saphra, Naomi , month = oct, year =. Sudden
[36]

, month = nov, year =

Zhao, Rosie and Saphra, Naomi and Kakade, Sham M. , month = nov, year =. Distributional
[37]

arXiv.org , author =

Random. arXiv.org , author =. 2025 , file =

2025
[38]

arXiv.org , author =

Sometimes. arXiv.org , author =. 2024 , file =

2024
[39]

PLOS Computational Biology , author =

Neuronal identity is not static:. PLOS Computational Biology , author =. 2025 , note =. doi:10.1371/journal.pcbi.1013821 , abstract =

work page doi:10.1371/journal.pcbi.1013821 2025
[40]

Nature Machine Intelligence , author =

A taxonomy and review of generalization research in. Nature Machine Intelligence , author =. 2023 , note =. doi:10.1038/s42256-023-00729-y , abstract =

work page doi:10.1038/s42256-023-00729-y 2023
[41]

Toward a theory of generalizability in llm mechanistic interpretability research.arXiv preprint arXiv:2509.22831,

Trott, Sean , month = sep, year =. Toward a. doi:10.48550/arXiv.2509.22831 , abstract =

work page doi:10.48550/arxiv.2509.22831
[42]

Cognition , author =

Two reasons to abandon the false belief task as a test of theory of mind , volume =. Cognition , author =. 2000 , keywords =. doi:10.1016/S0010-0277(00)00096-2 , abstract =

work page doi:10.1016/s0010-0277(00)00096-2 2000
[43]

and Rivière, Pamela D

Trott, Sean and Taylor, Samuel and Jones, Cameron and Michaelov, James A. and Rivière, Pamela D. , month = feb, year =. Language. doi:10.48550/arXiv.2602.16085 , abstract =

work page doi:10.48550/arxiv.2602.16085
[44]

Transactions of the Association for Computational Linguistics , author =

Comparing. Transactions of the Association for Computational Linguistics , author =. 2024 , note =. doi:10.1162/tacl_a_00674 , abstract =

work page doi:10.1162/tacl_a_00674 2024
[45]

Behavioral and Brain Sciences , author =

Understanding and sharing intentions:. Behavioral and Brain Sciences , author =. 2005 , keywords =. doi:10.1017/S0140525X05000129 , abstract =

work page doi:10.1017/s0140525x05000129 2005
[46]

theory of mind

What is “theory of mind”?. Quarterly Journal of Experimental Psychology , author =. 2012 , note =. doi:10.1080/17470218.2012.676055 , abstract =

work page doi:10.1080/17470218.2012.676055 2012
[47]

Proceedings of the Annual Meeting of the Cognitive Science Society , author =

Does reading words help you to read minds?. Proceedings of the Annual Meeting of the Cognitive Science Society , author =. 2024 , file =

2024
[48]

Shapira, Natalie and Levy, Mosh and Alavi, Seyed Hossein and Zhou, Xuhui and Choi, Yejin and Goldberg, Yoav and Sap, Maarten and Shwartz, Vered , editor =. Clever. Proceedings of the 18th. 2024 , pages =. doi:10.18653/v1/2024.eacl-long.138 , abstract =

work page doi:10.18653/v1/2024.eacl-long.138 2024
[49]

Ullman, Tomer , month = mar, year =. Large. doi:10.48550/arXiv.2302.08399 , abstract =

work page doi:10.48550/arxiv.2302.08399
[50]

Advances in Neural Information Processing Systems , author =

Understanding. Advances in Neural Information Processing Systems , author =. 2023 , pages =

2023
[51]

Proceedings of the 62nd

Xu, Hainiu and Zhao, Runcong and Zhu, Lixing and Du, Jinhua and He, Yulan , editor =. Proceedings of the 62nd. 2024 , pages =. doi:10.18653/v1/2024.acl-long.466 , abstract =

work page doi:10.18653/v1/2024.acl-long.466 2024
[52]

Cognitive Science , author =

Do. Cognitive Science , author =. 2023 , note =. doi:10.1111/cogs.13309 , abstract =

work page doi:10.1111/cogs.13309 2023
[53]

Philosophical Transactions of the Royal Society B: Biological Sciences , author =

Re-evaluating. Philosophical Transactions of the Royal Society B: Biological Sciences , author =. 2025 , pages =. doi:10.1098/rstb.2023.0499 , abstract =

work page doi:10.1098/rstb.2023.0499 2025
[54]

Developmental Psychology , author =

I can talk you into it:. Developmental Psychology , author =. 2013 , note =. doi:10.1037/a0028280 , abstract =

work page doi:10.1037/a0028280 2013
[55]

PLOS ONE , author =

Cooperation and. PLOS ONE , author =. 2008 , note =. doi:10.1371/journal.pone.0002023 , abstract =

work page doi:10.1371/journal.pone.0002023 2008
[56]

Child Development , author =

Small-. Child Development , author =. 1989 , note =. doi:10.2307/1130919 , abstract =

work page doi:10.2307/1130919 1989
[57]

Child Development , author =

Meta-. Child Development , author =. 2001 , note =. doi:10.1111/1467-8624.00304 , abstract =

work page doi:10.1111/1467-8624.00304 2001
[58]

Trends in Cognitive Sciences , author =

Does the chimpanzee have a theory of mind? 30 years later , volume =. Trends in Cognitive Sciences , author =. 2008 , pmid =. doi:10.1016/j.tics.2008.02.010 , language =

work page doi:10.1016/j.tics.2008.02.010 2008
[59]

WIREs Cognitive Science , author =

Theory of mind in animals:. WIREs Cognitive Science , author =. 2019 , note =. doi:10.1002/wcs.1503 , abstract =

work page doi:10.1002/wcs.1503 2019
[60]

Science , author =

Great apes anticipate that other individuals will act according to false beliefs , volume =. Science , author =. 2016 , note =. doi:10.1126/science.aaf8110 , abstract =

work page doi:10.1126/science.aaf8110 2016
[61]

2026 , note =

Frontiers in Human Neuroscience , author =. 2026 , note =. doi:10.3389/fnhum.2025.1633272 , abstract =

work page doi:10.3389/fnhum.2025.1633272 2026
[62]

Evaluating large language models in theory of mind tasks.arXiv preprint arXiv:2302.02083,

Evaluating large language models in theory of mind tasks , volume =. Proceedings of the National Academy of Sciences , author =. 2024 , note =. doi:10.1073/pnas.2405460121 , abstract =

work page doi:10.1073/pnas.2405460121 2024
[63]

Child Development , author =

Why. Child Development , author =. 1996 , note =. doi:10.1111/j.1467-8624.1996.tb01767.x , abstract =

work page doi:10.1111/j.1467-8624.1996.tb01767.x 1996
[64]

Topics in Language Disorders , author =

The. Topics in Language Disorders , author =. 2014 , pages =. doi:10.1097/TLD.0000000000000037 , abstract =

work page doi:10.1097/tld.0000000000000037 2014
[65]

Artificial Intelligence Review , author =

Lies, damned lies, and language statistics: a comprehensive review of risks from manipulation, persuasion, and deception with large language models , volume =. Artificial Intelligence Review , author =. 2026 , keywords =. doi:10.1007/s10462-026-11517-6 , abstract =

work page doi:10.1007/s10462-026-11517-6 2026
[66]

Child development , volume=

Sensorimotor decoupling contributes to triadic attention: A longitudinal investigation of mother--infant--object interactions , author=. Child development , volume=. 2016 , publisher=

2016
[67]

Journal of Autism and Developmental Disorders , volume=

Does the neurotypical human have a ‘theory of mind’? , author=. Journal of Autism and Developmental Disorders , volume=. 2023 , publisher=

2023
[68]

PloS one , volume=

Using fiction to assess mental state understanding: a new task for assessing theory of mind in adults , author=. PloS one , volume=. 2013 , publisher=

2013
[69]

British Journal of Developmental Psychology , volume=

Reliability and validity of advanced theory-of-mind measures in middle childhood and adolescence , author=. British Journal of Developmental Psychology , volume=. 2017 , publisher=

2017
[70]

Discourse Processes , volume=

Individual differences in mentalizing capacity predict indirect request comprehension , author=. Discourse Processes , volume=. 2019 , publisher=

2019
[71]

Cognition , volume=

Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception , author=. Cognition , volume=. 1983 , publisher=

1983
[72]

theory of mind

Does the autistic child have a “theory of mind”? , author=. Cognition , volume=. 1985 , publisher=

1985
[73]

Journal of statistical software , volume=

Fitting linear mixed-effects models using lme4 , author=. Journal of statistical software , volume=
[74]

OLMo, Team and Walsh, Pete and Soldaini, Luca and Groeneveld, Dirk and Lo, Kyle and Arora, Shane and Bhagia, Akshita and Gu, Yuling and Huang, Shengyi and Jordan, Matt and others , journal=. 2
[75]

Proceedings of the 40th International Conference on Machine Learning , pages=

Pythia: a suite for analyzing large language models across training and scaling , author=. Proceedings of the 40th International Conference on Machine Learning , pages=
[76]

2026 , eprint=

Traces of Social Competence in Large Language Models , author=. 2026 , eprint=

2026
[77]

cortex , volume=

The neural basis of belief-attribution across the lifespan: False-belief reasoning and the N400 effect , author=. cortex , volume=. 2020 , publisher=

2020
[78]

In-context Learning and Induction Heads

In-context learning and induction heads , author=. arXiv preprint arXiv:2209.11895 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[79]

Language Model Behavioral Phases are Consistent Across Architecture, Training Data, and Scale. arXiv preprint arXiv:2510.24963.
[80]

A longitudinal study of the relation between language and theory-of-mind development. Developmental Psychology, 1999.


[1] Explanation as. Minds and Machines, 1998. doi:10.1023/A:1008290415597

[2] Repositioning. Perspectives on Psychological Science, 2025. doi:10.1177/17456916231195852

[3] Chang, Hasok. Inventing

[4] How to evaluate the cognitive abilities of. Nature Human Behaviour, 2025. doi:10.1038/s41562-024-02096-z

[5] Identifying indicators of consciousness in. Trends in Cognitive Sciences, 2025. doi:10.1016/j.tics.2025.10.011

[6] Valid for. Philosophy of the Social Sciences, 2021. doi:10.1177/0048393120971169

[7] Franco, Gabriel and Tassis, Lucas M. and Rohr, Azalea and Crovella, Mark. Finding Interpretable Prompt-Specific Circuits in Language Models. doi:10.48550/arXiv.2602.13483

[8] Real. The Journal of Philosophy, 1991. doi:10.2307/2027085

[9] The. Psychological Review, 2004. doi:10.1037/0033-295X.111.4.1061

[10] Saxon, Michael and Holtzman, Ari and West, Peter and Wang, William Yang and Saphra, Naomi. Benchmarks as. doi:10.48550/arXiv.2407.16711

[11] Construct validity in psychological tests. Psychological Bulletin, 1955. doi:10.1037/h0040957

[12] A. Philosophy of Science, 2019. doi:10.1086/705567

[13] Is. Philosophy of Science, 2016. doi:10.1086/687941

[14] Hu, Jennifer and Frank, Michael. Auxiliary task demands mask the capabilities of smaller language models.

[15] Testing theory of mind in large language models and humans. Nature Human Behaviour, 2024. doi:10.1038/s41562-024-01882-z

[16] Measurement. Advances in Methods and Practices in Psychological Science, 2020. doi:10.1177/2515245920952393

[17] Credibility. Current Directions in Psychological Science, 2022. doi:10.1177/09637214211067779

[18] Raji, Inioluwa Deborah and Bender, Emily M. and Paullada, Amandalynne and Denton, Emily and Hanna, Alex. arXiv preprint arXiv:2111.15366. doi:10.48550/arXiv.2111.15366

[19] Bean, Andrew M. and Kearns, Ryan Othniel and Romanou, Angelika and Hafner, Franziska Sofia and Mayne, Harry and Batzner, Jan and Foroutan, Negar and Schmitz, Chris and Korgul, Karolina and Batra, Hunar and Deb, Oishi and Beharry, Emma and Emde, Cornelius and Foster, Thomas and Gausen, Anna and Grandury, María and Han, Simeng and Hofmann, Valentin and Ibra... doi:10.48550/arxiv.2511.04703

[20] Model. The British Journal for the Philosophy of Science, 2015. doi:10.1093/bjps/axt055

[21] Six problems for causal inference from. NeuroImage, 2010. doi:10.1016/j.neuroimage.2009.08.065

[22] Are. Transactions of the Association for Computational Linguistics, 2024. doi:10.1162/tacl_a_00690

[23] Goldstein, Simon and Lederman, Harvey. What

[24] Fehlauer, Finlay and Mahowald, Kyle and Pimentel, Tiago. Convergence and. Proceedings of the 2025, 2025. doi:10.18653/v1/2025.emnlp-main.1675

[25] Predicting the. arXiv.org, 2025.

[26] Tigges, Curt and Hanna, Michael and Biderman, Stella and Yu, Qinan.

[27] Advances in Neural Information Processing Systems, 2024. doi:10.52202/079017-1287

[28] Tsvilodub, Polina and Klumpp, Jan-Felix and Mohammadpour, Amir and Hu, Jennifer and Franke, Michael. On. doi:10.48550/arXiv.2602.10298

[29] Trends in Cognitive Sciences, 2025. doi:10.1016/j.tics.2025.09.012

[30] Measuring and. ArXiv, 2025.

[31] Nefdt, Ryan M. What it's like to be an

[32] Bauman, Christopher W. and McGraw, A. Peter and Bartels, Daniel M. and Warren, Caleb. Revisiting. doi:10.1111/spc3.12131

[33] Human-like individual differences emerge from random weight initializations in neural networks. bioRxiv, 2025. doi:10.1101/2025.10.25.684448

[34] Latent. Transactions on Machine Learning Research, 2023.

[35] Chen, Angelica and Shwartz-Ziv, Ravid and Cho, Kyunghyun and Leavitt, Matthew L. and Saphra, Naomi. Sudden

[36] Zhao, Rosie and Saphra, Naomi and Kakade, Sham M. Distributional

[37] Random. arXiv.org, 2025.

[38] Sometimes. arXiv.org, 2024.

[39] Neuronal identity is not static:. PLOS Computational Biology, 2025. doi:10.1371/journal.pcbi.1013821

[40] A taxonomy and review of generalization research in. Nature Machine Intelligence, 2023. doi:10.1038/s42256-023-00729-y

[41] Trott, Sean. Toward a theory of generalizability in LLM mechanistic interpretability research. arXiv preprint arXiv:2509.22831. doi:10.48550/arXiv.2509.22831

[42] Two reasons to abandon the false belief task as a test of theory of mind. Cognition, 2000. doi:10.1016/S0010-0277(00)00096-2

[43] Trott, Sean and Taylor, Samuel and Jones, Cameron and Michaelov, James A. and Rivière, Pamela D. Language. doi:10.48550/arXiv.2602.16085

[44] Comparing. Transactions of the Association for Computational Linguistics, 2024. doi:10.1162/tacl_a_00674

[45] Understanding and sharing intentions:. Behavioral and Brain Sciences, 2005. doi:10.1017/S0140525X05000129

[46] What is “theory of mind”? Quarterly Journal of Experimental Psychology, 2012. doi:10.1080/17470218.2012.676055

[47] Does reading words help you to read minds? Proceedings of the Annual Meeting of the Cognitive Science Society, 2024.

[48] Shapira, Natalie and Levy, Mosh and Alavi, Seyed Hossein and Zhou, Xuhui and Choi, Yejin and Goldberg, Yoav and Sap, Maarten and Shwartz, Vered. Clever. Proceedings of the 18th, 2024. doi:10.18653/v1/2024.eacl-long.138

[49] Ullman, Tomer. Large. doi:10.48550/arXiv.2302.08399

[50] Understanding. Advances in Neural Information Processing Systems, 2023.

[51] Xu, Hainiu and Zhao, Runcong and Zhu, Lixing and Du, Jinhua and He, Yulan. Proceedings of the 62nd, 2024. doi:10.18653/v1/2024.acl-long.466

[52] Do. Cognitive Science, 2023. doi:10.1111/cogs.13309

[53] Re-evaluating. Philosophical Transactions of the Royal Society B: Biological Sciences, 2025. doi:10.1098/rstb.2023.0499

[54] I can talk you into it:. Developmental Psychology, 2013. doi:10.1037/a0028280

[55] Cooperation and. PLOS ONE, 2008. doi:10.1371/journal.pone.0002023

[56] Small-. Child Development, 1989. doi:10.2307/1130919

[57] Meta-. Child Development, 2001. doi:10.1111/1467-8624.00304

[58] Does the chimpanzee have a theory of mind? 30 years later. Trends in Cognitive Sciences, 2008. doi:10.1016/j.tics.2008.02.010

[59] Theory of mind in animals:. WIREs Cognitive Science, 2019. doi:10.1002/wcs.1503

[60] Great apes anticipate that other individuals will act according to false beliefs. Science, 2016. doi:10.1126/science.aaf8110

[61] Frontiers in Human Neuroscience, 2026. doi:10.3389/fnhum.2025.1633272

[62] Evaluating large language models in theory of mind tasks. Proceedings of the National Academy of Sciences, 2024. doi:10.1073/pnas.2405460121. Also arXiv preprint arXiv:2302.02083.

[63] Why. Child Development, 1996. doi:10.1111/j.1467-8624.1996.tb01767.x

[64] The. Topics in Language Disorders, 2014. doi:10.1097/TLD.0000000000000037

[65] Lies, damned lies, and language statistics: a comprehensive review of risks from manipulation, persuasion, and deception with large language models. Artificial Intelligence Review, 2026. doi:10.1007/s10462-026-11517-6

[66] Sensorimotor decoupling contributes to triadic attention: A longitudinal investigation of mother–infant–object interactions. Child Development, 2016.

[67] Does the neurotypical human have a ‘theory of mind’? Journal of Autism and Developmental Disorders, 2023.

[68] Using fiction to assess mental state understanding: a new task for assessing theory of mind in adults. PLoS ONE, 2013.
