Mind the Perspective: Let's Reason Recursively for Theory of Mind

Chao Lei; Guang Hu; Meng Yang; Nir Lipovetzky; Yanbei Jiang

arxiv: 2606.11724 · v1 · pith:X7YPI4GHnew · submitted 2026-06-10 · 💻 cs.AI

Pith reviewed 2026-06-27 10:02 UTC · model grok-4.3

classification 💻 cs.AI

keywords Theory of Mind · Recursive Reasoning · Nested Beliefs · Perspective Construction · LLM Inference · Hi-ToM · Big-ToM · FanToM

The pith

RecToM reduces higher-order ToM questions to actual-world questions by recursively building each character's perspective from the prior one.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents RecToM, an inference-time method for Theory of Mind reasoning in LLMs that models nested beliefs explicitly through recursive perspective construction. It builds each character's view from the preceding perspective along the chain given in the question, which converts a higher-order belief question into a simpler actual-world question about the final constructed perspective. This differs from prior approaches that rely on event filtering or temporal belief chains and do not model nesting directly. A KD45 analysis is offered to show that the construction induces a well-formed belief modality. The paper reports consistent gains over recent approaches on Hi-ToM, Big-ToM, and FanToM, and 100% accuracy on Hi-ToM with GPT-5.4 and Qwen3.5.

Core claim

RecToM constructs each character perspective from the preceding character perspective along the character chain specified by the question, reducing higher-order belief questions to actual-world questions within the final constructed perspective, and the KD45 analysis shows that this perspective construction induces a well-formed belief modality beyond simple event filtering.

What carries the argument

Recursive perspective construction, which models nested beliefs by iteratively building each character's view from the previous one along the question-specified chain.
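
To make the recursion concrete, here is a minimal symbolic sketch. It is not the paper's implementation: RecToM performs each construction step with an LLM, and the paper argues that its construction goes beyond event filtering. The `Event` type, the `witnesses` field, and the functions `build_perspective`, `final_perspective`, and `locate` are all hypothetical names, and filtering by witnessed events is only a stand-in for the real per-step construction.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Event:
    """Toy event: an object ends up at a location, seen by some agents."""
    obj: str
    location: str
    witnesses: FrozenSet[str]


def build_perspective(prior: List[Event], agent: str) -> List[Event]:
    """Stand-in construction step: keep the events in `prior` that `agent` saw."""
    return [e for e in prior if agent in e.witnesses]


def final_perspective(world: List[Event], chain: List[str]) -> List[Event]:
    """Build each agent's view from the view of the agents before it in the chain."""
    if not chain:
        return world  # empty chain: the actual world
    return build_perspective(final_perspective(world, chain[:-1]), chain[-1])


def locate(perspective: List[Event], obj: str) -> Optional[str]:
    """Answer the actual-world question inside a perspective: where is `obj` now?"""
    for event in reversed(perspective):
        if event.obj == obj:
            return event.location
    return None


story = [
    Event("ball", "basket", frozenset({"Anne", "Sally"})),
    Event("ball", "box", frozenset({"Anne"})),  # Sally has left the room
]

print(locate(final_perspective(story, []), "ball"))                 # box
print(locate(final_perspective(story, ["Anne"]), "ball"))           # box
print(locate(final_perspective(story, ["Anne", "Sally"]), "ball"))  # basket
```

The second-order question "Where does Anne think Sally thinks the ball is?" is never asked directly. It is answered by the same `locate` call used for the actual world, applied to the perspective at the end of the chain.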

If this is right

RecToM reaches 100% accuracy on Hi-ToM with GPT-5.4 and Qwen3.5.
It consistently outperforms recent advanced approaches on Hi-ToM, Big-ToM, and FanToM across multiple LLM backbones.
The induced belief modality satisfies KD45 properties rather than reducing to event filtering alone.
Higher-order ToM questions become solvable by reducing them to first-order questions in the terminal perspective.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same recursive reduction might apply to other domains that require tracking multiple levels of knowledge, such as multi-agent planning under uncertainty.
Performance gains could diminish if questions involve deeper nesting than the benchmarks test, exposing limits in how far the base LLM can follow the constructed perspectives.
Explicit construction may allow smaller models to match larger ones on ToM tasks by offloading nesting to the procedure rather than internal capacity.

Load-bearing premise

The recursive construction of perspectives accurately captures nested beliefs without adding distortions or depending on the base model already having reliable higher-order reasoning.

What would settle it

A controlled test set of higher-order ToM scenarios where the final constructed perspective yields different answers from the known ground-truth nested beliefs.
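
Such a test could be organized around a small harness like the one below. This is a sketch under stated assumptions: the `Instance` layout, the `Solver` signature, and `find_disagreements` are hypothetical, and `always_box` is a stub that exists only to exercise the harness. In practice the solver would wrap a RecToM-style pipeline and the instances would carry independently verified ground-truth answers.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# (story text, character chain from the question, ground-truth answer)
Instance = Tuple[str, Sequence[str], str]
Solver = Callable[[str, Sequence[str]], str]


def find_disagreements(solve: Solver, instances: Iterable[Instance]) -> List[Instance]:
    """Return the instances where the solver's answer differs from the ground truth."""
    def norm(text: str) -> str:
        return text.strip().lower()

    return [inst for inst in instances if norm(solve(inst[0], inst[1])) != norm(inst[2])]


def always_box(story: str, chain: Sequence[str]) -> str:
    """Stub solver used only to exercise the harness; it ignores its inputs."""
    return "box"


# Toy instances for illustration only; they are not drawn from any benchmark.
text = "The ball is in the basket. Sally leaves. Anne moves the ball to the box."
toy_set: List[Instance] = [
    (text, ["Anne"], "box"),
    (text, ["Anne", "Sally"], "basket"),
]

failures = find_disagreements(always_box, toy_set)
print(len(failures), "disagreement(s)")
```

Any non-empty result on a carefully controlled set would be direct evidence against the load-bearing premise.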

Figures

Figures reproduced from arXiv: 2606.11724 by Chao Lei, Guang Hu, Meng Yang, Nir Lipovetzky, Yanbei Jiang.

**Figure 2.** Illustration of the full RecToM procedure for solving the Hi-ToM instance in ...

**Figure 3.** An example Big-ToM instance with the first ...

**Figure 4.** An example FanToM instance with first-order and second-order belief questions over a dialogue sequence.

Original abstract

Theory of Mind (ToM) reasoning requires inferring agents' beliefs from partial and asymmetric observations, which remains an open challenge for LLMs. Existing prompting-based approaches improve ToM reasoning through observable-event filtering or temporal belief chains, without explicitly modeling nested beliefs. We introduce RecToM, an inference-time framework for ToM reasoning that models nested beliefs via recursive perspective construction. RecToM constructs each character perspective from the preceding character perspective along the character chain specified by the question, reducing higher-order belief questions to actual-world questions within the final constructed perspective. We further provide a KD45 analysis showing that RecToM's perspective construction induces a well-formed belief modality beyond simple event filtering. Experiments on ToM benchmarks, including Hi-ToM, Big-ToM, and FanToM, across multiple LLM backbones show that RecToM consistently outperforms recent advanced approaches, achieving state-of-the-art performance. Notably, RecToM reaches 100% accuracy on Hi-ToM with GPT-5.4 and Qwen3.5, a benchmark requiring higher-order ToM reasoning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RecToM's recursive perspective chain is a clean new reduction for higher-order ToM, but the abstract leaves open whether the gains come from the method or from the base LLM already tracking the nesting.

The main new thing is the explicit recursive construction that walks the character chain specified by the question, building each perspective from the prior one until the final view turns the original higher-order query into a first-person actual-world question. That step is distinct from the event-filtering and temporal-chain baselines cited.

The paper reports clear wins: consistent outperformance on Hi-ToM, Big-ToM, and FanToM across several LLM backbones, plus the 100% accuracy mark on Hi-ToM with GPT-5.4 and Qwen3.5. The KD45 analysis is offered as an independent check that the construction yields a well-formed belief modality.

The soft spot is the missing evidence that the recursion itself forces correct nested reasoning rather than simply surfacing what the base model can already do under this prompt format. No ablations isolate the recursive step, no error bars or failure-case breakdowns appear in the abstract, and the KD45 claim is stated without visible derivation. If the model already encodes the required nesting, the reduction adds little new capability.

This is aimed at people building practical ToM modules for multi-agent LLM systems. A reader who wants a straightforward inference-time trick and benchmark numbers will find it useful.

The idea is coherent enough on its own terms to deserve a serious referee who can inspect the full methods, controls, and derivation.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces RecToM, an inference-time framework for Theory of Mind (ToM) reasoning in LLMs. It models nested beliefs via recursive perspective construction: each character's perspective is built from the preceding one along the question-specified chain, reducing higher-order belief questions to first-order actual-world questions in the final perspective. A KD45 analysis is provided to establish that this construction induces a well-formed belief modality beyond simple event filtering. Experiments across multiple LLM backbones on Hi-ToM, Big-ToM, and FanToM show consistent outperformance over recent methods, with 100% accuracy on Hi-ToM using GPT-5.4 and Qwen3.5.

Significance. If the recursive construction can be shown to add structured nested-belief modeling independently of the base LLM's pre-existing higher-order reasoning, the work would offer a valuable, interpretable prompting technique for multi-agent reasoning tasks. The KD45 analysis supplies a formal element uncommon in prompting papers, and the reported 100% accuracy on Hi-ToM would be a notable empirical result if supported by ablations and statistical controls.

major comments (3)

[Method (recursive perspective construction)] The central claim that successive perspective building reduces higher-order ToM queries to first-order actual-world queries inside the final perspective without systematic distortions is load-bearing, yet no ablation or controlled comparison is presented that isolates the recursive chain from the base LLM's ability to perform the required nesting when given an equivalent prompt sequence. Without such evidence the performance gains cannot be attributed to the proposed reduction mechanism.
[KD45 analysis] The claim that the perspective construction induces a well-formed belief modality (K, D, 4, 5) is stated without visible derivation steps or explicit mapping from the recursive construction to the modal axioms. This is load-bearing for the assertion that RecToM goes beyond simple event filtering.
[Experiments] Results tables: 100% accuracy on Hi-ToM and consistent outperformance are reported without error bars, number of runs, or failure-case analysis, undermining assessment of whether the gains are robust or attributable to the recursive method rather than model-specific prompting effects.

minor comments (1)

Clarify the exact model versions (e.g., 'GPT-5.4') and provide the precise prompt templates used for the recursive construction in an appendix for reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments identify key areas where additional evidence and detail would strengthen the claims. We address each major comment below and indicate the revisions we will make.

Referee: [Method (recursive perspective construction)] The central claim that successive perspective building reduces higher-order ToM queries to first-order actual-world queries inside the final perspective without systematic distortions is load-bearing, yet no ablation or controlled comparison is presented that isolates the recursive chain from the base LLM's ability to perform the required nesting when given an equivalent prompt sequence. Without such evidence the performance gains cannot be attributed to the proposed reduction mechanism.

Authors: We agree that an ablation isolating the recursive perspective construction from the base LLM's pre-existing nesting ability is necessary to attribute gains specifically to the reduction mechanism. The current comparisons are against other prompting methods, but these do not directly control for equivalent prompt sequences. In the revised manuscript we will add a controlled ablation that replaces the recursive chain with a flat, non-recursive prompt sequence of matched length and content, allowing direct comparison of the recursive reduction's contribution.
revision: yes

Referee: [KD45 analysis] The claim that the perspective construction induces a well-formed belief modality (K, D, 4, 5) is stated without visible derivation steps or explicit mapping from the recursive construction to the modal axioms. This is load-bearing for the assertion that RecToM goes beyond simple event filtering.

Authors: The KD45 section argues that the recursive construction satisfies the axioms via the inductive building of perspectives, but we acknowledge that explicit derivation steps and axiom-by-axiom mappings are not fully detailed in the current text. In the revision we will expand this section with step-by-step derivations that map each axiom (K, D, 4, 5) to specific properties of the recursive perspective construction, clarifying the distinction from event filtering.
revision: yes

Referee: [Experiments] Results tables: 100% accuracy on Hi-ToM and consistent outperformance are reported without error bars, number of runs, or failure-case analysis, undermining assessment of whether the gains are robust or attributable to the recursive method rather than model-specific prompting effects.

Authors: We agree that the absence of error bars, run counts, and failure-case analysis limits evaluation of robustness. Although 100% accuracy was observed on the Hi-ToM benchmark with the tested backbones, the revised manuscript will report results across multiple independent runs with standard deviations, specify the number of runs performed, and include an analysis of any failure cases to support attribution to the recursive method.
revision: yes
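
The multi-run reporting promised in the last response could look roughly like the sketch below. The helper names `accuracy` and `summarize` are hypothetical, exact string match is an assumed scoring rule, and the answer lists are placeholder values for illustration only, not results from the paper.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of exact-match answers in a single run."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def summarize(run_accuracies: List[float]) -> Tuple[float, float, int]:
    """Return (mean, sample standard deviation, number of runs)."""
    if not run_accuracies:
        raise ValueError("at least one run is required")
    # A sample standard deviation needs at least two runs; report 0.0 otherwise.
    spread = stdev(run_accuracies) if len(run_accuracies) > 1 else 0.0
    return mean(run_accuracies), spread, len(run_accuracies)


# Placeholder answers for illustration only; these are not results from the paper.
gold = ["basket", "box", "box", "basket"]
runs = [
    ["basket", "box", "box", "basket"],
    ["basket", "box", "basket", "basket"],
    ["basket", "box", "box", "basket"],
]

per_run = [accuracy(run, gold) for run in runs]
avg, sd, n = summarize(per_run)
print(f"accuracy = {avg:.3f} +/- {sd:.3f} over {n} runs")
```

Reporting the run count alongside the spread matters here because a single run at 100% and ten runs at 100% support very different robustness claims.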

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

The paper describes RecToM as an inference-time recursive perspective construction that reduces higher-order ToM queries to first-order ones within the final perspective, accompanied by a separate KD45 modal-logic analysis presented as an independent check. No equations, fitted parameters, or self-citations are shown that would make any claimed prediction or uniqueness result equivalent to its inputs by construction. Experimental results on Hi-ToM, Big-ToM, and FanToM are reported as external benchmarks rather than re-derivations of fitted values. The central claim therefore does not reduce to a self-definitional loop or renamed input.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that recursive perspective chaining preserves belief semantics (KD45) and that the base LLM can answer first-person questions accurately inside the constructed view. No free parameters or invented entities are mentioned.

axioms (1)

Domain assumption: a KD45 belief modality is induced by the perspective construction.
Stated in the abstract as the result of the KD45 analysis.
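
For reference, the four axiom schemas that KD45 names are written out below in their standard textbook form. The notation $B_a \varphi$ for "agent $a$ believes $\varphi$" is an assumption made here; the paper's own notation and its derivation are not visible from the abstract.

```latex
\begin{align*}
\textbf{K:}\quad & B_a(\varphi \rightarrow \psi) \rightarrow (B_a \varphi \rightarrow B_a \psi)
  && \text{(beliefs are closed under implication)} \\
\textbf{D:}\quad & B_a \varphi \rightarrow \neg B_a \neg \varphi
  && \text{(beliefs are consistent)} \\
\textbf{4:}\quad & B_a \varphi \rightarrow B_a B_a \varphi
  && \text{(positive introspection)} \\
\textbf{5:}\quad & \neg B_a \varphi \rightarrow B_a \neg B_a \varphi
  && \text{(negative introspection)}
\end{align*}
```

KD45 deliberately omits the truth axiom $B_a \varphi \rightarrow \varphi$, which is what allows an agent to hold a false belief, the situation ToM benchmarks are built around.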

