Revisiting the evaluation of theory of mind through question answering
5 Pith papers cite this work.
Citation facets: years 2026 (5); verdicts unverdicted (5); roles background (1); polarity classification still indexing.
Citing papers
- EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents
  The EnactToM benchmark reveals that frontier AI models achieve 0% task completion on functional Theory of Mind in embodied multi-agent settings, despite averaging 45% on literal belief probes.
- Cognifold: Always-On Proactive Memory via Cognitive Folding
  Cognifold is a proactive memory architecture that folds event streams into emergent cognitive structures, extending complementary learning systems theory with a prefrontal intent layer and self-organizing graph topology.
- Instructions Shape Production of Language, not Processing
  Instructions trigger a production-centered mechanism in language models: task-specific information is stable in input tokens but varies strongly in output tokens and correlates with behavior.
- DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories
  LLMs identify mental states in dialogues well but, with the exception of Gemini 3 Pro, mostly fail to forecast state-consistent future trajectories, showing only weak overlap with human inferences.
- PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking
  PDDL-Mind improves LLM accuracy on theory-of-mind benchmarks by over 5% by translating stories into verifiable PDDL states, decoupling environment tracking from belief inference.