Revisiting the evaluation of theory of mind through question answering. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, pages 5872–5877

Le, Matthew, Boureau, Y-Lan, Nickel, Maximilian · 2019 · DOI 10.18653/v1/d19-1598

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling

cs.AI · 2026-05-25 · unverdicted · novelty 7.0

OmniToM is a new benchmark for Theory of Mind in LLMs that evaluates explicit belief extraction and seven-dimensional labeling from 895 stories, revealing an actor-specific belief-tracking bottleneck.

On the Limits of Steering Vectors for Preference-Aligned Generation

cs.CL · 2026-07-02 · unverdicted · novelty 6.0

Empirical evaluation on the PLUME benchmark shows steering vectors vary widely in trait expressibility, degrade on task transfer, and lose effectiveness when multiple vectors are composed.

PerspectiveGap: A Benchmark for Multi-Agent Orchestration Prompting

cs.CL · 2026-06-07 · unverdicted · novelty 6.0

The PerspectiveGap benchmark shows that LLMs achieve only a 14.9% average pass rate on multi-agent orchestration prompting tasks, with GPT-5.5 at 62%.

Instructions Shape Production of Language, not Processing

cs.CL · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.

EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents

cs.AI · 2026-05-11 · conditional · novelty 6.0 · 2 refs

EnactToM is an evolving benchmark of embodied multi-agent tasks that tests functional Theory of Mind by requiring agents to act optimally on implicit beliefs in partially observable 3D environments.

PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

PDDL-Mind improves LLM accuracy on theory-of-mind benchmarks by over 5% by translating stories into verifiable PDDL states that decouple environment tracking from belief inference.

CogniFold: Always-On Proactive Memory via Cognitive Folding

cs.AI · 2026-05-13 · unverdicted · novelty 5.0 · 2 refs

CogniFold extends Complementary Learning Systems theory to three layers with a prefrontal intent layer and uses graph self-organization to build proactive agent memory from continuous event streams.

DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories

cs.CL · 2026-04-22
