Position: Theory of Mind Benchmarks are Broken for Large Language Models · 2025 · arXiv 2412.19726

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: background (1)

citation-polarity summary: background (1)

representative citing papers

Theory of Mind and Persuasion Beyond Conversation: Assessing the Capacity of LLMs to Induce Belief States via Planning and Action

cs.CL · 2026-06-30 · unverdicted · novelty 7.0

Introduces NCP-ExploreToM framework to evaluate LLMs on inducing belief states via planning and action, with GPT-5 succeeding on ~80% of tasks and outperforming humans.

Bayesian Social Deduction with Graph-Informed Language Models

cs.AI · 2025-06-21 · unverdicted · novelty 7.0

A hybrid Bayesian-graph LLM agent achieves performance competitive with large models and a 67% win rate against humans in controlled Avalon play, outperforming baselines and human teammates.

Embodied Multi-Agent Coordination by Aligning World Models Through Dialogue

cs.MA · 2026-05-13 · unverdicted · novelty 6.0

Dialogue between partially observing LLM agents cuts action conflicts by 40-83 points but lowers task success relative to silent coordination; new metrics expose limited genuine world-model alignment.

DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories

cs.CL · 2026-04-22

citing papers explorer

Bayesian Social Deduction with Graph-Informed Language Models cs.AI · 2025-06-21 · unverdicted · none · ref 42