arXiv preprint arXiv:2412.19726

Position: Theory of mind benchmarks are broken for large language models · 2025 · arXiv 2412.19726

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Bayesian Social Deduction with Graph-Informed Language Models

cs.AI · 2025-06-21 · unverdicted · novelty 7.0

Hybrid Bayesian-graph LLM agent reaches competitive performance against large models and achieves a 67% win rate against humans in controlled Avalon play, outperforming baselines and human teammates.

Embodied Multi-Agent Coordination by Aligning World Models Through Dialogue

cs.MA · 2026-05-13 · unverdicted · novelty 6.0

Dialogue between partially-observing LLM agents cuts action conflicts by 40-83 points but lowers task success versus silent coordination, with new metrics exposing limited genuine world-model alignment.

DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories

cs.CL · 2026-04-22

citing papers explorer

Bayesian Social Deduction with Graph-Informed Language Models · cs.AI · 2025-06-21 · unverdicted · none · ref 42
Embodied Multi-Agent Coordination by Aligning World Models Through Dialogue · cs.MA · 2026-05-13 · unverdicted · none · ref 41
DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories · cs.CL · 2026-04-22 · unreviewed · ref 31