Introduces NCP-ExploreToM framework to evaluate LLMs on inducing belief states via planning and action, with GPT-5 succeeding on ~80% of tasks and outperforming humans.
PersuasiveToM: A benchmark for evaluating machine theory of mind in persuasive dialogues
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Representative citing papers
PERSUASIONTRACE introduces a Bayesian-network simulated target for multi-turn persuasion that matches human belief dynamics (81 vs 80) better than LLM baselines (64) and enables process-level evaluation.
Introduces ToM-PD task and ToM-BPD dataset plus TTBYS dual-knowledge framework, with Qwen3-8B outperforming GPT-5 on desire, belief, and strategy prediction.
CoSToM maps ToM features inside LLMs with causal tracing and steers activations in critical layers to boost intrinsic social reasoning and dialogue quality.
citing papers explorer
- Think Thrice Before You Speak: Dual knowledge-enhanced Theory-of-Mind Reasoning for Persuasive Agents
Introduces ToM-PD task and ToM-BPD dataset plus TTBYS dual-knowledge framework, with Qwen3-8B outperforming GPT-5 on desire, belief, and strategy prediction.