PersuasiveToM: A benchmark for evaluating machine theory of mind in persuasive dialogues
4 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (4)
verdicts: UNVERDICTED (4)
representative citing papers
PERSUASIONTRACE introduces a Bayesian-network simulated target for multi-turn persuasion that matches human belief dynamics (81 vs 80) better than LLM baselines (64) and enables process-level evaluation.
Introduces ToM-PD task and ToM-BPD dataset plus TTBYS dual-knowledge framework, with Qwen3-8B outperforming GPT-5 on desire, belief, and strategy prediction.
CoSToM maps ToM features inside LLMs with causal tracing and steers activations in critical layers to boost intrinsic social reasoning and dialogue quality.
citing papers explorer
- Theory of Mind and Persuasion Beyond Conversation: Assessing the Capacity of LLMs to Induce Belief States via Planning and Action
  Introduces NCP-ExploreToM framework to evaluate LLMs on inducing belief states via planning and action, with GPT-5 succeeding on ~80% of tasks and outperforming humans.
- A Model of Multi-turn Human Persuadability Using Probabilistic Belief Tracing
  PERSUASIONTRACE introduces a Bayesian-network simulated target for multi-turn persuasion that matches human belief dynamics (81 vs 80) better than LLM baselines (64) and enables process-level evaluation.
- CoSToM: Causal-oriented Steering for Intrinsic Theory-of-Mind Alignment in Large Language Models
  CoSToM maps ToM features inside LLMs with causal tracing and steers activations in critical layers to boost intrinsic social reasoning and dialogue quality.