Introduces NCP-ExploreToM framework to evaluate LLMs on inducing belief states via planning and action, with GPT-5 succeeding on ~80% of tasks and outperforming humans.
arXiv preprint arXiv:2504.10839 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5roles
background 1polarities
background 1representative citing papers
GroupToM-Bench is presented as the first multimodal benchmark for group-level Theory of Mind spanning micro BDI states to macro outcome prediction, with experiments showing current MLLMs lag human baselines on nonlinear social dynamics.
LLM outputs are meaningful according to standard theories of human language, without requiring anthropomorphic assumptions about the models.
MedMSA framework retrieves knowledge via language models then builds formal probabilistic models to produce uncertainty-weighted differential diagnoses from symptoms.
citing papers explorer
-
GroupToM-Bench: Benchmarking Group Theory of Mind and Nonlinear Social Emergence in MLLMs
GroupToM-Bench is presented as the first multimodal benchmark for group-level Theory of Mind spanning micro BDI states to macro outcome prediction, with experiments showing current MLLMs lag human baselines on nonlinear social dynamics.