Towards Dynamic Theory of Mind: Evaluating LLM Adaptation to Temporal Evolution of Human States. arXiv:2505.17663

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Theory of Mind and Persuasion Beyond Conversation: Assessing the Capacity of LLMs to Induce Belief States via Planning and Action

cs.CL · 2026-06-30 · unverdicted · novelty 7.0

Introduces the NCP-ExploreToM framework for evaluating how well LLMs induce belief states via planning and action; GPT-5 succeeds on ~80% of tasks and outperforms humans.

Emergent Coordination in Multi-Agent Language Models

cs.MA · 2025-10-05 · unverdicted · novelty 7.0

Multi-agent LLM systems can be steered via prompt design from mere aggregates to higher-order collectives with identity-linked differentiation and goal-directed complementarity, as measured by partial information decomposition of time-delayed mutual information.

AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts

cs.AI · 2026-01-16 · conditional · novelty 6.0

AgencyBench is a new benchmark with 138 tasks in 32 scenarios that measures autonomous agent performance on extended real-world problems using simulated feedback and sandboxed assessment.

citing papers explorer

Showing 3 of 3 citing papers.

Theory of Mind and Persuasion Beyond Conversation: Assessing the Capacity of LLMs to Induce Belief States via Planning and Action
cs.CL · 2026-06-30 · unverdicted · none · ref 25
Emergent Coordination in Multi-Agent Language Models
cs.MA · 2025-10-05 · unverdicted · none · ref 16
AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
cs.AI · 2026-01-16 · conditional · none · ref 7
