Larger LLMs acquire basic situation modeling before mentalizing on false-belief tasks, with performance depending on size, training volume, and post-training, yet remaining sensitive to non-factive verbs and agent knowledge states.
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Training LLMs via rubric-based self-rewarding RL with GRPO improved feeling expression and robustness to sycophancy but degraded truthful QA performance.
citing papers explorer
- Developmental Trajectories of Situation Modeling and Mentalizing in Transformer Language Models
Larger LLMs acquire basic situation modeling before mentalizing on false-belief tasks, with performance depending on size, training volume, and post-training, yet remaining sensitive to non-factive verbs and agent knowledge states.