Evaluating Theory of Mind in Question Answering

Aida Nematzadeh; Alison Gopnik; Erin Grant; Kaylee Burns; Thomas L. Griffiths

arxiv: 1808.09352 · v1 · pith:G4UJVISVnew · submitted 2018-08-28 · 💻 cs.CL

keywords: beliefs, models, tasks, answering, evaluating, question, reason, when

We propose a new dataset for evaluating question answering models with respect to their capacity to reason about beliefs. Our tasks are inspired by theory-of-mind experiments that examine whether children are able to reason about the beliefs of others, in particular when those beliefs differ from reality. We evaluate a number of recent neural models with memory augmentation. We find that all fail on our tasks, which require keeping track of inconsistent states of the world; moreover, the models' accuracy decreases notably when random sentences are introduced to the tasks at test time.
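
To make "keeping track of inconsistent states of the world" concrete, the following is a minimal sketch of a false-belief story in the style of the classic Sally-Anne experiment. It is not the paper's dataset format or any of the evaluated models: the `World` class, its methods, and the story itself are hypothetical names chosen for illustration. The assumption is that an agent's belief about an object changes only when that agent is present to witness a move.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """Toy tracker for reality and per-agent beliefs (illustrative only)."""
    location: dict = field(default_factory=dict)  # object -> container (reality)
    present: set = field(default_factory=set)     # agents currently in the room
    beliefs: dict = field(default_factory=dict)   # agent -> {object -> container}

    def enter(self, agent):
        # Entering does not reveal anything: containers are assumed opaque.
        self.present.add(agent)
        self.beliefs.setdefault(agent, {})

    def leave(self, agent):
        self.present.discard(agent)

    def move(self, agent, obj, container):
        # Reality always changes; the acting agent is assumed to be present.
        self.location[obj] = container
        # Only agents in the room witness the move and update their belief.
        for witness in self.present:
            self.beliefs.setdefault(witness, {})[obj] = container


# A hypothetical false-belief story.
world = World()
world.enter("Sally")
world.enter("Anne")
world.move("Sally", "marble", "basket")  # both agents see this
world.leave("Sally")
world.move("Anne", "marble", "box")      # Sally does not see this
world.enter("Sally")

reality = world.location["marble"]              # where the marble really is
sally_belief = world.beliefs["Sally"]["marble"]  # where Sally will look for it
anne_belief = world.beliefs["Anne"]["marble"]
```

In this sketch the question "Where is the marble?" and the question "Where will Sally look for the marble?" have different answers, which is the kind of divergence between belief and reality the abstract describes.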

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Reinforcing Human Behavior Simulation via Verbal Feedback
cs.LG 2026-05 unverdicted novelty 6.0

DITTO uses RL with verbal feedback to train LLMs for human behavior simulation, reporting 36% average gains over base models and outperforming GPT-5.4 on 6 of 10 SOUL benchmark tasks.