The SOTOPIA-TOM benchmark reveals that even GPT-5 scores only 62% on information management in multi-agent interactions, while Theory-of-Mind prompting reduces privacy violations and raises overall scores.
* Could my phrasing indirectly reveal restricted information? If yes, rephrase.
1 Pith paper cites this work (field: cs.MA; year: 2026; verdict: unverdicted). Polarity classification is still being indexed.
SOTOPIA-TOM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind