arXiv preprint arXiv:2402.01704
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: unverdicted (3)
citing papers explorer
- Communicate-Predict-Act: Evaluating Social Intelligence of Agents
  A new evaluation framework for LLM social intelligence finds that influence, transparency, and adaptability predict agent success in games better than theory of mind or deep planning, with metrics achieving AUC 0.82 in predicting pairwise outcomes.
- Common-agency Games for Multi-Objective Test-Time Alignment
  CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.
- Distilling Game Code World Model Generation into Lightweight Large Language Models
  SFT followed by RLVR on Qwen2.5-3B-Instruct raises syntactic and execution correctness when generating Game Code World Models across 30 games.