Profile-conditioned LLMs achieve higher tacit alignment with humans on subjective spectra when traits match, as quantified by the new Tacit Understanding Index (TUX) from 241 humans and 200 agents.
SPIN-Bench: How well do LLMs plan strategically and reason socially?
3 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: dataset 1
Citation polarities: background 1
Representative citing papers
Mindgames introduces a four-game evaluation platform for multi-agent LLM reasoning, runs a 944-agent competition, surfaces rule-adherence and error-survival limitations, and releases a 29k-game dataset with an offline scoring protocol.
A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.
citing papers explorer
- From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review
A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.