Evaluating generalization capabilities of LLM-based agents in mixed-motive scenarios using Concordia
2 Pith papers cite this work.
Fields: cs.AI (2)
Years: 2026 (2)
Representative citing papers
Mindgames introduces a four-game evaluation platform for multi-agent LLM reasoning, runs a 944-agent competition, surfaces rule-adherence and error-survival limitations, and releases a 29k-game dataset with an offline scoring protocol.
citing papers explorer
- Cooperate to Compete: Strategic Coordination in Multi-Agent Conquest
C2C is a new testbed where LM agents negotiate differently from humans and targeted prompting raises their win rate from 22.2% to 32.7% across 1,100+ games.
- MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs
Mindgames introduces a four-game evaluation platform for multi-agent LLM reasoning, runs a 944-agent competition, surfaces rule-adherence and error-survival limitations, and releases a 29k-game dataset with an offline scoring protocol.