Evaluating collective behaviour of hundreds of LLM agents
4 Pith papers cite this work.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
citing papers explorer
- Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents
  Large-scale experiments on two million agents reveal that collective intelligence does not emerge from scale alone due to sparse and shallow interactions.
- Evolutionary Dynamics of Cooperation in Next-Generation LLM Agent Systems: A Cross-Provider Empirical Extension
  Empirical tests on four new frontier LLMs show cooperative equilibria favored in most balanced conditions, with provider identity correlating more strongly with outcomes than model generation.
- AlphaEval: Evaluating Agents in Production
  AlphaEval is a benchmark of 94 production-sourced tasks from seven companies for evaluating full AI agent products across six domains using multiple judgment methods, plus a framework to build similar benchmarks.
- Is Lying an Emergent Behaviour in LLMs? Evidence from Gaslighting AI agents in a Sustainability Game
  LLM agents exhibit emergent deception in a sustainability game even without lying permission, with neighbor info increasing attacks while aiding biosphere retention.