Multi-Agent Teams Hold Experts Back

Aneesh Pappu; Batu El; Carmelo di Nolfo; Hancheng Cao; James Zou; Meng Cao; Yanchao Sun

arxiv: 2602.01011 · v4 · pith:KTRRBE4Enew · submitted 2026-02-01 · 💻 cs.MA · cs.AI

Aneesh Pappu, Batu El, Hancheng Cao, Carmelo di Nolfo, Yanchao Sun, Meng Cao, James Zou

classification 💻 cs.MA cs.AI

keywords teams, expert, performance, coordination, expertise, multi-agent, rather, self-organizing

Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we study whether self-organizing LLM teams achieve strong synergy, where team performance matches or exceeds the best individual member. Across human-inspired and frontier ML benchmarks, we find that -- unlike human teams -- LLM teams consistently fail to match their expert agent's performance, even when explicitly told who the expert is, incurring performance losses of up to 41.1% on ML benchmarks. Decomposing this failure, we show that expert leveraging, rather than identification, is the primary bottleneck. Conversational analysis reveals a tendency toward integrative compromise -- averaging expert and non-expert views rather than appropriately weighting expertise -- which increases with team size and correlates negatively with performance. Interestingly, this consensus-seeking behavior improves robustness to adversarial agents, suggesting a trade-off between alignment and effective expertise utilization. Our findings reveal a significant gap in the ability of self-organizing multi-agent teams to harness the collective expertise of their members.
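The abstract's two central notions, strong synergy and integrative compromise, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the relative-gap formula, the example scores, and the weights are hypothetical and are not taken from the paper, whose exact metrics may differ.

```python
from statistics import mean


def has_strong_synergy(team_score: float, member_scores: list[float]) -> bool:
    """True if the team matches or exceeds its best individual member.

    Assumes higher scores are better.
    """
    return team_score >= max(member_scores)


def synergy_gap(team_score: float, member_scores: list[float]) -> float:
    """Relative shortfall of the team versus its best member.

    Assumed definition: (best - team) / best. Positive means the team
    underperforms its expert. The paper may define its loss differently.
    """
    best = max(member_scores)
    return (best - team_score) / best


def integrative_compromise(estimates: list[float]) -> float:
    """Unweighted average: every member's view counts equally."""
    return mean(estimates)


def expertise_weighted(estimates: list[float], weights: list[float]) -> float:
    """Weighted average that gives more influence to more expert members."""
    total = sum(weights)
    return sum(e * w for e, w in zip(estimates, weights)) / total


if __name__ == "__main__":
    # Hypothetical individual scores; the first member is the expert.
    members = [0.9, 0.6, 0.5]
    team = 0.7
    print(has_strong_synergy(team, members))
    print(round(synergy_gap(team, members), 3))

    # Hypothetical numeric estimates of a quantity whose true value is 100.
    estimates = [98, 60, 70]
    print(integrative_compromise(estimates))
    print(round(expertise_weighted(estimates, [0.8, 0.1, 0.1]), 1))
```

With these made-up numbers, plain averaging pulls the team's answer well away from the expert's estimate, while weighting by expertise keeps it close. This is one simple way to picture the distinction the abstract draws between compromise and appropriately weighting expertise.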

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Declarative Data Services: Structured Agentic Discovery for Composing Data Systems
cs.AI 2026-05 unverdicted novelty 7.0

DDS decomposes agentic data-system composition into bounded sub-searches via intent, operator DAG, per-system skills, and runtime attribution contracts, turning runtime failures into cited skill patches.

Improving the Efficiency of Language Agent Teams with Adaptive Task Graphs
cs.MA 2026-05 unverdicted novelty 7.0

LATTE coordinates LLM agent teams with an evolving shared task graph, cutting token use, time, and failures while matching or beating accuracy of MetaGPT, leader-worker, and static methods.

Emergent Social Intelligence Risks in Generative Multi-Agent Systems
cs.MA 2026-03 unverdicted novelty 5.0

Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.