
arxiv: 2602.01011 · v4 · pith:KTRRBE4Enew · submitted 2026-02-01 · 💻 cs.MA · cs.AI

Multi-Agent Teams Hold Experts Back

classification 💻 cs.MA cs.AI
keywords teams, expert, performance, coordination, expertise, multi-agent, rather, self-organizing

Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we study whether self-organizing LLM teams achieve strong synergy, where team performance matches or exceeds the best individual member. Across human-inspired and frontier ML benchmarks, we find that -- unlike human teams -- LLM teams consistently fail to match their expert agent's performance, even when explicitly told who the expert is, incurring performance losses of up to 41.1% on ML benchmarks. Decomposing this failure, we show that expert leveraging, rather than identification, is the primary bottleneck. Conversational analysis reveals a tendency toward integrative compromise -- averaging expert and non-expert views rather than appropriately weighting expertise -- which increases with team size and correlates negatively with performance. Interestingly, this consensus-seeking behavior improves robustness to adversarial agents, suggesting a trade-off between alignment and effective expertise utilization. Our findings reveal a significant gap in the ability of self-organizing multi-agent teams to harness the collective expertise of their members.
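The strong-synergy criterion in the abstract (team performance matches or exceeds the best individual member) can be illustrated with a minimal sketch. The function names, the example scores, and the definition of relative loss used here are assumptions for illustration only; the abstract does not state how the reported 41.1% figure is computed. The sketch also assumes higher scores are better and the expert's score is positive.

```python
def strong_synergy(team_score: float, member_scores: list[float]) -> bool:
    """Return True when the team matches or exceeds its best individual member."""
    return team_score >= max(member_scores)


def relative_loss_vs_expert(team_score: float, member_scores: list[float]) -> float:
    """Shortfall of the team relative to its best member, as a fraction.

    Assumed definition (not taken from the paper): (expert - team) / expert.
    A negative value means the team outperformed its expert.
    """
    expert = max(member_scores)
    return (expert - team_score) / expert


# Hypothetical scores, not results from the paper.
members = [0.50, 0.62, 0.80]
team = 0.60

print(strong_synergy(team, members))
print(round(relative_loss_vs_expert(team, members), 3))
```

Under these made-up numbers the team falls short of its best member, so the strong-synergy check fails and the relative loss is positive.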



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Declarative Data Services: Structured Agentic Discovery for Composing Data Systems

    cs.AI 2026-05 unverdicted novelty 7.0

    DDS decomposes agentic data-system composition into bounded sub-searches via intent, operator DAG, per-system skills, and runtime attribution contracts, turning runtime failures into cited skill patches.

  2. Improving the Efficiency of Language Agent Teams with Adaptive Task Graphs

    cs.MA 2026-05 unverdicted novelty 7.0

    LATTE coordinates LLM agent teams with an evolving shared task graph, cutting token use, time, and failures while matching or beating accuracy of MetaGPT, leader-worker, and static methods.

  3. Emergent Social Intelligence Risks in Generative Multi-Agent Systems

    cs.MA 2026-03 unverdicted novelty 5.0

    Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.