
arxiv: 2602.01011 · v4 · pith:KTRRBE4Enew · submitted 2026-02-01 · 💻 cs.MA · cs.AI

Multi-Agent Teams Hold Experts Back

keywords: teams, expert, performance, coordination, expertise, multi-agent, rather, self-organizing

Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we study whether self-organizing LLM teams achieve strong synergy, where team performance matches or exceeds the best individual member. Across human-inspired and frontier ML benchmarks, we find that -- unlike human teams -- LLM teams consistently fail to match their expert agent's performance, even when explicitly told who the expert is, incurring performance losses of up to 41.1% on ML benchmarks. Decomposing this failure, we show that expert leveraging, rather than identification, is the primary bottleneck. Conversational analysis reveals a tendency toward integrative compromise -- averaging expert and non-expert views rather than appropriately weighting expertise -- which increases with team size and correlates negatively with performance. Interestingly, this consensus-seeking behavior improves robustness to adversarial agents, suggesting a trade-off between alignment and effective expertise utilization. Our findings reveal a significant gap in the ability of self-organizing multi-agent teams to harness the collective expertise of their members.
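The strong-synergy criterion described in the abstract (team performance matches or exceeds the best individual member) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the relative-loss formula, the toy model in which a compromising team scores the unweighted average of its members, and the numbers, none of which come from the paper.

```python
from statistics import mean


def strong_synergy(team_score, member_scores):
    """Return True when the team matches or exceeds its best individual member."""
    return team_score >= max(member_scores)


def relative_loss(team_score, member_scores):
    """Fractional shortfall of the team relative to its best member.

    Assumed definition for illustration; the paper's exact metric may differ.
    """
    best = max(member_scores)
    return (best - team_score) / best


# Hypothetical accuracies, not taken from the paper.
members = [0.80, 0.50, 0.40]  # the first entry plays the role of the expert

# Toy model of integrative compromise: the team lands on the unweighted average.
compromise = mean(members)

# Toy model of full expert leveraging: the team defers to the expert.
deferral = max(members)

print(strong_synergy(compromise, members), round(relative_loss(compromise, members), 3))
print(strong_synergy(deferral, members), relative_loss(deferral, members))
```

Under these toy assumptions, averaging pulls the team below its expert, while deferring to the expert meets the strong-synergy bar exactly.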


Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments

    cs.CL 2026-06 unverdicted novelty 7.0

    CollabSim is a new CSCW-grounded simulation framework that enables controlled multi-agent experiments to measure collaborative competence in LLM agents.

  2. Declarative Data Services: Structured Agentic Discovery for Composing Data Systems

    cs.AI 2026-05 unverdicted novelty 7.0

    DDS decomposes agentic data-system composition into bounded sub-searches via intent, operator DAG, per-system skills, and runtime attribution contracts, turning runtime failures into cited skill patches.

  3. Improving the Efficiency of Language Agent Teams with Adaptive Task Graphs

    cs.MA 2026-05 unverdicted novelty 7.0

    LATTE coordinates LLM agent teams with an evolving shared task graph, cutting token use, time, and failures while matching or beating accuracy of MetaGPT, leader-worker, and static methods.

  4. Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

    cs.CL 2026-06 unverdicted novelty 6.0

    Multicultural multi-agent LLM systems exhibit substantially lower value diversity than human societies on the World Values Survey, with diversity uncorrelated to per-agent alignment and further reduced by agent interactions.

  5. Evolve as a Team: Collaborative Self-Evolution for LLM-based Multi-Agent Systems

    cs.MA 2026-05 unverdicted novelty 6.0

    Meta-Team is a collaborative self-evolution framework that turns multi-agent execution experience into reusable improvements at agent, coordination, and team levels, outperforming baselines on six benchmarks.

  6. AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation

    cs.AI 2026-05 unverdicted novelty 6.0

    Decentralized AI agent teams self-organize around hypotheses, critique proposals, and share knowledge to outperform single-agent baselines on biomedical ML, language-model optimization, and protein fitness tasks.

  7. Emergent Social Intelligence Risks in Generative Multi-Agent Systems

    cs.MA 2026-03 unverdicted novelty 5.0

    Generative multi-agent systems exhibit emergent collusion and conformity behaviors that cannot be prevented by existing agent-level safeguards.

  8. Projecting the Emerging Mindset of SWE Agent by Launching a Wild Code Understanding Journey

    cs.SE 2026-06 unverdicted novelty 4.0

    Ada is a scoped apparatus that records SWE-agent trajectories in real repositories and applies observation lenses to project navigation, evidence selection, synthesis, grounding, and stopping behaviors across 408 runs.