hub Canonical reference

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla · 2024 · cs.CL · arXiv 2402.01680

Canonical reference. 96% of citing Pith papers cite this work as background.

80 Pith papers citing it

Background 96% of classified citations

open full Pith review browse 80 citing papers arXiv PDF

abstract

Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to the impressive planning and reasoning abilities of LLMs, they have been used as autonomous agents to do many tasks automatically. Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in complex problem-solving and world simulation. To provide the community with an overview of this dynamic field, we present this survey to offer an in-depth discussion on the essential aspects of multi-agent systems based on LLMs, as well as the challenges. Our goal is for readers to gain substantial insights on the following questions: What domains and environments do LLM-based multi-agents simulate? How are these agents profiled and how do they communicate? What mechanisms contribute to the growth of agents' capacities? For those interested in delving into this field of study, we also summarize the commonly used datasets or benchmarks for them to have convenient access. To keep researchers updated on the latest studies, we maintain an open-source GitHub repository, dedicated to outlining the research on LLM-based multi-agent systems.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 23 dataset 1

citation-polarity summary

background 23 use dataset 1

claims ledger

abstract Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to the impressive planning and reasoning abilities of LLMs, they have been used as autonomous agents to do many tasks automatically. Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in complex problem-solving and world simulation. To provide the community with an overview of this dynamic field, we present this survey to offer an in-depth discussion on the essential aspects of multi-age

co-cited works

representative citing papers

Why Do Multi-Agent LLM Systems Fail?

cs.AI · 2025-03-17 · unverdicted · novelty 8.0

The authors create the first large-scale dataset and taxonomy of failure modes in multi-agent LLM systems to explain their limited performance gains.

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

cs.MA · 2024-10-09 · unverdicted · novelty 8.0

Prompt injection attacks can self-replicate across LLM agents in multi-agent systems, enabling data theft, misinformation, and system disruption while propagating silently.

COHORT: Collaborative Orchestration for Hardening via Offensive Replay on Emulated Topologies

cs.NI · 2026-06-29 · unverdicted · novelty 7.0

COHORT automates mitigation generation for network attacks via collaborative LLMs on emulated topologies with offensive replay evaluation, reporting 46.7% success rate that is 4.4 times higher than a single-agent baseline.

CV-Arena: An Open Benchmark for Instructional Computer Vision Problem Solving with Human-AI Collaborative Preferences

cs.CV · 2026-05-30 · unverdicted · novelty 7.0

CV-Arena is a new 12K-pair benchmark for instruction-guided real-image editing with 16 task types, CogRetriever curation, and Active Elo mixed human-AI evaluation that finds gaps in 21 models and presents CV-Agent.

\textsc{MasFACT}: Continual Multi-Agent Topology Learning via Geometry-Aware Posterior Transfer

cs.LG · 2026-05-17 · unverdicted · novelty 7.0

MasFACT transfers historical topology priors across tasks via Fused Gromov-Wasserstein optimal transport and PAC-Bayes conservative adaptation to reduce topology forgetting in continual multi-agent settings.

Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

TFlow enables multi-agent LLMs to collaborate via transient low-rank LoRA perturbations derived from sender activations, yielding up to 8.5 accuracy gains and 83% token reduction versus text-based baselines on Qwen3-4B models.

Predictive Maps of Multi-Agent Reasoning: A Successor-Representation Spectrum for LLM Communication Topologies

cs.MA · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Successor-representation spectra of row-stochastic communication operators predict perturbation robustness, consensus speed, and error accumulation in multi-agent LLM topologies, with condition number showing perfect empirical rank correlation.

Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

Evolving-RL jointly optimizes experience extraction and utilization in LLM agents via RL with separate evaluation signals, delivering up to 98.7% relative gains on out-of-distribution tasks in ALFWorld and Mind2Web.

Collective Alignment in LLM Multi-Agent Systems: Disentangling Bias from Cooperation via Statistical Physics

cond-mat.stat-mech · 2026-05-11 · unverdicted · novelty 7.0

LLM multi-agent systems on lattices show bias-driven order-disorder crossovers instead of true phase transitions, with extracted effective couplings and fields serving as model-specific fingerprints.

TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

TacoMAS performs test-time co-evolution of agent capabilities and communication topology in LLM multi-agent systems via fast capability updates and slow meta-LLM topology edits, delivering 13.3% average gains over strong baselines on four benchmarks.

TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data

cs.AI · 2026-04-30 · unverdicted · novelty 7.0

TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design mattering more than model scale.

Exploring Agentic Visual Analytics: A Co-Evolutionary Framework of Roles and Workflows

cs.DB · 2026-04-17 · unverdicted · novelty 7.0

A survey of 55 agentic VA systems proposes a co-evolutionary framework defining four agent roles (PLANNER, CREATOR, REVIEWER, CONTEXT MANAGER) mapped to visual analytics pipeline stages along with design guidelines.

WaterAdmin: Orchestrating Community Water Distribution Optimization via AI Agents

cs.LG · 2026-04-11 · unverdicted · novelty 7.0

WaterAdmin uses a bi-level design with LLM agents for dynamic context abstraction and optimization for real-time pump/valve control, achieving better pressure reliability and lower energy use than traditional methods in EPANET simulations of variable community water demands.

SoK: Blockchain Agent-to-Agent Payments

q-fin.GN · 2026-04-04 · unverdicted · novelty 7.0

The first systematization of blockchain-based agent-to-agent payments organizes designs into discovery, authorization, execution, and accounting stages while identifying trust and security gaps.

Detecting Multi-Agent Collusion Through Multi-Agent Interpretability

cs.AI · 2026-04-01 · conditional · novelty 7.0

NARCBench and five activation-probing methods detect multi-agent collusion with 0.73-1.00 AUROC across distribution shifts and steganographic tasks by aggregating per-agent signals.

GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration

cs.AI · 2026-03-08 · unverdicted · novelty 7.0

GraphBit is a DAG-based engine-orchestrated framework for agentic LLMs that achieves 67.6% accuracy with zero hallucinations on GAIA benchmarks.

Agentic Hives: Equilibrium, Indeterminacy, and Endogenous Cycles in Self-Organizing Multi-Agent Systems

cs.MA · 2026-02-23 · unverdicted · novelty 7.0

Agentic Hives apply dynamic general equilibrium theory to variable populations of language-model agents, proving existence of equilibria, Pareto optimality, multiplicity, comparative-statics analogs, Hopf bifurcations, and stability conditions.

GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents

q-bio.QM · 2025-10-14 · unverdicted · novelty 7.0

GenCellAgent deploys a planner-executor-evaluator LLM agent loop to automatically select, adapt, and refine segmentation tools for diverse cellular microscopy images, matching or exceeding specialist performance on 4,718 images across seven benchmarks while handling out-of-distribution and novel-ves

An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications

cs.SE · 2025-09-23 · conditional · novelty 7.0

Empirical study of open-source AI agents shows testing effort concentrates on deterministic tools and workflows (over 70%) while the FM-based plan body gets under 5% and prompts appear in only 1% of tests.

From Standalone LLMs to Integrated Intelligence: A Survey of Compound Al Systems

cs.MA · 2025-06-05 · accept · novelty 7.0

A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.

Automated Design of Agentic Systems

cs.AI · 2024-08-15 · conditional · novelty 7.0

Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.

MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data

cs.LG · 2024-06-15 · unverdicted · novelty 7.0

MALLM-GAN uses multi-agent LLMs to emulate GAN architecture for generating higher-quality synthetic tabular data from small samples than prior models, while preserving privacy.

Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

cs.AI · 2026-05-26 · unverdicted · novelty 6.0

Multi-agent social simulations show LLM privacy violations rising from 19.95% to 45.30%, with leakage spreading contagiously (8x after peer disclosure) and explicit instructions leaving rates above 37.8%.

Autonomic Federated-Market Orchestration for the Edge-Cloud Continuum

cs.DC · 2026-05-26 · unverdicted · novelty 6.0

Neural Pub/Sub uses a MAPE-K loop with Walrasian price signals on service DAGs to achieve autonomic federated orchestration that matches centralized welfare under gross-substitutes assumptions and outperforms baselines in small-scale experiments.

citing papers explorer

Showing 6 of 6 citing papers after filters.

Detecting Multi-Agent Collusion Through Multi-Agent Interpretability cs.AI · 2026-04-01 · conditional · none · ref 8 · internal anchor
NARCBench and five activation-probing methods detect multi-agent collusion with 0.73-1.00 AUROC across distribution shifts and steganographic tasks by aggregating per-agent signals.
An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications cs.SE · 2025-09-23 · conditional · none · ref 16 · internal anchor
Empirical study of open-source AI agents shows testing effort concentrates on deterministic tools and workflows (over 70%) while the FM-based plan body gets under 5% and prompts appear in only 1% of tests.
Automated Design of Agentic Systems cs.AI · 2024-08-15 · conditional · none · ref 25 · internal anchor
Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.
Deterministic vs. LLM-Controlled Orchestration for COBOL-to-Python Modernization cs.SE · 2026-05-11 · conditional · none · ref 3 · internal anchor
Deterministic orchestration matches LLM-controlled methods in COBOL-to-Python translation accuracy but improves worst-case robustness, reduces run-to-run variability, and cuts token consumption by up to 3.5 times.
Why Does Agentic Safety Fail to Generalize Across Tasks? cs.LG · 2026-05-07 · conditional · none · ref 45 · internal anchor
Agentic safety fails to generalize across tasks because the task-to-safe-controller mapping has a higher Lipschitz constant than the task-to-controller mapping alone, as proven in linear-quadratic control and demonstrated in quadcopter and LLM experiments.
EXPEREPAIR: Dual-Memory Enhanced LLM-based Repository-Level Program Repair cs.SE · 2025-06-12 · conditional · none · ref 14 · internal anchor
ExpeRepair improves LLM-based repository-level program repair by maintaining episodic memory of concrete fixes and semantic memory of abstract insights, reaching 60.3% and 74.6% pass@1 on SWE-Bench Lite and Verified.

Large Language Model based Multi-Agents: A Survey of Progress and Challenges

hub tools

citation-role summary

citation-polarity summary

claims ledger

co-cited works

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer