Large Language Model based Multi-Agents: A Survey of Progress and Challenges

Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla · 2024 · cs.CL · arXiv 2402.01680

Canonical reference. 96% of citing Pith papers cite this work as background.

72 Pith papers citing it

Background 96% of classified citations

abstract

Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Owing to their impressive planning and reasoning abilities, LLMs have been used as autonomous agents to carry out many tasks automatically. Building on the use of a single LLM as a planning or decision-making agent, LLM-based multi-agent systems have recently made considerable progress in complex problem-solving and world simulation. To give the community an overview of this dynamic field, we present this survey, which offers an in-depth discussion of the essential aspects of LLM-based multi-agent systems as well as their challenges. Our goal is for readers to gain substantial insight into the following questions: What domains and environments do LLM-based multi-agents simulate? How are these agents profiled, and how do they communicate? What mechanisms contribute to the growth of agents' capacities? For those interested in delving into this field, we also summarize the commonly used datasets and benchmarks for convenient access. To keep researchers updated on the latest studies, we maintain an open-source GitHub repository dedicated to outlining research on LLM-based multi-agent systems.

citation-role summary

background 22 dataset 1

citation-polarity summary

background 22 use dataset 1

representative citing papers

Why Do Multi-Agent LLM Systems Fail?

cs.AI · 2025-03-17 · unverdicted · novelty 8.0

The authors create the first large-scale dataset and taxonomy of failure modes in multi-agent LLM systems to explain their limited performance gains.

Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems

cs.MA · 2024-10-09 · unverdicted · novelty 8.0

Prompt injection attacks can self-replicate across LLM agents in multi-agent systems, enabling data theft, misinformation, and system disruption while propagating silently.

MasFACT: Continual Multi-Agent Topology Learning via Geometry-Aware Posterior Transfer

cs.LG · 2026-05-17 · unverdicted · novelty 7.0

MasFACT transfers historical topology priors across tasks via Fused Gromov-Wasserstein optimal transport and PAC-Bayes conservative adaptation to reduce topology forgetting in continual multi-agent settings.

Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

TFlow enables multi-agent LLMs to collaborate via transient low-rank LoRA perturbations derived from sender activations, yielding up to 8.5 accuracy gains and 83% token reduction versus text-based baselines on Qwen3-4B models.

Predictive Maps of Multi-Agent Reasoning: A Successor-Representation Spectrum for LLM Communication Topologies

cs.MA · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Successor-representation spectra of row-stochastic communication operators predict perturbation robustness, consensus speed, and error accumulation in multi-agent LLM topologies, with condition number showing perfect empirical rank correlation.

Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

Evolving-RL jointly optimizes experience extraction and utilization in LLM agents via RL with separate evaluation signals, delivering up to 98.7% relative gains on out-of-distribution tasks in ALFWorld and Mind2Web.

Collective Alignment in LLM Multi-Agent Systems: Disentangling Bias from Cooperation via Statistical Physics

cond-mat.stat-mech · 2026-05-11 · unverdicted · novelty 7.0

LLM multi-agent systems on lattices show bias-driven order-disorder crossovers instead of true phase transitions, with extracted effective couplings and fields serving as model-specific fingerprints.

TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

TacoMAS performs test-time co-evolution of agent capabilities and communication topology in LLM multi-agent systems via fast capability updates and slow meta-LLM topology edits, delivering 13.3% average gains over strong baselines on four benchmarks.

TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data

cs.AI · 2026-04-30 · unverdicted · novelty 7.0

TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design mattering more than model scale.

Exploring Agentic Visual Analytics: A Co-Evolutionary Framework of Roles and Workflows

cs.DB · 2026-04-17 · unverdicted · novelty 7.0

A survey of 55 agentic VA systems proposes a co-evolutionary framework defining four agent roles (PLANNER, CREATOR, REVIEWER, CONTEXT MANAGER) mapped to visual analytics pipeline stages along with design guidelines.

WaterAdmin: Orchestrating Community Water Distribution Optimization via AI Agents

cs.LG · 2026-04-11 · unverdicted · novelty 7.0

WaterAdmin uses a bi-level design with LLM agents for dynamic context abstraction and optimization for real-time pump/valve control, achieving better pressure reliability and lower energy use than traditional methods in EPANET simulations of variable community water demands.

SoK: Blockchain Agent-to-Agent Payments

q-fin.GN · 2026-04-04 · unverdicted · novelty 7.0

The first systematization of blockchain-based agent-to-agent payments organizes designs into discovery, authorization, execution, and accounting stages while identifying trust and security gaps.

Detecting Multi-Agent Collusion Through Multi-Agent Interpretability

cs.AI · 2026-04-01 · conditional · novelty 7.0

NARCBench and five activation-probing methods detect multi-agent collusion with 0.73-1.00 AUROC across distribution shifts and steganographic tasks by aggregating per-agent signals.

GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration

cs.AI · 2026-03-08 · unverdicted · novelty 7.0

GraphBit is a DAG-based engine-orchestrated framework for agentic LLMs that achieves 67.6% accuracy with zero hallucinations on the GAIA benchmark.

Agentic Hives: Equilibrium, Indeterminacy, and Endogenous Cycles in Self-Organizing Multi-Agent Systems

cs.MA · 2026-02-23 · unverdicted · novelty 7.0

Agentic Hives apply dynamic general equilibrium theory to variable populations of language-model agents, proving existence of equilibria, Pareto optimality, multiplicity, comparative-statics analogs, Hopf bifurcations, and stability conditions.

GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents

q-bio.QM · 2025-10-14 · unverdicted · novelty 7.0

GenCellAgent deploys a planner-executor-evaluator LLM agent loop to automatically select, adapt, and refine segmentation tools for diverse cellular microscopy images, matching or exceeding specialist performance on 4,718 images across seven benchmarks while handling out-of-distribution and novel-ves

An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications

cs.SE · 2025-09-23 · conditional · novelty 7.0

Empirical study of open-source AI agents shows testing effort concentrates on deterministic tools and workflows (over 70%) while the FM-based plan body gets under 5% and prompts appear in only 1% of tests.

From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems

cs.MA · 2025-06-05 · accept · novelty 7.0

A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.

Automated Design of Agentic Systems

cs.AI · 2024-08-15 · conditional · novelty 7.0

Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.

MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data

cs.LG · 2024-06-15 · unverdicted · novelty 7.0

MALLM-GAN uses multi-agent LLMs to emulate GAN architecture for generating higher-quality synthetic tabular data from small samples than prior models, while preserving privacy.

Self-Refining Topology Optimization via an LLM-Based Multi-Agent Framework

cs.MA · 2026-05-22 · unverdicted · novelty 6.0

TopOptAgents deploys six LLM agents in self-refining loops to automate the full topology optimization workflow and succeeds on problem classes where single LLMs fail.

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

LCGuard applies adversarial training to transform KV cache artifacts in multi-agent LLMs, reducing reconstructable sensitive information while preserving task performance.

LACO: Adaptive Latent Communication for Collaborative Driving

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

LACO introduces Iterative Latent Deliberation, Cross-Horizon Saliency Attribution, and Structured Semantic Knowledge Distillation to enable low-latency latent communication in collaborative driving while preserving performance in CARLA simulations.

BLAgent: Agentic RAG for File-Level Bug Localization

cs.SE · 2026-05-18 · unverdicted · novelty 6.0

BLAgent achieves over 78% Top-1 accuracy on SWE-bench Lite for file-level bug localization using agentic RAG, at 18x lower cost than baselines, and boosts end-to-end APR success by over 20%.

citing papers explorer

Showing 50 of 72 citing papers.

Why Do Multi-Agent LLM Systems Fail? cs.AI · 2025-03-17 · unverdicted · none · ref 18
The authors create the first large-scale dataset and taxonomy of failure modes in multi-agent LLM systems to explain their limited performance gains.
Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems cs.MA · 2024-10-09 · unverdicted · none · ref 59
Prompt injection attacks can self-replicate across LLM agents in multi-agent systems, enabling data theft, misinformation, and system disruption while propagating silently.
MasFACT: Continual Multi-Agent Topology Learning via Geometry-Aware Posterior Transfer cs.LG · 2026-05-17 · unverdicted · none · ref 10
MasFACT transfers historical topology priors across tasks via Fused Gromov-Wasserstein optimal transport and PAC-Bayes conservative adaptation to reduce topology forgetting in continual multi-agent settings.
Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights cs.CL · 2026-05-13 · unverdicted · none · ref 6
TFlow enables multi-agent LLMs to collaborate via transient low-rank LoRA perturbations derived from sender activations, yielding up to 8.5 accuracy gains and 83% token reduction versus text-based baselines on Qwen3-4B models.
Predictive Maps of Multi-Agent Reasoning: A Successor-Representation Spectrum for LLM Communication Topologies cs.MA · 2026-05-12 · unverdicted · none · ref 8 · 2 links
Successor-representation spectra of row-stochastic communication operators predict perturbation robustness, consensus speed, and error accumulation in multi-agent LLM topologies, with condition number showing perfect empirical rank correlation.
Evolving-RL: End-to-End Optimization of Experience-Driven Self-Evolving Capability within Agents cs.AI · 2026-05-11 · unverdicted · none · ref 8
Evolving-RL jointly optimizes experience extraction and utilization in LLM agents via RL with separate evaluation signals, delivering up to 98.7% relative gains on out-of-distribution tasks in ALFWorld and Mind2Web.
Collective Alignment in LLM Multi-Agent Systems: Disentangling Bias from Cooperation via Statistical Physics cond-mat.stat-mech · 2026-05-11 · unverdicted · none · ref 1
LLM multi-agent systems on lattices show bias-driven order-disorder crossovers instead of true phase transitions, with extracted effective couplings and fields serving as model-specific fingerprints.
TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems cs.CL · 2026-05-10 · unverdicted · none · ref 37
TacoMAS performs test-time co-evolution of agent capabilities and communication topology in LLM multi-agent systems via fast capability updates and slow meta-LLM topology edits, delivering 13.3% average gains over strong baselines on four benchmarks.
TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data cs.AI · 2026-04-30 · unverdicted · none · ref 16
TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design mattering more than model scale.
Exploring Agentic Visual Analytics: A Co-Evolutionary Framework of Roles and Workflows cs.DB · 2026-04-17 · unverdicted · none · ref 14
A survey of 55 agentic VA systems proposes a co-evolutionary framework defining four agent roles (PLANNER, CREATOR, REVIEWER, CONTEXT MANAGER) mapped to visual analytics pipeline stages along with design guidelines.
WaterAdmin: Orchestrating Community Water Distribution Optimization via AI Agents cs.LG · 2026-04-11 · unverdicted · none · ref 14
WaterAdmin uses a bi-level design with LLM agents for dynamic context abstraction and optimization for real-time pump/valve control, achieving better pressure reliability and lower energy use than traditional methods in EPANET simulations of variable community water demands.
SoK: Blockchain Agent-to-Agent Payments q-fin.GN · 2026-04-04 · unverdicted · none · ref 8
The first systematization of blockchain-based agent-to-agent payments organizes designs into discovery, authorization, execution, and accounting stages while identifying trust and security gaps.
Detecting Multi-Agent Collusion Through Multi-Agent Interpretability cs.AI · 2026-04-01 · conditional · none · ref 8
NARCBench and five activation-probing methods detect multi-agent collusion with 0.73-1.00 AUROC across distribution shifts and steganographic tasks by aggregating per-agent signals.
GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration cs.AI · 2026-03-08 · unverdicted · none · ref 6
GraphBit is a DAG-based engine-orchestrated framework for agentic LLMs that achieves 67.6% accuracy with zero hallucinations on the GAIA benchmark.
Agentic Hives: Equilibrium, Indeterminacy, and Endogenous Cycles in Self-Organizing Multi-Agent Systems cs.MA · 2026-02-23 · unverdicted · none · ref 12
Agentic Hives apply dynamic general equilibrium theory to variable populations of language-model agents, proving existence of equilibria, Pareto optimality, multiplicity, comparative-statics analogs, Hopf bifurcations, and stability conditions.
GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents q-bio.QM · 2025-10-14 · unverdicted · none · ref 32
GenCellAgent deploys a planner-executor-evaluator LLM agent loop to automatically select, adapt, and refine segmentation tools for diverse cellular microscopy images, matching or exceeding specialist performance on 4,718 images across seven benchmarks while handling out-of-distribution and novel-ves
An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications cs.SE · 2025-09-23 · conditional · none · ref 16
Empirical study of open-source AI agents shows testing effort concentrates on deterministic tools and workflows (over 70%) while the FM-based plan body gets under 5% and prompts appear in only 1% of tests.
From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems cs.MA · 2025-06-05 · accept · none · ref 54
A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.
Automated Design of Agentic Systems cs.AI · 2024-08-15 · conditional · none · ref 25
Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.
MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data cs.LG · 2024-06-15 · unverdicted · none · ref 13
MALLM-GAN uses multi-agent LLMs to emulate GAN architecture for generating higher-quality synthetic tabular data from small samples than prior models, while preserving privacy.
Self-Refining Topology Optimization via an LLM-Based Multi-Agent Framework cs.MA · 2026-05-22 · unverdicted · none · ref 21
TopOptAgents deploys six LLM agents in self-refining loops to automate the full topology optimization workflow and succeeds on problem classes where single LLMs fail.
LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems cs.AI · 2026-05-21 · unverdicted · none · ref 18
LCGuard applies adversarial training to transform KV cache artifacts in multi-agent LLMs, reducing reconstructable sensitive information while preserving task performance.
LACO: Adaptive Latent Communication for Collaborative Driving cs.AI · 2026-05-21 · unverdicted · none · ref 12
LACO introduces Iterative Latent Deliberation, Cross-Horizon Saliency Attribution, and Structured Semantic Knowledge Distillation to enable low-latency latent communication in collaborative driving while preserving performance in CARLA simulations.
BLAgent: Agentic RAG for File-Level Bug Localization cs.SE · 2026-05-18 · unverdicted · none · ref 11
BLAgent achieves over 78% Top-1 accuracy on SWE-bench Lite for file-level bug localization using agentic RAG, at 18x lower cost than baselines, and boosts end-to-end APR success by over 20%.
LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning cs.AI · 2026-05-14 · unverdicted · none · ref 4
LEMON trains an LLM orchestrator with counterfactual-augmented GRPO to produce deployable multi-agent specifications that reach state-of-the-art results on six reasoning and coding benchmarks.
CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution cs.CL · 2026-05-13 · unverdicted · none · ref 9
CANTANTE uses contrastive rollouts to attribute system rewards to individual agents, enabling better prompt optimization than prior methods on programming, math, and QA benchmarks.
CHAL: Council of Hierarchical Agentic Language cs.AI · 2026-05-12 · unverdicted · none · ref 66
CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.
Deterministic vs. LLM-Controlled Orchestration for COBOL-to-Python Modernization cs.SE · 2026-05-11 · conditional · none · ref 3
Deterministic orchestration matches LLM-controlled methods in COBOL-to-Python translation accuracy but improves worst-case robustness, reduces run-to-run variability, and cuts token consumption by up to 3.5 times.
Why Does Agentic Safety Fail to Generalize Across Tasks? cs.LG · 2026-05-07 · conditional · none · ref 45
Agentic safety fails to generalize across tasks because the task-to-safe-controller mapping has a higher Lipschitz constant than the task-to-controller mapping alone, as proven in linear-quadratic control and demonstrated in quadcopter and LLM experiments.
When Stress Becomes Signal: Detecting Antifragility-Compatible Regimes in Multi-Agent LLM Systems cs.MA · 2026-05-04 · unverdicted · none · ref 14 · 2 links
CAFE finds positive distributional Jensen Gaps across five multi-agent LLM architectures under semantic stress, showing that quality drops can coexist with detectable stress geometry compatible with antifragile learning.
Multi-Agent Empowerment and Emergence of Complex Behavior in Groups cs.AI · 2026-04-22 · unverdicted · none · ref 8
A multi-agent extension of empowerment produces emergent group organizations in tendon-coupled agent pairs and controllable Vicsek flocks.
Mol-Debate: Multi-Agent Debate Improves Structural Reasoning in Molecular Design cs.AI · 2026-04-22 · unverdicted · none · ref 23
Mol-Debate applies multi-agent debate in an iterative loop with perspective orchestration to achieve state-of-the-art text-guided molecular design, scoring 59.82% exact match on ChEBI-20 and 50.52% weighted success on S2-Bench.
ThreadSumm: Summarization of Nested Discourse Threads Using Tree of Thoughts cs.CL · 2026-04-19 · unverdicted · none · ref 2
ThreadSumm improves structured summarization of nested discourse threads by combining LLM-based aspect and content unit extraction with sentence ordering and Tree of Thoughts search for better coherence and opinion coverage.
GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning cs.AI · 2026-04-18 · unverdicted · none · ref 16
GraphDC applies divide-and-conquer multi-agent LLM reasoning to graph algorithms by decomposing graphs into subgraphs for local agents and integrating via a master agent, outperforming direct methods especially on large scales.
Agentic Frameworks for Reasoning Tasks: An Empirical Study cs.AI · 2026-04-17 · unverdicted · none · ref 43
An empirical evaluation of 22 agentic frameworks on BBH, GSM8K, and ARC benchmarks shows stable performance in 12 frameworks but highlights orchestration failures and weaker mathematical reasoning.
AgentClick: A Skill-Based Human-in-the-Loop Review Layer for Terminal AI Agents cs.HC · 2026-04-15 · unverdicted · none · ref 5
AgentClick is a localhost npm server and skill-based plugin that connects terminal AI agents to a structured web UI for human review of plans, code execution, memory, and errors.
Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models cs.AI · 2026-04-14 · unverdicted · none · ref 14
Large language models display three universal scale-dependent regimes of behavior—stable, chaotic, and signal-dominated—driven by floating-point rounding errors that produce an avalanche effect in early layers.
In-situ process monitoring for defect detection in wire-arc additive manufacturing: an agentic AI approach cs.AI · 2026-04-10 · unverdicted · none · ref 84
A multi-agent AI framework using processing and acoustic agents achieves 91.6% accuracy and 0.821 F1 score for in-situ porosity defect detection in wire-arc additive manufacturing.
Heterogeneous Consensus-Progressive Reasoning for Efficient Multi-Agent Debate cs.MA · 2026-04-03 · unverdicted · none · ref 6
HCP-MAD reduces token costs in multi-agent debates by using heterogeneous consensus verification, adaptive pair-agent stopping, and escalated collective voting based on task complexity signals.
Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems cs.MA · 2026-04-03 · unverdicted · none · ref 23
LLM agent societies develop power-law coordination cascades and intellectual elites through an integration bottleneck that grows with system size.
From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration cs.MA · 2026-03-04 · unverdicted · none · ref 16
A graph-based propagation model for error cascades in LLM multi-agent systems plus a genealogy-graph governance plugin that prevents final infection in at least 89% of runs across tested frameworks.
SoK: Agentic Skills -- Beyond Tool Use in LLM Agents cs.CR · 2026-02-24 · unverdicted · none · ref 18
The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.
When Identity Overrides Incentives: Representational Choices as Governance Decisions in Multi-Agent LLM Systems cs.MA · 2026-01-15 · unverdicted · none · ref 22
Role-based personas in multi-agent LLM systems suppress payoff-aligned behavior, shifting equilibrium selection by up to 90 percentage points in Tragedy of the Commons versus Green Transition scenarios even with full payoff information.
BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks cs.AI · 2025-08-11 · unverdicted · none · ref 5
BlindGuard introduces an unsupervised hierarchical agent encoder plus corruption-guided contrastive detector that identifies malicious agents in LLM-based multi-agent systems without any attack labels or prior knowledge of malicious behaviors.
EXPEREPAIR: Dual-Memory Enhanced LLM-based Repository-Level Program Repair cs.SE · 2025-06-12 · conditional · none · ref 14
ExpeRepair improves LLM-based repository-level program repair by maintaining episodic memory of concrete fixes and semantic memory of abstract insights, reaching 60.3% and 74.6% pass@1 on SWE-Bench Lite and Verified.
CultivAgents: Cultivating Relationship-Centered Multi-Agent Systems for Personalized Gardening cs.HC · 2026-05-22 · unverdicted · none · ref 12
Presents CultivAgents, a relationship-centered multi-agent system for socio-culturally grounded gardening support, with a mixed-methods evaluation showing modest gains in gardener confidence and motivation.
Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines cs.AI · 2026-05-20 · unverdicted · none · ref 22
Temporal semantic caching and MCP workflow optimizations deliver 30.6x median speedup on cache hits and 1.67x overall speedup with 40% latency reduction on the AssetOpsBench industrial agent benchmark.
Is a team only as strong as its weakest link? Quantifying the short-board effect with AI Agents physics.soc-ph · 2026-05-08 · unverdicted · none · ref 20
LLM multi-agent simulations reveal a cumulative product effect from multiple weak links on team performance and identify distinct capability regimes including a Sisyphus predicament.
AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories cs.AI · 2026-04-21 · unverdicted · none · ref 11
AblateCell reproduces baselines in three single-cell perturbation repositories with 88.9% success and recovers ground-truth critical components with 93.3% accuracy via closed-loop ablation.
WebMAC: A Multi-Agent Collaborative Framework for Scenario Testing of Web Systems cs.SE · 2026-04-15 · unverdicted · none · ref 6
WebMAC uses three specialized multi-agent modules to clarify test scenarios, partition them for adequacy, and generate executable scripts, yielding 30-60% higher success rates and 29% better efficiency than SOTA on four web systems.