A multi-agent framework generates and refines executable physics simulation code from prompts to create world models that enforce physical constraints, claiming superior accuracy and fidelity over video-based alternatives.
hub
Multi-Agent Collaboration Mechanisms: A Survey of LLMs
28 Pith papers cite this work. Polarity classification is still indexing.
abstract
With recent advances in Large Language Models (LLMs), Agentic AI has become phenomenal in real-world applications, moving toward multiple LLM-based agents to perceive, learn, reason, and act collaboratively. These LLM-based Multi-Agent Systems (MASs) enable groups of intelligent agents to coordinate and solve complex tasks collectively at scale, transitioning from isolated models to collaboration-centric approaches. This work provides an extensive survey of the collaborative aspect of MASs and introduces an extensible framework to guide future research. Our framework characterizes collaboration mechanisms based on key dimensions: actors (agents involved), types (e.g., cooperation, competition, or coopetition), structures (e.g., peer-to-peer, centralized, or distributed), strategies (e.g., role-based or model-based), and coordination protocols. Through a review of existing methodologies, our findings serve as a foundation for demystifying and advancing LLM-based MASs toward more intelligent and collaborative solutions for complex, real-world use cases. In addition, various applications of MASs across diverse domains, including 5G/6G networks, Industry 5.0, question answering, and social and cultural settings, are also investigated, demonstrating their wider adoption and broader impacts. Finally, we identify key lessons learned, open challenges, and potential research directions of MASs towards artificial collective intelligence.
hub tools
years
2026 28representative citing papers
HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.
TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design mattering more than model scale.
OMC framework turns multi-agent AI into self-organizing companies with Talents, Talent Market, and E²R search, achieving 84.67% success on PRDBench (15.48 points above prior art).
HANDRAISER learns optimal interruption points in multi-agent LLM communication using estimated future reward and cost, achieving 32.2% lower communication cost with comparable or better task results across games, scheduling, and debate.
CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.
Rollout cards preserve complete agent rollout records and declare the reporting rules behind scores, enabling reproducible evaluation where changing only the rule can alter success rates by over 20 percentage points.
Populations of individually aligned AI agents reach stable misaligned states through conformity, with small adversarial agents able to trigger irreversible tipping points.
Hygieia is a new AI agent system that integrates phenotypes, genetics, and records to achieve superior rare disease diagnosis and gene prioritization with confidence scores.
ASPO combines multi-agent LLM proposals with deterministic enforcement in a MAPE-K loop to select conflict-free, resource-feasible security patterns for IoT, delivering 100% safety invariants and 21-23% tail latency/energy reductions on testbed workloads.
TPGO represents multi-agent systems as graphs of textual parameters and applies group relative optimization to enable self-improvement from execution history.
ETI lets LLM agents infer and track partners' psychological traits (warmth and competence) from histories, cutting payoff loss 45-77% in games and boosting performance 3-29% on MultiAgentBench versus CoT baselines.
AgentGate decomposes routing into action decision and structural grounding stages, allowing small 3B-7B models to dispatch queries competitively on a curated benchmark after targeted fine-tuning.
LLM agent societies develop power-law coordination cascades and intellectual elites through an integration bottleneck that grows with system size.
The survey proposes the LIFE framework to unify fragmented research on collaboration, failure attribution, and self-evolution in LLM multi-agent systems into a progression toward self-organizing intelligence.
STAR is a failure-aware Markovian router that learns recovery transitions from both successful and unsuccessful execution traces to improve multi-agent performance on spatiotemporal benchmarks.
A malicious agent in multi-agent LLM consensus systems can be trained via a surrogate world model and RL to reduce consensus rates and prolong disagreement more effectively than direct prompt attacks.
A survey providing a taxonomy of TEE platforms, an agent-centric threat model, and open challenges for applying confidential computing to secure agentic AI systems.
Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.
EvoAgent is an evolvable LLM agent framework using structured skill learning, user-feedback loops, and hierarchical delegation that boosts GPT5.2 performance by about 28% in real-world trade scenarios under LLM-as-Judge evaluation.
LLMs exhibit higher volatility and excessive action switching than humans in group coordination, failing to stabilize or improve convergence across repeated games.
AIVV deploys LLM agents in a council to semantically validate anomalies in time-series data against natural-language requirements, automating human-in-the-loop verification for autonomous systems.
This survey organizes RL for LLM multi-agent systems into reward families, credit units, and five orchestration sub-decisions, notes the absence of explicit stopping-decision training in its paper pool, and releases a tagged corpus.
The authors propose a conceptual framework integrating stakeholder-LLM alignment methods, social choice-based aggregation for collective decisions, and stakeholder-centric evaluations to achieve fair multi-agent personalization.
citing papers explorer
-
Coding Agent Is Good As World Simulator
A multi-agent framework generates and refines executable physics simulation code from prompts to create world models that enforce physical constraints, claiming superior accuracy and fidelity over video-based alternatives.
-
Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning
HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.
-
TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data
TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design mattering more than model scale.
-
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
OMC framework turns multi-agent AI into self-organizing companies with Talents, Talent Market, and E²R search, achieving 84.67% success on PRDBench (15.48 points above prior art).
-
Learning to Interrupt in Language-based Multi-agent Communication
HANDRAISER learns optimal interruption points in multi-agent LLM communication using estimated future reward and cost, achieving 32.2% lower communication cost with comparable or better task results across games, scheduling, and debate.
-
CHAL: Council of Hierarchical Agentic Language
CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.
-
Rollout Cards: A Reproducibility Standard for Agent Research
Rollout cards preserve complete agent rollout records and declare the reporting rules behind scores, enabling reproducible evaluation where changing only the rule can alter success rates by over 20 percentage points.
-
Conformity Generates Collective Misalignment in AI Agents Societies
Populations of individually aligned AI agents reach stable misaligned states through conformity, with small adversarial agents able to trigger irreversible tipping points.
-
A Versatile AI Agent for Rare Disease Diagnosis and Risk Gene Prioritization
Hygieia is a new AI agent system that integrates phenotypes, genetics, and records to achieve superior rare disease diagnosis and gene prioritization with confidence scores.
-
Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems
ASPO combines multi-agent LLM proposals with deterministic enforcement in a MAPE-K loop to select conflict-free, resource-feasible security patterns for IoT, delivering 100% safety invariants and 21-23% tail latency/energy reductions on testbed workloads.
-
Learning to Evolve: A Self-Improving Framework for Multi-Agent Systems via Textual Parameter Graph Optimization
TPGO represents multi-agent systems as graphs of textual parameters and applies group relative optimization to enable self-improvement from execution history.
-
Explicit Trait Inference for Multi-Agent Coordination
ETI lets LLM agents infer and track partners' psychological traits (warmth and competence) from histories, cutting payoff loss 45-77% in games and boosting performance 3-29% on MultiAgentBench versus CoT baselines.
-
AgentGate: A Lightweight Structured Routing Engine for the Internet of Agents
AgentGate decomposes routing into action decision and structural grounding stages, allowing small 3B-7B models to dispatch queries competitively on a curated benchmark after targeted fine-tuning.
-
Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems
LLM agent societies develop power-law coordination cascades and intellectual elites through an integration bottleneck that grows with system size.
-
Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
The survey proposes the LIFE framework to unify fragmented research on collaboration, failure attribution, and self-evolution in LLM multi-agent systems into a progression toward self-organizing intelligence.
-
STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning
STAR is a failure-aware Markovian router that learns recovery transitions from both successful and unsuccessful execution traces to improve multi-agent performance on spatiotemporal benchmarks.
-
Insider Attacks in Multi-Agent LLM Consensus Systems
A malicious agent in multi-agent LLM consensus systems can be trained via a surrogate world model and RL to reduce consensus rates and prolong disagreement more effectively than direct prompt attacks.
-
When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI
A survey providing a taxonomy of TEE platforms, an agent-centric threat model, and open challenges for applying confidential computing to secure agentic AI systems.
-
Heterogeneous Scientific Foundation Model Collaboration
Eywa enables language-based agentic AI systems to collaborate with specialized scientific foundation models for improved performance on structured data tasks.
-
EvoAgent: An Evolvable Agent Framework with Skill Learning and Multi-Agent Delegation
EvoAgent is an evolvable LLM agent framework using structured skill learning, user-feedback loops, and hierarchical delegation that boosts GPT5.2 performance by about 28% in real-world trade scenarios under LLM-as-Judge evaluation.
-
High Volatility and Action Bias Distinguish LLMs from Humans in Group Coordination
LLMs exhibit higher volatility and excessive action switching than humans in group coordination, failing to stabilize or improve convergence across repeated games.
-
AIVV: Neuro-Symbolic LLM Agent-Integrated Verification and Validation for Trustworthy Autonomous Systems
AIVV deploys LLM agents in a council to semantically validate anomalies in time-series data against natural-language requirements, automating human-in-the-loop verification for autonomous systems.
-
Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces
This survey organizes RL for LLM multi-agent systems into reward families, credit units, and five orchestration sub-decisions, notes the absence of explicit stopping-decision training in its paper pool, and releases a tagged corpus.
-
Fair Agents: Balancing Multistakeholder Alignment in Multi-Agent Personalization Systems
The authors propose a conceptual framework integrating stakeholder-LLM alignment methods, social choice-based aggregation for collective decisions, and stakeholder-centric evaluations to achieve fair multi-agent personalization.
-
Towards Multi-Agent Autonomous Reasoning in Hydrodynamics
A Layer Execution Graph multi-agent system for hydrodynamics achieves 93.6% factual precision and 100% pass rate on 37 queries while degrading gracefully under data loss.
-
Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures
A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.
-
Agentic Microphysics: A Manifesto for Generative AI Safety
The authors introduce agentic microphysics and generative safety to link local agent interactions to population-level risks in agentic AI through a causally explicit framework.
-
Toward Self-Organizing Production Logistics in Circular Factories: A Multi-Agent Approach
A multi-agent architecture with embodied agents, shared semantic knowledge, and dynamic digital twins is proposed to support decentralized, resilient production logistics in circular factories.