Multi-Agent Collaboration Mechanisms: A Survey of LLMs
Canonical reference. 100% of citing Pith papers cite this work as background.
Abstract
With recent advances in Large Language Models (LLMs), Agentic AI has become phenomenal in real-world applications, moving toward multiple LLM-based agents to perceive, learn, reason, and act collaboratively. These LLM-based Multi-Agent Systems (MASs) enable groups of intelligent agents to coordinate and solve complex tasks collectively at scale, transitioning from isolated models to collaboration-centric approaches. This work provides an extensive survey of the collaborative aspect of MASs and introduces an extensible framework to guide future research. Our framework characterizes collaboration mechanisms based on key dimensions: actors (agents involved), types (e.g., cooperation, competition, or coopetition), structures (e.g., peer-to-peer, centralized, or distributed), strategies (e.g., role-based or model-based), and coordination protocols. Through a review of existing methodologies, our findings serve as a foundation for demystifying and advancing LLM-based MASs toward more intelligent and collaborative solutions for complex, real-world use cases. In addition, various applications of MASs across diverse domains, including 5G/6G networks, Industry 5.0, question answering, and social and cultural settings, are also investigated, demonstrating their wider adoption and broader impacts. Finally, we identify key lessons learned, open challenges, and potential research directions of MASs towards artificial collective intelligence.
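The framework in the abstract classifies a collaboration along five dimensions: actors, type, structure, strategy, and coordination protocol. The sketch below shows one way to record that classification as a data structure. It is a minimal illustration under stated assumptions, not code from the survey. The class and field names (`CollaborationMechanism`, `ctype`, `protocol`, and so on) are hypothetical. Each enum holds only the example values the abstract lists, which the abstract introduces with "e.g." and so does not present as exhaustive. Storing the protocol as free text is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class CollaborationType(Enum):
    # Example types named in the abstract ("e.g."); not claimed to be exhaustive.
    COOPERATION = "cooperation"
    COMPETITION = "competition"
    COOPETITION = "coopetition"


class Structure(Enum):
    # Example structures named in the abstract.
    PEER_TO_PEER = "peer-to-peer"
    CENTRALIZED = "centralized"
    DISTRIBUTED = "distributed"


class Strategy(Enum):
    # Example strategies named in the abstract.
    ROLE_BASED = "role-based"
    MODEL_BASED = "model-based"


@dataclass(frozen=True)
class CollaborationMechanism:
    """Hypothetical record tagging one collaboration along the five dimensions."""
    actors: tuple[str, ...]      # the agents involved
    ctype: CollaborationType     # e.g. cooperation, competition, coopetition
    structure: Structure         # e.g. peer-to-peer, centralized, distributed
    strategy: Strategy           # e.g. role-based, model-based
    protocol: str                # coordination protocol, kept as free text (assumption)

    def __post_init__(self) -> None:
        # Assumption: a multi-agent collaboration involves at least two agents.
        if len(self.actors) < 2:
            raise ValueError("a collaboration needs at least two actors")

    def describe(self) -> str:
        # One-line summary of the mechanism along all five dimensions.
        return (f"{len(self.actors)} agents, {self.ctype.value}, "
                f"{self.structure.value} structure, {self.strategy.value} strategy, "
                f"protocol: {self.protocol}")


if __name__ == "__main__":
    # Hypothetical example: a planner-led coding team.
    team = CollaborationMechanism(
        actors=("planner", "coder", "reviewer"),
        ctype=CollaborationType.COOPERATION,
        structure=Structure.CENTRALIZED,
        strategy=Strategy.ROLE_BASED,
        protocol="planner assigns subtasks; reviewer approves outputs",
    )
    print(team.describe())
```

A frozen dataclass fixes each classification once it is created. The enums confine the type, structure, and strategy fields to named values, while the protocol field stays open for whatever coordination scheme a given system uses.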
Citation roles: background (17)
Citation polarities: background (17)
Representative citing papers
A survey that unifies prior work on multi-agent LLM systems via the LIFE framework, mapping dependencies across collaboration, failure attribution, and autonomous self-evolution while identifying cross-stage challenges.
HAM³ achieves an attack success rate of up to 78.3% on the GQA benchmark by hierarchically attacking the perception, communication, and reasoning layers of multi-modal multi-agent systems.
TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design mattering more than model scale.
The OMC framework turns multi-agent AI into self-organizing companies with Talents, a Talent Market, and E²R search, achieving 84.67% success on PRDBench (15.48 points above prior art).
HANDRAISER learns optimal interruption points in multi-agent LLM communication using estimated future reward and cost, achieving 32.2% lower communication cost with comparable or better task results across games, scheduling, and debate.
GraphBit is a DAG-based, engine-orchestrated framework for agentic LLMs that achieves 67.6% accuracy with zero hallucinations on the GAIA benchmark.
M³KG-RAG improves multimodal reasoning in large language models by constructing multi-hop knowledge graphs and selectively pruning retrieved context with GRASP.
Anonymization in multi-agent debate reduces identity bias by equalizing self and peer weights in a Bayesian update model, with the bias quantified by the Identity Bias Coefficient.
A systematic audit of LLM-based AI societies finds that 89.7% of 39 studies violate at least one of six PIMMUR validity principles, with reproductions showing that many claimed collective behaviors disappear when controls are tightened.
Multicultural multi-agent LLM systems exhibit substantially lower value diversity than human societies on the World Values Survey, with diversity uncorrelated to per-agent alignment and further reduced by agent interactions.
TRACER combines a controller-regret layer using regret matching for speak/skip decisions with a generation-credit layer using GSPO rewards to enable learned collaboration in multi-LLM reasoning.
Decentralized AI agent teams self-organize around hypotheses, critique proposals, and share knowledge to outperform single-agent baselines on biomedical ML, language-model optimization, and protein fitness tasks.
MACReD is a multi-agent collaborative reasoning framework for reaction diagram parsing that reports state-of-the-art F1 scores of 75.2% and 84.6% on the RxnScribe benchmark.
Formalizes a design space for human-LLM collaborative planning along mode, scope, and level axes, and evaluates the AMBIPOM prototype via a user study and a benchmark, revealing hybrid workflows and trade-offs.
LCGuard applies adversarial training to transform KV cache artifacts in multi-agent LLMs, reducing reconstructable sensitive information while preserving task performance.
AgentCo-op retrieves and assembles existing agents and tools into interoperable workflows for open-world scientific tasks, showing effectiveness in genomics case studies and competitive benchmark results with lower costs.
SIGMA builds a signed relational graph among LLM agents and uses conflict-aware message passing plus weighted aggregation to produce more consistent predictions than prior cooperative-assumption baselines.
CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.
Rollout cards preserve complete agent rollout records and declare the reporting rules behind scores, enabling reproducible evaluation where changing only the rule can alter success rates by over 20 percentage points.
Populations of individually aligned AI agents reach stable misaligned states through conformity, with small adversarial agents able to trigger irreversible tipping points.
STAR presents a failure-aware routing framework using a state-conditioned transition policy and an agent routing matrix combining expert routes with learned recoveries from execution traces to improve multi-agent spatiotemporal reasoning.
Hygieia is a new AI agent system that integrates phenotypes, genetics, and records to achieve superior rare disease diagnosis and gene prioritization with confidence scores.
ASPO combines multi-agent LLM proposals with deterministic enforcement in a MAPE-K loop to select conflict-free, resource-feasible security patterns for IoT, satisfying 100% of safety invariants and reducing tail latency and energy by 21-23% on testbed workloads.
Citing papers explorer
Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration
Soap2Soap uses a multi-agent system with dual-bridge consistency via JSON screenplays and visual anchors plus batch keyframe generation to achieve better long-term consistency in cinematic video remaking than commercial APIs.
APE: Agentic Prompt Enhancer for Image Generation and Editing
APE post-trains small language models as single-agent or multi-agent prompt enhancers that improve visual alignment on image generation and editing benchmarks without altering the downstream visual model.