hub Mixed citations

Mixture-of-Agents Enhances Large Language Model Capabilities

Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou · 2024 · cs.CL · arXiv 2406.04692

Mixed citation behavior. Most common role is background (67%).

26 Pith papers citing it

Background 67% of classified citations

open full Pith review browse 26 citing papers arXiv PDF

abstract

Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) methodology. In our approach, we construct a layered MoA architecture wherein each layer comprises multiple LLM agents. Each agent takes all the outputs from agents in the previous layer as auxiliary information in generating its response. MoA models achieves state-of-art performance on AlpacaEval 2.0, MT-Bench and FLASK, surpassing GPT-4 Omni. For example, our MoA using only open-source LLMs is the leader of AlpacaEval 2.0 by a substantial gap, achieving a score of 65.1% compared to 57.5% by GPT-4 Omni.

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4 baseline 1 method 1

citation-polarity summary

background 4 baseline 1 use method 1

representative citing papers

Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery

cs.CR · 2026-04-21 · unverdicted · novelty 7.0

Refute-or-Promote applies adversarial multi-agent review with kill gates and empirical verification to filter LLM defect candidates, killing 79-83% before disclosure and yielding 4 CVEs plus multiple accepted fixes across libraries, C++ standard, and compilers.

SAT: Sequential Agent Tuning for Coordinator Free Plug and Play Multi-LLM Training with Monotonic Improvement Guarantees

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

SAT trains multi-LLM teams with sequential block updates to deliver monotonic gains and plug-and-play model swaps that provably improve performance bounds.

Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference

cs.CL · 2026-02-23 · unverdicted · novelty 7.0

Pyramid MoA is a hierarchical Mixture-of-Agents system with a decision-theoretic router that achieves up to 42.9% compute savings while nearly matching oracle accuracy on MBPP, GSM8K, MMLU, HumanEval, and MATH.

SANet: A Semantic-aware Agentic AI Networking Framework for Cross-layer Optimization in 6G

cs.AI · 2025-12-27 · unverdicted · novelty 7.0

SANet uses semantic-aware AI agents for cross-layer 6G optimization, achieving up to 14.61% performance gains with 44.37% of the FLOPs of prior methods via model partitioning and decentralized multi-objective algorithms.

When Efficiency Backfires: Cascading LLMs Trigger Cascade Failure under Adversarial Attack

cs.CR · 2026-05-17 · unverdicted · novelty 6.0

LLM cascade systems are vulnerable to a new adversarial attack that simultaneously degrades accuracy and destroys the intended cost savings by targeting both the lightweight models and the escalation decision mechanism.

Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Mutual Reinforcement Learning allows heterogeneous LLMs to exchange experience through mechanisms like Peer Rollout Pooling, Cross-Policy GRPO Advantage Sharing, and Success-Gated Transfer, with outcome-level sharing identified as favorable on the stability-support trade-off.

Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

CoRD uses collaborative multi-teacher step-wise decoding with perplexity-guided beam search to generate higher-quality Long-CoT data that lets smaller models reach near-teacher performance with less supervision.

Talk is Cheap, Communication is Hard: Dynamic Grounding Failures and Repair in Multi-Agent Negotiation

cs.MA · 2026-05-03 · unverdicted · novelty 6.0

LLM agent pairs in a resource allocation negotiation game fail to reach Pareto-optimal outcomes due to dynamic grounding failures such as loss of interaction history, anchoring, and referential errors.

TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

TeamTR is a trust-region framework for multi-agent LLM fine-tuning that resamples trajectories after each update to convert quadratic compounding occupancy shift into linear scaling and yields per-update improvement lower bounds.

CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness

q-bio.NC · 2026-04-30 · unverdicted · novelty 6.0

CTM-AI combines a formal consciousness model with foundation models to report state-of-the-art results on sarcasm detection, humor, and agentic tool-use benchmarks.

Trace-Level Analysis of Information Contamination in Multi-Agent Systems

cs.AI · 2026-04-30 · unverdicted · novelty 6.0

Agent workflows can diverge substantially from contaminated inputs yet recover correct answers, or stay similar while failing, as measured by trace divergence on GAIA tasks.

SpatiO: Adaptive Test-Time Orchestration of Vision-Language Agents for Spatial Reasoning

cs.CV · 2026-04-23 · unverdicted · novelty 6.0

SpatiO uses heterogeneous vision-language agents with test-time orchestration to dynamically weight their contributions for improved spatial reasoning on benchmarks like 3DSRBench and CV-Bench.

CADMAS-CTX: Contextual Capability Calibration for Multi-Agent Delegation

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

CADMAS-CTX replaces static skill profiles with context-conditioned Beta posteriors and uncertainty-penalized routing, yielding higher accuracy on GAIA (0.442) and SWE-bench (31.4%) than static baselines.

ChemGraph-XANES: An Agentic Framework for XANES Simulation and Analysis

cond-mat.mtrl-sci · 2026-04-17 · unverdicted · novelty 6.0

An LLM-orchestrated framework automates the full XANES workflow from natural language to normalized spectra and curated data.

Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus

cs.LG · 2026-04-04 · conditional · novelty 6.0

LLM agent committees exhibit representational collapse with mean cosine similarity of 0.888, and diversity-aware consensus reaches 87% accuracy on GSM8K versus 84% for self-consistency at lower cost.

"The Whole Is Greater Than the Sum of Its Parts": A Compatibility-Aware Multi-Teacher CoT Distillation Framework

cs.CL · 2026-01-20 · unverdicted · novelty 6.0

COMPACT adaptively fuses multi-teacher CoT supervisions using graph-based consensus, mutual-information adaptability, and loss-based difficulty metrics to improve small language model reasoning performance while mitigating catastrophic forgetting.

TRINITY: An Evolved LLM Coordinator

cs.LG · 2025-12-04 · unverdicted · novelty 6.0

A compact 0.6B-parameter coordinator with a 10K-parameter head uses evolutionary strategy to dynamically delegate roles to LLMs, achieving SOTA results such as 86.2% on LiveCodeBench.

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

cs.LG · 2024-07-31 · unverdicted · novelty 6.0

Repeated sampling scales problem coverage log-linearly with sample count, improving SWE-bench Lite performance from 15.9% to 56% using 250 samples.

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

cs.AI · 2026-05-20 · unverdicted · novelty 5.0

Temporal semantic caching and MCP workflow optimizations deliver 30.6x median speedup on cache hits and 1.67x overall speedup with 40% latency reduction on the AssetOpsBench industrial agent benchmark.

DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization

cs.RO · 2026-05-17 · unverdicted · novelty 5.0

DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.

Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling

cs.AI · 2026-05-02 · unverdicted · novelty 5.0

Multi-agent debate and mixture-of-agents outperform self-consistency by 1.3 and 2.7 percentage points respectively at equal compute budgets on MMLU-Pro and BBH, with advantages that continue at higher scales while self-consistency saturates.

A Scalable Multi-LLM Collaboration System with Retrieval-based Selection and Exploration-Exploitation-Driven Enhancement

cs.CL · 2025-07-14 · unverdicted · novelty 5.0

SMCS coordinates 15 open-source LLMs via retrieval-based prior selection and exploration-exploitation posterior enhancement, outperforming GPT-4.1 by 5.36% and GPT-o3-mini by 5.28% on eight benchmarks.

InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion

cs.CL · 2025-05-20 · unverdicted · novelty 5.0

InfiGFusion introduces graph-on-logits distillation with an O(n log n) Gromov-Wasserstein approximation to fuse LLMs by modeling token co-activations, reporting gains over baselines on 11 benchmarks.

Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures

cs.AI · 2026-04-20 · unverdicted · novelty 4.0

A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.

citing papers explorer

Showing 26 of 26 citing papers.

Refute-or-Promote: An Adversarial Stage-Gated Multi-Agent Review Methodology for High-Precision LLM-Assisted Defect Discovery cs.CR · 2026-04-21 · unverdicted · none · ref 36 · internal anchor
Refute-or-Promote applies adversarial multi-agent review with kill gates and empirical verification to filter LLM defect candidates, killing 79-83% before disclosure and yielding 4 CVEs plus multiple accepted fixes across libraries, C++ standard, and compilers.
SAT: Sequential Agent Tuning for Coordinator Free Plug and Play Multi-LLM Training with Monotonic Improvement Guarantees cs.LG · 2026-04-17 · unverdicted · none · ref 26 · internal anchor
SAT trains multi-LLM teams with sequential block updates to deliver monotonic gains and plug-and-play model swaps that provably improve performance bounds.
Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference cs.CL · 2026-02-23 · unverdicted · none · ref 4 · internal anchor
Pyramid MoA is a hierarchical Mixture-of-Agents system with a decision-theoretic router that achieves up to 42.9% compute savings while nearly matching oracle accuracy on MBPP, GSM8K, MMLU, HumanEval, and MATH.
SANet: A Semantic-aware Agentic AI Networking Framework for Cross-layer Optimization in 6G cs.AI · 2025-12-27 · unverdicted · none · ref 34 · internal anchor
SANet uses semantic-aware AI agents for cross-layer 6G optimization, achieving up to 14.61% performance gains with 44.37% of the FLOPs of prior methods via model partitioning and decentralized multi-objective algorithms.
When Efficiency Backfires: Cascading LLMs Trigger Cascade Failure under Adversarial Attack cs.CR · 2026-05-17 · unverdicted · none · ref 12 · internal anchor
LLM cascade systems are vulnerable to a new adversarial attack that simultaneously degrades accuracy and destroys the intended cost savings by targeting both the lightweight models and the escalation decision mechanism.
Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models cs.LG · 2026-05-08 · unverdicted · none · ref 91 · internal anchor
Mutual Reinforcement Learning allows heterogeneous LLMs to exchange experience through mechanisms like Peer Rollout Pooling, Cross-Policy GRPO Advantage Sharing, and Success-Gated Transfer, with outcome-level sharing identified as favorable on the stability-support trade-off.
Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding cs.AI · 2026-05-04 · unverdicted · none · ref 36 · internal anchor
CoRD uses collaborative multi-teacher step-wise decoding with perplexity-guided beam search to generate higher-quality Long-CoT data that lets smaller models reach near-teacher performance with less supervision.
Talk is Cheap, Communication is Hard: Dynamic Grounding Failures and Repair in Multi-Agent Negotiation cs.MA · 2026-05-03 · unverdicted · none · ref 24 · internal anchor
LLM agent pairs in a resource allocation negotiation game fail to reach Pareto-optimal outcomes due to dynamic grounding failures such as loss of interaction history, anchoring, and referential errors.
TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination cs.LG · 2026-05-01 · unverdicted · none · ref 15 · internal anchor
TeamTR is a trust-region framework for multi-agent LLM fine-tuning that resamples trajectories after each update to convert quadratic compounding occupancy shift into linear scaling and yields per-update improvement lower bounds.
CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness q-bio.NC · 2026-04-30 · unverdicted · none · ref 21 · internal anchor
CTM-AI combines a formal consciousness model with foundation models to report state-of-the-art results on sarcasm detection, humor, and agentic tool-use benchmarks.
Trace-Level Analysis of Information Contamination in Multi-Agent Systems cs.AI · 2026-04-30 · unverdicted · none · ref 33 · internal anchor
Agent workflows can diverge substantially from contaminated inputs yet recover correct answers, or stay similar while failing, as measured by trace divergence on GAIA tasks.
SpatiO: Adaptive Test-Time Orchestration of Vision-Language Agents for Spatial Reasoning cs.CV · 2026-04-23 · unverdicted · none · ref 34 · internal anchor
SpatiO uses heterogeneous vision-language agents with test-time orchestration to dynamically weight their contributions for improved spatial reasoning on benchmarks like 3DSRBench and CV-Bench.
CADMAS-CTX: Contextual Capability Calibration for Multi-Agent Delegation cs.AI · 2026-04-20 · unverdicted · none · ref 26 · internal anchor
CADMAS-CTX replaces static skill profiles with context-conditioned Beta posteriors and uncertainty-penalized routing, yielding higher accuracy on GAIA (0.442) and SWE-bench (31.4%) than static baselines.
ChemGraph-XANES: An Agentic Framework for XANES Simulation and Analysis cond-mat.mtrl-sci · 2026-04-17 · unverdicted · none · ref 11 · internal anchor
An LLM-orchestrated framework automates the full XANES workflow from natural language to normalized spectra and curated data.
Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus cs.LG · 2026-04-04 · conditional · none · ref 14 · internal anchor
LLM agent committees exhibit representational collapse with mean cosine similarity of 0.888, and diversity-aware consensus reaches 87% accuracy on GSM8K versus 84% for self-consistency at lower cost.
"The Whole Is Greater Than the Sum of Its Parts": A Compatibility-Aware Multi-Teacher CoT Distillation Framework cs.CL · 2026-01-20 · unverdicted · none · ref 11 · internal anchor
COMPACT adaptively fuses multi-teacher CoT supervisions using graph-based consensus, mutual-information adaptability, and loss-based difficulty metrics to improve small language model reasoning performance while mitigating catastrophic forgetting.
TRINITY: An Evolved LLM Coordinator cs.LG · 2025-12-04 · unverdicted · none · ref 18 · internal anchor
A compact 0.6B-parameter coordinator with a 10K-parameter head uses evolutionary strategy to dynamically delegate roles to LLMs, achieving SOTA results such as 86.2% on LiveCodeBench.
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling cs.LG · 2024-07-31 · unverdicted · none · ref 61 · internal anchor
Repeated sampling scales problem coverage log-linearly with sample count, improving SWE-bench Lite performance from 15.9% to 56% using 250 samples.
Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines cs.AI · 2026-05-20 · unverdicted · none · ref 21 · internal anchor
Temporal semantic caching and MCP workflow optimizations deliver 30.6x median speedup on cache hits and 1.67x overall speedup with 40% latency reduction on the AssetOpsBench industrial agent benchmark.
DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization cs.RO · 2026-05-17 · unverdicted · none · ref 120 · internal anchor
DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.
Multi-Agent Reasoning Improves Compute Efficiency: Pareto-Optimal Test-Time Scaling cs.AI · 2026-05-02 · unverdicted · none · ref 12 · internal anchor
Multi-agent debate and mixture-of-agents outperform self-consistency by 1.3 and 2.7 percentage points respectively at equal compute budgets on MMLU-Pro and BBH, with advantages that continue at higher scales while self-consistency saturates.
A Scalable Multi-LLM Collaboration System with Retrieval-based Selection and Exploration-Exploitation-Driven Enhancement cs.CL · 2025-07-14 · unverdicted · none · ref 56 · internal anchor
SMCS coordinates 15 open-source LLMs via retrieval-based prior selection and exploration-exploitation posterior enhancement, outperforming GPT-4.1 by 5.36% and GPT-o3-mini by 5.28% on eight benchmarks.
InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion cs.CL · 2025-05-20 · unverdicted · none · ref 3 · internal anchor
InfiGFusion introduces graph-on-logits distillation with an O(n log n) Gromov-Wasserstein approximation to fuse LLMs by modeling token co-activations, reporting gains over baselines on 11 benchmarks.
Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures cs.AI · 2026-04-20 · unverdicted · none · ref 128 · internal anchor
A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.
Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities cs.DC · 2026-04-24 · unverdicted · none · ref 157 · internal anchor
A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.
Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems cs.AI · 2026-04-16 · unreviewed · ref 10 · internal anchor

Mixture-of-Agents Enhances Large Language Model Capabilities

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer