Hub: Mixed citations
K., Zhu, X., and Li, S
Mixed citation behavior. Most common role is background (40%).
years: 2026 (13)
verdicts: UNVERDICTED (13)
roles: background (5)
citing papers explorer
- Delayed Verification Destabilizes Multi-Agent LLM Belief: Instability Thresholds and Optimal Corrector Placement
  Models delayed verification in multi-agent LLMs as graph consensus, derives stability thresholds (inverse golden ratio for delay two) via grounded Laplacian, and gives a supermodular greedy rule for corrector placement; experiments on five models confirm dose-delay oscillations.
- ATLAS: Agentic Test-time Learning-to-Allocate Scaling
  ATLAS introduces an LLM-orchestrated agentic framework for dynamic test-time scaling via extensible 'explore' actions, achieving higher accuracy with fewer API calls than fixed-workflow baselines on four benchmarks.
- MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate
  MAD-OPD recasts on-policy distillation teachers as a debating collective to supply better supervision, lifting agentic and code performance over single-teacher OPD across multiple model sizes.
- Weak-Link Optimization for Multi-Agent Reasoning and Collaboration
  WORC improves multi-agent LLM reasoning to 82.2% average accuracy by predicting and compensating for the weakest agent via targeted extra sampling rather than uniform reinforcement.
- Learning from Self-Debate: Preparing Reasoning Models for Multi-Agent Debate
  SDRL trains LLMs via self-generated multi-path debates and joint optimization of standalone plus debate-conditioned responses to boost both single-model reasoning and multi-agent debate performance.
- TreeAgent: A Generalizable Multi-Agent Framework for Automated Bias Labeling in Forestry via Compiled Expert Rules and Vision-Language Models
  TreeAgent uses a Decoupled Declarative Decision (D3) Framework to orchestrate expert rules and VLMs for tree bias classification, outperforming supervised ML baselines with reduced expert labeling effort.
- Budgeted Act-or-Defer Multi-Agent LLM Deliberation with Local Reliability Bounds
  A kNN lower-confidence-bound approach for act-or-defer decisions in multi-agent LLM debates respects user-declared wrong-action budgets while achieving high automation rates on benchmarks.
- Mixture of Debaters: Learn to Debate at Architectural Level in Multi-Agent Reasoning
  Mixture of Debaters uses MoE to enable dynamic self-debate inside one model, claiming better accuracy than multi-agent systems at 3.7x lower latency and 87% fewer tokens on multimodal benchmarks.
- From Debate to Decision: Conformal Social Choice for Safe Multi-Agent Deliberation
  Conformal Social Choice aggregates verbalized probabilities from LLM debates via linear opinion pooling and uses split conformal prediction to generate prediction sets that guarantee inclusion of the correct answer with probability at least 1-alpha, enabling adjustable safe act-or-escalate decisions.
- AgentCity: Constitutional Governance for Autonomous Agent Economies via Separation of Power
  AgentCity introduces a Separation of Power constitutional architecture on blockchain for governing autonomous agent economies through agent legislation, automated execution, and human accountability.
- Automated Detection and Classification of Delusion-related Content in Naturalistic Audio Diaries Using Multi-Agent Language Models
  A multi-agent LLM system with majority voting achieves reported Micro F1 of 0.872 for delusion detection and 0.779 for classification on naturalistic speech transcripts.
- When Independent Sampling Outperforms Agentic Reasoning
  On Codeforces problems, independent k-shot sampling achieves better accuracy-cost and accuracy-query tradeoffs than agentic reasoning, even with prompt caching.
- EMS: Multi-Agent Voting via Efficient Majority-then-Stopping
  EMS reduces the average number of agents invoked for majority voting by 32% via reliability-aware prioritization and early stopping on six benchmarks.