Qineng Wang, Zihao Wang, Ying Su, Hanghang Tong, and Yangqiu Song
6 Pith papers cite this work.
Citation roles: background (1). Citation polarities: support (1).

Citing papers
-
Weak-Link Optimization for Multi-Agent Reasoning and Collaboration
WORC improves multi-agent LLM reasoning to 82.2% average accuracy by predicting and compensating for the weakest agent via targeted extra sampling rather than uniform reinforcement.
-
ARM: Discovering Agentic Reasoning Modules for Generalizable Multi-Agent Systems
ARM evolves specialized reasoning modules from basic CoT via tree search to serve as reusable components in multi-agent systems that generalize across models and domains without per-task re-optimization.
-
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.
-
Mixture-of-Agents Enhances Large Language Model Capabilities
A layered Mixture-of-Agents system combining multiple LLMs achieves state-of-the-art results on AlpacaEval 2.0 (65.1%), MT-Bench, and FLASK, outperforming GPT-4 Omni.
-
AstroVLM: Expert Multi-agent Collaborative Reasoning for Astronomical Imaging Quality Diagnosis
AstroVLM deploys expert multi-agent collaboration with VLMs to outperform baselines on real-world astronomical imaging quality diagnosis.
-
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey
A survey of emerging AI agent architectures that organizes single- and multi-agent designs around reasoning, planning, tool use, communication, and reflection phases.