pith. sign in

arxiv: 2605.15706 · v1 · pith:SC7ZEXFGnew · submitted 2026-05-15 · 💻 cs.LG

Differentiable Mixture-of-Agents Incentivizes Swarm Intelligence of Large Language Models

Pith reviewed 2026-05-20 19:49 UTC · model grok-4.3

classification 💻 cs.LG
keywords multi-agent systemslarge language modelsdifferentiable routingadaptive collaborationswarm intelligencepredictive entropyself-supervised optimizationdynamic topologies
0
0 comments X

The pith

Differentiable Mixture-of-Agents lets large language models dynamically route and activate agents at each reasoning step without pre-defined communication topologies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes that multi-agent systems built from large language models can evolve their own collaboration patterns during inference rather than relying on fixed structures chosen in advance. It does this by introducing a routing process that treats agent selection as a differentiable operation informed by context from prior steps, then tunes that process using the model's own uncertainty measured as predictive entropy. A sympathetic reader would care because the approach removes the need for manual redesign of agent workflows when tasks change, potentially making collective reasoning more practical for open-ended problems. The method is tested across nine benchmarks and reported to match or exceed prior systems in accuracy while using agents more efficiently. This points toward AI setups that can reconfigure themselves on the fly as new information arrives.

Core claim

DMoA is a self-evolving multi-agent framework that enables elastic and adaptive agent collaboration during inference by dynamically routing and activating agents at each reasoning step. It relies on a differentiable, context-aware routing mechanism with recurrent structures to incorporate historical and contextual information and produce sparse activations. Predictive entropy serves as a self-supervised signal to optimize the routing process, allowing the system to implicitly simulate diverse communication topologies and adapt to evolving task demands without external annotations.

What carries the argument

Differentiable context-aware routing mechanism with recurrent structures that produces sparse agent activations in a step-wise manner.

If this is right

  • The system can adapt its collaboration pattern to changing task demands during a single inference run.
  • Sparse activations improve efficiency while maintaining or improving accuracy across benchmarks.
  • Ensembling emerges naturally from the dynamic routing without requiring pre-compiled workflows.
  • Test-time adaptation occurs using only internal model signals rather than labeled data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same routing idea could be tested on non-language tasks such as planning or code generation where agent roles shift mid-process.
  • If the entropy signal proves sufficient, it may reduce reliance on human-designed agent graphs in other multi-model setups.
  • Extending the recurrent memory to longer horizons might reveal limits in how well the system tracks evolving demands.

Load-bearing premise

Predictive entropy alone, without external annotations, can guide a differentiable routing process to discover effective and adaptable agent collaboration patterns.

What would settle it

A controlled comparison on the same nine benchmarks where the recurrent context or entropy optimization is removed and performance falls to the level of static multi-agent baselines.

Figures

Figures reproduced from arXiv: 2605.15706 by Bin Yang, Chenjuan Guo, Jilin Hu, Junkai Lu, Siyu Yan, Xiangfei Qiu, Xingjian Wu.

Figure 1
Figure 1. Figure 1: Current Multi-agent Systems (MAS) show inflexibility in different aspects, which have [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The overview of DMoA. An agent pool is initialized to possess diverse expert capabilities. During optimization, DMoA runs all agents to collect the predictive entropy, and utilize it as the supervision signal. During inference, only several agents are activated in each reasoning step. which reflects the progress of query processing and intermediate demands; (2) the historical routing decisions, which helps… view at source ↗
Figure 3
Figure 3. Figure 3: Analyses of robustness. We compare the accuracy (%) of multiple multi-agent systems before and after prompt attacks on all benchmarks, and report the average accuracies. Test Time Training. DMoA is optimized through self-supervision signals from the step-wise pre￾dictive entropy, which is dense and easily ob￾tainable, thus DMoA originally supports test time training. Specifically, facing the first 10–30 qu… view at source ↗
Figure 4
Figure 4. Figure 4: Comparisons among different routing mechanisms and loss functions on all benchmarks. Ablation studies. We conduct ablation studies in [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Visualization of accuracy and consump￾tion of multi-agent systems across MMLU, Hu￾manEval, GSM8K, and SVAMP. The diameters of circles represent the scales of token consumption. (4, 2) (6, 3) (10, 4) (15, 6) (20, 7) Configuration (N,K) 85 90 95 100 Accuracy (%) MMLU GSM8K HumanEval SVAMP [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: Robustness analysis under different adversarial-agent ratios. We compare the average [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Case study on GSM8K. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Case study on MultiArith. HumanEval (case c) from typing import List def f(n: int) -> List[int]: """ Implement the function f that takes n as a parameter, and returns a list of size n, such that the value of the element at index i is the factorial of i if i is even or the sum of numbers from 1 to i otherwise. i starts from 1. the factorial of i is the multiplication of the numbers from 1 to i (1 * 2 * ... … view at source ↗
Figure 10
Figure 10. Figure 10: Case study on HumanEval. F Details of Baselines For fair comparison, all baselines use gpt-oss-120b, the same prompt template as DMoA, and decoding temperature 0.1. For method-specific hyperparameters, we follow the original papers whenever available and tune validation-dependent thresholds under the same training budget of 40–80 queries. 21 [PITH_FULL_IMAGE:figures/full_fig_p021_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Case study on DS-1000. MMLU (case e) Which of the following is not one of nor follows directly from Kepler's laws? A. As a planet moves around its orbit it sweeps out equal areas in equal times. B. The orbit of each planet about the Sun is an ellipse with the Sun at one focus. C. The force of attraction between any two objects decreases with the square of the distance between their centers. D. A planet tr… view at source ↗
Figure 12
Figure 12. Figure 12: Case study on MMLU. 22 [PITH_FULL_IMAGE:figures/full_fig_p022_12.png] view at source ↗
read the original abstract

Recent advances in Large Language Models (LLMs) have catalyzed the development of multi-agent systems (MAS) for complex reasoning tasks. However, existing MAS typically rely on pre-defined or pre-compiled communication topologies, which limits their flexibility and adaptability to dynamic task requirements. In this work, we propose Differentiable Mixture-of-Agents (DMoA), a self-evolving multi-agent framework that enables elastic and adaptive agent collaboration during inference. Instead of statically constructing workflows, DMoA dynamically routes and activates agents at each reasoning step, allowing the system to implicitly simulate diverse communication topologies and adapt to evolving demands. To achieve this, we design a differentiable, context-aware routing mechanism that leverages recurrent structures to incorporate historical and contextual information, producing sparse agent activations in a step-wise manner. Furthermore, we introduce predictive entropy as self-supervised signals to optimize the routing process, enabling efficient test-time adaptation without external annotations. Extensive experiments across 9 benchmarks demonstrate that DMoA achieves state-of-the-art performance while exhibiting strong efficiency, robustness, and ensembling capabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Differentiable Mixture-of-Agents (DMoA), a multi-agent LLM framework that replaces static communication topologies with a differentiable, context-aware router using recurrent structures to produce sparse, step-wise agent activations. Routing parameters are optimized at test time solely via predictive entropy on agent outputs as a self-supervised objective, allowing the system to implicitly discover diverse collaboration patterns and adapt to task demands without external labels. Experiments across nine benchmarks are reported to establish state-of-the-art performance together with gains in efficiency, robustness, and ensembling.

Significance. If the central mechanism is shown to produce genuinely distinct activation topologies rather than merely sparse but topologically similar selections, the approach would offer a practical route to annotation-free, adaptive multi-agent reasoning. The idea of using predictive entropy directly as a routing objective is conceptually clean and could generalize beyond the specific LLM agents tested.

major comments (2)
  1. [§3.2] §3.2 (recurrent router and entropy loss): the claim that predictive entropy alone supplies a gradient signal sufficient to discover and switch among qualitatively different communication topologies (sequential, parallel, hierarchical) is not yet supported by direct evidence. Entropy quantifies output uncertainty but does not explicitly penalize or reward particular activation graphs; without an ablation that measures topological diversity (e.g., graph-edit distance or activation-pattern clustering across tasks), it remains possible that the reported gains arise from sparse but structurally similar selections.
  2. [§4] §4 (experiments): the abstract asserts SOTA results and robustness across nine benchmarks, yet the manuscript supplies neither per-benchmark accuracy tables with error bars, nor ablation studies isolating the recurrent state versus the entropy objective, nor dataset descriptions. These omissions make it impossible to assess whether the performance delta is load-bearing or reducible to the fitted routing parameters themselves.
minor comments (2)
  1. [§3.1] Notation for the recurrent hidden state and the precise form of the entropy loss should be introduced with an equation number in §3.1 so that readers can trace the gradient path without ambiguity.
  2. [Figure 2] Figure 2 (activation heatmaps) would benefit from an additional panel showing the same tasks under a non-recurrent baseline to visually demonstrate the claimed topological diversity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions planned for the next version of the manuscript.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (recurrent router and entropy loss): the claim that predictive entropy alone supplies a gradient signal sufficient to discover and switch among qualitatively different communication topologies (sequential, parallel, hierarchical) is not yet supported by direct evidence. Entropy quantifies output uncertainty but does not explicitly penalize or reward particular activation graphs; without an ablation that measures topological diversity (e.g., graph-edit distance or activation-pattern clustering across tasks), it remains possible that the reported gains arise from sparse but structurally similar selections.

    Authors: We agree that explicit quantification of topological diversity would strengthen the central claim. The predictive entropy objective is intended to drive the router toward lower-uncertainty outputs, which in our framework encourages selection of agent combinations that produce qualitatively different collaboration patterns. Nevertheless, the current manuscript does not include direct measurements such as activation-pattern clustering or graph-edit distances. In the revision we will add an analysis that clusters routing decisions across tasks and reports the diversity of emergent topologies to address this point. revision: yes

  2. Referee: [§4] §4 (experiments): the abstract asserts SOTA results and robustness across nine benchmarks, yet the manuscript supplies neither per-benchmark accuracy tables with error bars, nor ablation studies isolating the recurrent state versus the entropy objective, nor dataset descriptions. These omissions make it impossible to assess whether the performance delta is load-bearing or reducible to the fitted routing parameters themselves.

    Authors: We acknowledge that the experimental section would benefit from greater detail. The revised manuscript will include full per-benchmark accuracy tables with means and standard deviations from multiple runs, ablation studies that separately disable the recurrent state and the entropy objective, and expanded dataset descriptions with references and statistics. These additions will make the source of the reported gains clearer. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical gains on external benchmarks are independent of routing optimization

full rationale

The paper introduces a differentiable recurrent router optimized at test time via predictive entropy as a self-supervised loss on agent outputs. This is a standard self-supervised training step whose objective is defined on the model's own predictions. The load-bearing claims (SOTA performance, implicit simulation of diverse topologies, robustness) are supported by direct evaluation on 9 held-out benchmarks whose labels and metrics are external to the entropy signal. No equation or derivation reduces a reported result to a quantity that is definitionally identical to the fitted router parameters; the benchmarks serve as an independent falsification test. Self-citations, if present, are not load-bearing for the central empirical result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the untested premise that the proposed routing mechanism produces useful sparse activations and that entropy provides effective self-supervision; no free parameters or invented entities are explicitly listed in the abstract.

axioms (1)
  • domain assumption Differentiable context-aware routing with recurrent structures can simulate diverse communication topologies and adapt without external labels
    Invoked when describing the elastic collaboration and test-time adaptation.

pith-pipeline@v0.9.0 · 5730 in / 1097 out tokens · 27360 ms · 2026-05-20T19:49:08.279287+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

99 extracted references · 99 canonical work pages · 13 internal anchors

  1. [1]

    , title =

    LangChain Inc. , title =. 2024 , howpublished =

  2. [2]

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation , author=

  3. [5]

    Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , pages=

    Codegeex: A pre-trained model for code generation with multilingual benchmarking on humaneval-x , author=. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , pages=

  4. [7]

    Forty-first International Conference on Machine Learning , year=

    Gptswarm: Language agents as optimizable graphs , author=. Forty-first International Conference on Machine Learning , year=

  5. [8]

    G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks , author=

  6. [9]

    2026 , archivePrefix=

    TimeART: Towards Agentic Time Series Reasoning via Tool-Augmentation , author=. 2026 , archivePrefix=

  7. [11]

    Forty-first International Conference on Machine Learning , year=

    Improving factuality and reasoning in language models through multiagent debate , author=. Forty-first International Conference on Machine Learning , year=

  8. [12]

    The Eleventh International Conference on Learning Representations , year=

    Self-Consistency Improves Chain of Thought Reasoning in Language Models , author=. The Eleventh International Conference on Learning Representations , year=

  9. [13]

    Advances in neural information processing systems , volume=

    Ddxplus: A new dataset for automatic medical diagnosis , author=. Advances in neural information processing systems , volume=

  10. [14]

    Proceedings of the 2018 conference on empirical methods in natural language processing , pages=

    HotpotQA: A dataset for diverse, explainable multi-hop question answering , author=. Proceedings of the 2018 conference on empirical methods in natural language processing , pages=

  11. [15]

    International Conference on Machine Learning , pages=

    DS-1000: A natural and reliable benchmark for data science code generation , author=. International Conference on Machine Learning , pages=. 2023 , organization=

  12. [16]

    Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

    Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems , author=. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

  13. [18]

    Proceedings of the 2015 conference on empirical methods in natural language processing , pages=

    Solving general arithmetic word problems , author=. Proceedings of the 2015 conference on empirical methods in natural language processing , pages=

  14. [20]

    Measuring Massive Multitask Language Understanding , author=

  15. [21]

    Neural computation , volume=

    A practical Bayesian framework for backpropagation networks , author=. Neural computation , volume=. 1992 , publisher=

  16. [22]

    Wu, Xingjian and Qiu, Xiangfei and Li, Zhengyu and Wang, Yihang and Hu, Jilin and Guo, Chenjuan and Xiong, Hui and Yang, Bin , booktitle =

  17. [23]

    Categorical Reparameterization with Gumbel-Softmax

    Categorical reparameterization with gumbel-softmax , author=. arXiv preprint arXiv:1611.01144 , year=

  18. [24]

    Proceedings of the IEEE/CVF international conference on computer vision , pages=

    Scalable diffusion models with transformers , author=. Proceedings of the IEEE/CVF international conference on computer vision , pages=

  19. [25]

    Computational Social Networks , volume=

    Graph convolutional networks: a comprehensive review , author=. Computational Social Networks , volume=. 2019 , publisher=

  20. [26]

    Forty-second International Conference on Machine Learning , year=

    Sundial: A Family of Highly Capable Time Series Foundation Models , author=. Forty-second International Conference on Machine Learning , year=

  21. [27]

    Large language model agent in financial trading: A survey,

    Large language model agent in financial trading: A survey , author=. arXiv preprint arXiv:2408.06361 , year=

  22. [29]

    Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) , pages=

    Learning phrase representations using RNN encoder--decoder for statistical machine translation , author=. Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) , pages=

  23. [30]

    Proceedings of the 22nd international conference on Machine learning , pages=

    Learning to rank using gradient descent , author=. Proceedings of the 22nd international conference on Machine learning , pages=

  24. [33]

    Machine Learning , volume=

    Learning multi-agent coordination through connectivity-driven communication , author=. Machine Learning , volume=. 2023 , publisher=

  25. [34]

    LLM Multi-Agent Systems: Challenges and Open Problems

    LLM multi-agent systems: Challenges and open problems , author=. arXiv preprint arXiv:2402.03578 , year=

  26. [35]

    The Thirteenth International Conference on Learning Representations , year=

    AFlow: Automating Agentic Workflow Generation , author=. The Thirteenth International Conference on Learning Representations , year=

  27. [39]

    Advances in neural information processing systems , volume=

    Chain-of-thought prompting elicits reasoning in large language models , author=. Advances in neural information processing systems , volume=

  28. [40]

    The twelfth international conference on learning representations , year=

    MetaGPT: Meta programming for a multi-agent collaborative framework , author=. The twelfth international conference on learning representations , year=

  29. [42]

    The eleventh international conference on learning representations , year=

    React: Synergizing reasoning and acting in language models , author=. The eleventh international conference on learning representations , year=

  30. [43]

    arXiv preprint arXiv:2501.07834 , year=

    Flow: Modularized agentic workflow automation , author=. arXiv preprint arXiv:2501.07834 , year=

  31. [45]

    A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

    A comprehensive survey of self-evolving ai agents: A new paradigm bridging foundation models and lifelong agentic systems , author=. arXiv preprint arXiv:2508.07407 , year=

  32. [46]

    Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , pages=

    Evoagentx: An automated framework for evolving agentic workflows , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations , pages=

  33. [48]

    ACM Computing Surveys , year=

    A survey on diffusion models for time series and spatio-temporal data , author=. ACM Computing Surveys , year=

  34. [49]

    IEEE transactions on knowledge and data engineering , volume=

    Deep learning for spatio-temporal data mining: A survey , author=. IEEE transactions on knowledge and data engineering , volume=. 2020 , publisher=

  35. [50]

    Yaron Lipman and Ricky T. Q. Chen and Heli Ben. Flow Matching for Generative Modeling , booktitle =. 2023 , timestamp =

  36. [51]

    ICLR , year =

    Aurora: Towards Universal Generative Multimodal Time Series Forecasting , author =. ICLR , year =

  37. [52]

    arXiv preprint arXiv:2510.08558 , year=

    Agent learning via early experience , author=. arXiv preprint arXiv:2510.08558 , year=

  38. [53]

    Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence , pages=

    Large language model as a policy teacher for training reinforcement learning agents , author=. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence , pages=

  39. [54]

    2024 , eprint=

    ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving , author=. 2024 , eprint=

  40. [55]

    Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

    Chatdev: Communicative agents for software development , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

  41. [56]

    Advances in Neural Information Processing Systems , volume=

    Multi-LLM debate: Framework, principals, and interventions , author=. Advances in Neural Information Processing Systems , volume=

  42. [57]

    Le and Geoffrey E

    Noam Shazeer and Azalia Mirhoseini and Krzysztof Maziarz and Andy Davis and Quoc V. Le and Geoffrey E. Hinton and Jeff Dean , title =. 5th International Conference on Learning Representations,. 2017 , url =

  43. [58]

    Proceedings of the 25th international conference on Machine learning , pages=

    Listwise approach to learning to rank: theory and algorithm , author=. Proceedings of the 25th international conference on Machine learning , pages=

  44. [59]

    Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

    Facenet: A unified embedding for face recognition and clustering , author=. Proceedings of the IEEE conference on computer vision and pattern recognition , pages=

  45. [60]

    Advances in Neural Information Processing Systems , volume=

    Mixture-of-experts with expert choice routing , author=. Advances in Neural Information Processing Systems , volume=

  46. [61]

    9th International Conference on Learning Representations,

    Dmitry Lepikhin and HyoukJoong Lee and Yuanzhong Xu and Dehao Chen and Orhan Firat and Yanping Huang and Maxim Krikun and Noam Shazeer and Zhifeng Chen , title =. 9th International Conference on Learning Representations,. 2021 , url =

  47. [62]

    RMoA: Optimizing Mixture-of-Agents through Diversity Maximization and Residual Compensation , booktitle =

    Zhentao Xie and Chengcheng Han and Jinxin Shi and Wenjun Cui and Xin Zhao and Xingjiao Wu and Jiabao Zhao , editor =. RMoA: Optimizing Mixture-of-Agents through Diversity Maximization and Residual Compensation , booktitle =. 2025 , url =

  48. [63]

    The Thirteenth International Conference on Learning Representations , year=

    Mixture-of-Agents Enhances Large Language Model Capabilities , author=. The Thirteenth International Conference on Learning Representations , year=

  49. [64]

    ChemAgent: Self-updating memories in large language models improves chemical reasoning

    Chemagent: Self-updating library in large language models improves chemical reasoning , author=. arXiv preprint arXiv:2501.06590 , year=

  50. [65]

    Proceedings of the Annual Meeting of the Cognitive Science Society , volume=

    Temporal dynamic weighted graph convolution for multi-agent reinforcement learning , author=. Proceedings of the Annual Meeting of the Cognitive Science Society , volume=

  51. [69]

    Tora: A tool-integrated reasoning agent for mathematical problem solving, 2024

    Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, and Weizhu Chen. Tora: A tool-integrated reasoning agent for mathematical problem solving, 2024

  52. [70]

    Codegeex: A pre-trained model for code generation with multilingual benchmarking on humaneval-x

    Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Lei Shen, Zihan Wang, Andi Wang, Yang Li, et al. Codegeex: A pre-trained model for code generation with multilingual benchmarking on humaneval-x. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 5673--5684, 2023

  53. [71]

    Surrealdriver: Designing llm-powered generative driver agent framework based on human drivers’ driving-thinking data,

    Ye Jin, Xiaoxi Shen, Huiling Peng, Xiaoan Liu, Jingli Qin, Jiayang Li, Jintao Xie, Peizhong Gao, Guyue Zhou, and Jiangtao Gong. Surrealdriver: Designing generative driver agent simulation framework in urban contexts based on large language model. arXiv preprint arXiv:2309.13193, 5 0 (7): 0 8, 2023

  54. [72]

    DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments

    Yuxiang Zheng, Dayuan Fu, Xiangkun Hu, Xiaojie Cai, Lyumanshan Ye, Pengrui Lu, and Pengfei Liu. Deepresearcher: Scaling deep research via reinforcement learning in real-world environments. arXiv preprint arXiv:2504.03160, 2025

  55. [73]

    Metagpt: Meta programming for a multi-agent collaborative framework

    Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al. Metagpt: Meta programming for a multi-agent collaborative framework. In The twelfth international conference on learning representations, 2023

  56. [74]

    Significant Gravitas . Autogpt. https://github.com/Significant-Gravitas/AutoGPT, 2023

  57. [75]

    Evoagentx: An automated framework for evolving agentic workflows

    Yingxu Wang, Siwei Liu, Jinyuan Fang, and Zaiqiao Meng. Evoagentx: An automated framework for evolving agentic workflows. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 643--655, 2025 a

  58. [76]

    G-designer: Architecting multi-agent communication topologies via graph neural networks

    Guibin Zhang, Yanwei Yue, Xiangguo Sun, Guancheng Wan, Miao Yu, Junfeng Fang, Kun Wang, Tianlong Chen, and Dawei Cheng. G-designer: Architecting multi-agent communication topologies via graph neural networks. In Forty-second International Conference on Machine Learning

  59. [77]

    Clement Vignac et al

    Shiyuan Li, Yixin Liu, Qingsong Wen, Chengqi Zhang, and Shirui Pan. Assemble your crew: Automatic multi-agent communication topology design via autoregressive graph generation. arXiv preprint arXiv:2507.18224, 2025 a

  60. [78]

    Safesieve: From heuristics to experience in progressive pruning for llm-based multi-agent communication

    Ruijia Zhang, Xinyan Zhao, Ruixiang Wang, Sigen Chen, Guibin Zhang, An Zhang, Kun Wang, and Qingsong Wen. Safesieve: From heuristics to experience in progressive pruning for llm-based multi-agent communication. arXiv preprint arXiv:2508.11733, 2025

  61. [79]

    Mixture-of-agents enhances large language model capabilities

    Junlin Wang, WANG Jue, Ben Athiwaratkun, Ce Zhang, and James Zou. Mixture-of-agents enhances large language model capabilities. In The Thirteenth International Conference on Learning Representations, 2025 b

  62. [80]

    Rmoa: Optimizing mixture-of-agents through diversity maximization and residual compensation

    Zhentao Xie, Chengcheng Han, Jinxin Shi, Wenjun Cui, Xin Zhao, Xingjiao Wu, and Jiabao Zhao. Rmoa: Optimizing mixture-of-agents through diversity maximization and residual compensation. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Findings of the Association for Computational Linguistics, ACL 2025, Vienna, Austr...

  63. [81]

    Le, Geoffrey E

    Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings . OpenReview.net, 2017. URL h...

  64. [82]

    Gshard: Scaling giant models with conditional computation and automatic sharding

    Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. Gshard: Scaling giant models with conditional computation and automatic sharding. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021 . OpenReview.net, 2021. URL http...

  65. [83]

    A practical bayesian framework for backpropagation networks

    David JC MacKay. A practical bayesian framework for backpropagation networks. Neural computation, 4 0 (3): 0 448--472, 1992

  66. [84]

    Search-o1: Agentic Search-Enhanced Large Reasoning Models

    Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, and Zhicheng Dou. Search-o1: Agentic search-enhanced large reasoning models. arXiv preprint arXiv:2501.05366, 2025 b

  67. [85]

    Tree of Thoughts: Deliberate Problem Solving with Large Language Models

    Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models, 2023. URL https://arxiv. org/abs/2305.10601, 3: 0 1, 2023

  68. [86]

    Autogen: Enabling next-gen llm applications via multi-agent conversation

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications via multi-agent conversation. In ICLR 2024 Workshop on Large Language Model (LLM) Agents

  69. [87]

    Gptswarm: Language agents as optimizable graphs

    Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, and J \"u rgen Schmidhuber. Gptswarm: Language agents as optimizable graphs. In Forty-first International Conference on Machine Learning, 2024

  70. [88]

    Langgraph

    LangChain Inc. Langgraph. https://github.com/langchain-ai/langgraph, 2024

  71. [89]

    A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration

    Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, and Diyi Yang. Dynamic llm-agent network: An llm-agent collaboration framework with agent team optimization. arXiv preprint arXiv:2310.02170, 2023

  72. [90]

    Temporal dynamic weighted graph convolution for multi-agent reinforcement learning

    Yuntao Liu, Yong Dou, Yuan Li, Xinhai Xu, and Donghong Liu. Temporal dynamic weighted graph convolution for multi-agent reinforcement learning. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 44, 2022

  73. [91]

    ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

    Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, and Zhiyuan Liu. Chateval: Towards better llm-based evaluators through multi-agent debate. arXiv preprint arXiv:2308.07201, 2023

  74. [92]

    Learning multi-agent communication from graph modeling perspective

    Shengchao Hu, Li Shen, Ya Zhang, and Dacheng Tao. Learning multi-agent communication from graph modeling perspective. arXiv preprint arXiv:2405.08550, 2024

  75. [93]

    Yu Shang et al

    Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Kunlun Zhu, Hanchen Xia, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, et al. Scaling large language model-based multi-agent collaboration. arXiv preprint arXiv:2406.07155, 2024

  76. [94]

    Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

    Nils Reimers, I Sentence-BERT Gurevych, et al. Sentence embeddings using siamese bert-networks. arxiv 2019. arXiv preprint arXiv:1908.10084, 10, 1908

  77. [95]

    Learning phrase representations using rnn encoder--decoder for statistical machine translation

    Kyunghyun Cho, Bart Van Merri \"e nboer, C a g lar Gul c ehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder--decoder for statistical machine translation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1724--1734, 2014

  78. [96]

    Learning multi-agent coordination through connectivity-driven communication

    Emanuele Pesce and Giovanni Montana. Learning multi-agent coordination through connectivity-driven communication. Machine Learning, 112 0 (2): 0 483--514, 2023

  79. [97]

    Agentic Reinforced Policy Optimization

    Guanting Dong, Hangyu Mao, Kai Ma, Licheng Bao, Yifei Chen, Zhongyuan Wang, Zhongxia Chen, Jiazhen Du, Huiyang Wang, Fuzheng Zhang, et al. Agentic reinforced policy optimization. arXiv preprint arXiv:2507.19849, 2025 a

  80. [98]

    Agentic entropy-balanced policy optimization.arXiv preprint arXiv:2510.14545, 2025

    Guanting Dong, Licheng Bao, Zhongyuan Wang, Kangzhi Zhao, Xiaoxi Li, Jiajie Jin, Jinghan Yang, Hangyu Mao, Fuzheng Zhang, Kun Gai, et al. Agentic entropy-balanced policy optimization. arXiv preprint arXiv:2510.14545, 2025 b

Showing first 80 references.