Differentiable Mixture-of-Agents Incentivizes Swarm Intelligence of Large Language Models
Pith reviewed 2026-05-20 19:49 UTC · model grok-4.3
The pith
Differentiable Mixture-of-Agents lets large language models dynamically route and activate agents at each reasoning step without pre-defined communication topologies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DMoA is a self-evolving multi-agent framework that enables elastic and adaptive agent collaboration during inference by dynamically routing and activating agents at each reasoning step. It relies on a differentiable, context-aware routing mechanism with recurrent structures to incorporate historical and contextual information and produce sparse activations. Predictive entropy serves as a self-supervised signal to optimize the routing process, allowing the system to implicitly simulate diverse communication topologies and adapt to evolving task demands without external annotations.
What carries the argument
Differentiable context-aware routing mechanism with recurrent structures that produces sparse agent activations in a step-wise manner.
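To make the mechanism concrete, here is a minimal sketch of a step-wise recurrent router with sparse activations. It is a toy illustration, not the paper's architecture. The class name `RecurrentRouter`, the Elman-style update, the weight shapes, and the hard top-k selection are all assumptions made for this example. Hard top-k is not itself differentiable with respect to which agents are selected. A trainable version would need a relaxation such as Gumbel-softmax, and this summary does not say which one the paper uses.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

class RecurrentRouter:
    """Toy Elman-style recurrent router (hypothetical, not the paper's design)."""

    def __init__(self, ctx_dim, hidden_dim, n_agents, k, seed=0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.normal(0.0, 0.1, (hidden_dim, ctx_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.W_o = rng.normal(0.0, 0.1, (n_agents, hidden_dim))
        self.h = np.zeros(hidden_dim)  # running summary of earlier steps
        self.k = k

    def step(self, context):
        # Fold the current step's context into the recurrent hidden state.
        self.h = np.tanh(self.W_x @ context + self.W_h @ self.h)
        logits = self.W_o @ self.h
        # Keep the k highest-scoring agents and renormalise over them only.
        top = np.argsort(logits)[-self.k:]
        weights = np.zeros_like(logits)
        weights[top] = softmax(logits[top])
        return weights

# Route three reasoning steps over five agents, activating two per step.
router = RecurrentRouter(ctx_dim=8, hidden_dim=16, n_agents=5, k=2)
rng = np.random.default_rng(1)
activations = [router.step(rng.normal(size=8)) for _ in range(3)]
```

Each entry of `activations` is a weight vector with exactly `k` non-zero entries that sum to one. Because the hidden state carries over between calls to `step`, the agents chosen at one step can depend on the contexts seen at earlier steps.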
If this is right
- The system can adapt its collaboration pattern to changing task demands during a single inference run.
- Sparse activations improve efficiency while maintaining or improving accuracy across benchmarks.
- Ensembling emerges naturally from the dynamic routing without requiring pre-compiled workflows.
- Test-time adaptation occurs using only internal model signals rather than labeled data.
Where Pith is reading between the lines
- The same routing idea could be tested on non-language tasks such as planning or code generation where agent roles shift mid-process.
- If the entropy signal proves sufficient, it may reduce reliance on human-designed agent graphs in other multi-model setups.
- Extending the recurrent memory to longer horizons might reveal limits in how well the system tracks evolving demands.
Load-bearing premise
Predictive entropy alone, without external annotations, can guide a differentiable routing process to discover effective and adaptable agent collaboration patterns.
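The premise can be illustrated with a small sketch of entropy-driven routing updates. It assumes each agent emits a categorical distribution over answers and that the router mixes them with softmax weights. Both are simplifications chosen for this example, and the names `agent_probs`, `theta`, and `entropy_grad` are hypothetical. The gradient is derived by hand from the Shannon entropy of the mixture and is not taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def predictive_entropy(p):
    """Shannon entropy in nats; assumes every entry of p is strictly positive."""
    return float(-np.sum(p * np.log(p)))

def entropy_grad(theta, agent_probs):
    """Gradient of the mixture's entropy with respect to routing logits theta."""
    w = softmax(theta)
    p = w @ agent_probs                      # mixture prediction
    g = -(agent_probs @ (np.log(p) + 1.0))   # dH/dw_i
    return w * (g - w @ g)                   # chain rule through the softmax

# Hypothetical agent outputs: agent 0 is confident, agent 2 is near uniform.
agent_probs = np.array([[0.70, 0.20, 0.10],
                        [0.40, 0.40, 0.20],
                        [0.34, 0.33, 0.33]])
theta = np.zeros(3)
h_before = predictive_entropy(softmax(theta) @ agent_probs)
for _ in range(200):
    theta -= 1.0 * entropy_grad(theta, agent_probs)  # plain gradient descent
weights = softmax(theta)
h_after = predictive_entropy(weights @ agent_probs)
```

No labels appear anywhere in the update, which is the sense in which the signal is self-supervised. In this toy setup, minimising entropy simply shifts weight toward the most confident agent. Whether the same signal yields richer collaboration patterns in the full system is the empirical question the premise rests on.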
What would settle it
A controlled comparison on the same nine benchmarks where the recurrent context or entropy optimization is removed and performance falls to the level of static multi-agent baselines.
Original abstract
Recent advances in Large Language Models (LLMs) have catalyzed the development of multi-agent systems (MAS) for complex reasoning tasks. However, existing MAS typically rely on pre-defined or pre-compiled communication topologies, which limits their flexibility and adaptability to dynamic task requirements. In this work, we propose Differentiable Mixture-of-Agents (DMoA), a self-evolving multi-agent framework that enables elastic and adaptive agent collaboration during inference. Instead of statically constructing workflows, DMoA dynamically routes and activates agents at each reasoning step, allowing the system to implicitly simulate diverse communication topologies and adapt to evolving demands. To achieve this, we design a differentiable, context-aware routing mechanism that leverages recurrent structures to incorporate historical and contextual information, producing sparse agent activations in a step-wise manner. Furthermore, we introduce predictive entropy as self-supervised signals to optimize the routing process, enabling efficient test-time adaptation without external annotations. Extensive experiments across 9 benchmarks demonstrate that DMoA achieves state-of-the-art performance while exhibiting strong efficiency, robustness, and ensembling capabilities.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Differentiable Mixture-of-Agents (DMoA), a multi-agent LLM framework that replaces static communication topologies with a differentiable, context-aware router using recurrent structures to produce sparse, step-wise agent activations. Routing parameters are optimized at test time solely via predictive entropy on agent outputs as a self-supervised objective, allowing the system to implicitly discover diverse collaboration patterns and adapt to task demands without external labels. Experiments across nine benchmarks are reported to establish state-of-the-art performance together with gains in efficiency, robustness, and ensembling.
Significance. If the central mechanism is shown to produce genuinely distinct activation topologies rather than merely sparse but topologically similar selections, the approach would offer a practical route to annotation-free, adaptive multi-agent reasoning. The idea of using predictive entropy directly as a routing objective is conceptually clean and could generalize beyond the specific LLM agents tested.
major comments (2)
- [§3.2] §3.2 (recurrent router and entropy loss): the claim that predictive entropy alone supplies a gradient signal sufficient to discover and switch among qualitatively different communication topologies (sequential, parallel, hierarchical) is not yet supported by direct evidence. Entropy quantifies output uncertainty but does not explicitly penalize or reward particular activation graphs; without an ablation that measures topological diversity (e.g., graph-edit distance or activation-pattern clustering across tasks), it remains possible that the reported gains arise from sparse but structurally similar selections.
- [§4] §4 (experiments): the abstract asserts SOTA results and robustness across nine benchmarks, yet the manuscript supplies neither per-benchmark accuracy tables with error bars, nor ablation studies isolating the recurrent state versus the entropy objective, nor dataset descriptions. These omissions make it impossible to assess whether the performance delta is load-bearing or reducible to the fitted routing parameters themselves.
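The diversity measurement requested in the first major comment could take many forms. The sketch below uses mean pairwise normalised Hamming distance between binary activation matrices. This is a simpler stand-in for graph-edit distance, not that metric itself. The activation logs are made up for illustration, and `hamming_diversity` is a hypothetical helper.

```python
import numpy as np
from itertools import combinations

def hamming_diversity(patterns):
    """Mean pairwise fraction of (step, agent) cells that differ."""
    pairs = combinations(patterns, 2)
    return float(np.mean([np.mean(a != b) for a, b in pairs]))

# Hypothetical logs: rows are reasoning steps, columns are agents.
sequential = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [0, 0, 1]])   # one agent per step, in turn
parallel = np.array([[1, 1, 1],
                     [0, 0, 0],
                     [0, 0, 0]])     # all agents fire at the first step

diverse = hamming_diversity([sequential, parallel])
collapsed = hamming_diversity([sequential, sequential.copy()])
```

A score near zero across tasks would indicate structurally similar selections, the scenario the referee raises. A clearly positive score would be evidence of distinct activation patterns, though not by itself of distinct communication topologies.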
minor comments (2)
- [§3.1] Notation for the recurrent hidden state and the precise form of the entropy loss should be introduced with an equation number in §3.1 so that readers can trace the gradient path without ambiguity.
- [Figure 2] Figure 2 (activation heatmaps) would benefit from an additional panel showing the same tasks under a non-recurrent baseline to visually demonstrate the claimed topological diversity.
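As an illustration of the kind of numbered notation the first minor comment asks for, one possible formulation is sketched below. Every symbol here is an assumption introduced for this example ($h_t$ for the recurrent state, $x_t$ for the step context, $\pi_t$ for routing weights, $S_t$ for the active set, $p_i$ for agent $i$'s output distribution). The paper's own definitions may differ.

```latex
\begin{align}
h_t &= f_\phi(x_t, h_{t-1}) \tag{1} \\
\pi_t &= \operatorname{softmax}(W h_t), \qquad S_t = \operatorname{TopK}(\pi_t, k) \tag{2} \\
\bar{p}_t(y) &= \sum_{i \in S_t} \frac{\pi_{t,i}}{\sum_{j \in S_t} \pi_{t,j}} \, p_i(y \mid x_t) \tag{3} \\
\mathcal{L}_{\mathrm{ent}} &= -\sum_{y} \bar{p}_t(y) \log \bar{p}_t(y) \tag{4}
\end{align}
```

Written this way, the gradient path runs from (4) through the mixture in (3) and the routing weights in (2) to the recurrent parameters $\phi$ and $W$ in (1). The discrete selection $S_t$ is the step that would need a differentiable relaxation.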
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions planned for the next version of the manuscript.
- Referee: [§3.2] §3.2 (recurrent router and entropy loss): the claim that predictive entropy alone supplies a gradient signal sufficient to discover and switch among qualitatively different communication topologies (sequential, parallel, hierarchical) is not yet supported by direct evidence. Entropy quantifies output uncertainty but does not explicitly penalize or reward particular activation graphs; without an ablation that measures topological diversity (e.g., graph-edit distance or activation-pattern clustering across tasks), it remains possible that the reported gains arise from sparse but structurally similar selections.
  Authors: We agree that explicit quantification of topological diversity would strengthen the central claim. The predictive entropy objective is intended to drive the router toward lower-uncertainty outputs, which in our framework encourages selection of agent combinations that produce qualitatively different collaboration patterns. Nevertheless, the current manuscript does not include direct measurements such as activation-pattern clustering or graph-edit distances. In the revision we will add an analysis that clusters routing decisions across tasks and reports the diversity of emergent topologies to address this point.
  Revision: yes
- Referee: [§4] §4 (experiments): the abstract asserts SOTA results and robustness across nine benchmarks, yet the manuscript supplies neither per-benchmark accuracy tables with error bars, nor ablation studies isolating the recurrent state versus the entropy objective, nor dataset descriptions. These omissions make it impossible to assess whether the performance delta is load-bearing or reducible to the fitted routing parameters themselves.
  Authors: We acknowledge that the experimental section would benefit from greater detail. The revised manuscript will include full per-benchmark accuracy tables with means and standard deviations from multiple runs, ablation studies that separately disable the recurrent state and the entropy objective, and expanded dataset descriptions with references and statistics. These additions will make the source of the reported gains clearer.
  Revision: yes
Circularity Check
No significant circularity; empirical gains on external benchmarks are independent of routing optimization
The paper introduces a differentiable recurrent router optimized at test time via predictive entropy as a self-supervised loss on agent outputs. This is a standard self-supervised training step whose objective is defined on the model's own predictions. The load-bearing claims (SOTA performance, implicit simulation of diverse topologies, robustness) are supported by direct evaluation on 9 held-out benchmarks whose labels and metrics are external to the entropy signal. No equation or derivation reduces a reported result to a quantity that is definitionally identical to the fitted router parameters; the benchmarks serve as an independent falsification test. Self-citations, if present, are not load-bearing for the central empirical result.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Differentiable context-aware routing with recurrent structures can simulate diverse communication topologies and adapt without external labels.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "we design a differentiable, context-aware routing mechanism that leverages recurrent structures ... predictive entropy as self-supervised signals"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.