
arxiv: 2606.05304 · v1 · pith:MQIOUAXJnew · submitted 2026-06-03 · 💻 cs.AI

What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems

Pith reviewed 2026-06-28 06:20 UTC · model grok-4.3

classification 💻 cs.AI
keywords: multi-agent systems, large language models, inter-agent communication, token efficiency, action-state records, PACT

The pith

Projecting each agent's raw output to a compact action-state record lets multi-agent LLM systems match or exceed task performance while using far fewer tokens.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how LLM-based multi-agent systems pass messages and finds that unconstrained natural language quickly inflates token counts and crowds the shared context. Analysis of five common strategies across two topologies shows that no single approach works best everywhere, but messages that keep only the action-centered facts needed by the next agent stay effective. From this observation the authors build PACT, a method that converts every raw output into a short action-state record and treats the shared history as a public state that gets updated rather than appended. Tests on varied layouts and on real coding agents show the compact records deliver the same or better results at much lower token cost.

Core claim

PACT treats inter-agent communication as a public state-update problem and projects each raw agent output into a compact action-state record before it enters shared history. Across different MAS topologies this yields comparable or stronger task performance with substantially fewer tokens. The same gains appear in production coding harnesses: PACT raises OpenHands resolve rate while cutting tokens-per-resolved by 10 percent and keeps SWE-agent resolve rate unchanged while halving input tokens.

What carries the argument

PACT (Protocolized Action-state Communication and Transmission), which converts raw agent outputs into compact action-state records that update a shared public state.
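The state-update idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record fields (`agent`, `action`, `target`, `result`), the tag-based `project` function, and the `(agent, target)` overwrite key are hypothetical and are not taken from the paper or its repository, which may well use a different schema or an LLM-based projection.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ActionStateRecord:
    # Hypothetical schema: the paper's actual record fields are not given here.
    agent: str
    action: str
    target: str
    result: str


def project(agent: str, raw_output: str) -> ActionStateRecord:
    """Toy rule-based projection: keep only ACTION:/TARGET:/RESULT: lines."""
    fields = {"action": "", "target": "", "result": ""}
    for line in raw_output.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in fields:
            fields[key] = value.strip()
    return ActionStateRecord(agent=agent, **fields)


@dataclass
class PublicState:
    # Keyed by (agent, target) so a newer record overwrites the older one
    # instead of being appended. This keying is an assumption of the sketch.
    records: dict[tuple[str, str], ActionStateRecord] = field(default_factory=dict)

    def update(self, record: ActionStateRecord) -> None:
        self.records[(record.agent, record.target)] = record

    def render(self) -> str:
        return "\n".join(
            f"{r.agent}: {r.action} {r.target} -> {r.result}"
            for r in self.records.values()
        )


state = PublicState()
raw1 = (
    "Let me think about this...\n"
    "ACTION: edit\nTARGET: utils.py\nRESULT: added null check\n"
    "I hope this helps!"
)
raw2 = "ACTION: edit\nTARGET: utils.py\nRESULT: null check plus test"
state.update(project("coder", raw1))
state.update(project("coder", raw2))  # overwrites the earlier record
print(state.render())
```

In this sketch the free-form chatter around the tagged lines never reaches the shared state, and the second update replaces the first, so the rendered context stays at one line regardless of how many turns touch the same target.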

If this is right

  • PACT improves the performance-cost trade-off on every MAS topology tested.
  • PACT raises OpenHands resolve rate while reducing tokens-per-resolved by 10 percent.
  • PACT keeps SWE-agent resolve rate unchanged while halving input tokens.
  • No fixed communication strategy is optimal for all topologies; only messages that retain action-centered information remain reliable.
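To make the cost metric concrete, here is a small worked example. It assumes that tokens-per-resolved means total tokens consumed divided by the number of resolved instances; the summary does not spell out the definition, so this reading is an assumption. The input numbers are invented purely to show the arithmetic and are not results from the paper.

```python
def tokens_per_resolved(total_tokens: int, resolved: int) -> float:
    """Assumed definition: total tokens spent divided by resolved instances."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return total_tokens / resolved


def relative_change(baseline: float, variant: float) -> float:
    """Signed relative change of variant with respect to baseline."""
    return (variant - baseline) / baseline


# Hypothetical numbers for illustration only, not taken from the paper.
baseline = tokens_per_resolved(total_tokens=1_000_000, resolved=20)
variant = tokens_per_resolved(total_tokens=990_000, resolved=22)
change = relative_change(baseline, variant)
print(f"baseline={baseline:.0f} variant={variant:.0f} change={change:+.0%}")
```

Under this definition, a run can lower tokens-per-resolved both by spending fewer tokens and by resolving more instances, which is why the metric is reported alongside the resolve rate rather than instead of it.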

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same state-update framing could be applied to non-coding multi-agent workflows where token budgets are tight.
  • If the records lose information on some edge tasks, the method would need task-specific extensions or fallback rules.
  • Large-scale deployments that hit context limits first would see the largest absolute cost savings.

Load-bearing premise

That the compact action-state records always preserve every fact downstream agents need, with no task-specific loss.

What would settle it

Any concrete task and topology where replacing raw language with the compact records produces measurably lower final performance than the baseline communication method.

Figures

Figures reproduced from arXiv: 2606.05304 by Chen Huang, Wenxuan Zhang, Yuhao Wu.

Figure 1: Free-form inter-agent messages accumulate in [...]
Figure 2: Five inter-agent communication strategies in two MAS settings at three model scales. Top two rows: [...]
Figure 3: Average agent turns per interaction dialogue: [...]
Figure 5: Illustrative interaction turn with and without [...]
Original abstract

Multi-agent systems (MAS) built on large language models are typically organized around roles, pipelines, and turn schedules, while the content that agents pass to one another is often left as unconstrained natural language. However, this free-form communication can rapidly inflate token usage, consume the shared context window, and ultimately affect both system performance and inference cost. We analyze five common inter-agent communication strategies across two MAS topologies, finding that no fixed strategy is universally optimal. Instead, effective inter-agent messages consistently preserve action-centered information needed by downstream agents. Building on this, we propose the PACT (Protocolized Action-state Communication and Transmission), which treats inter-agent communication as a public state-update problem and projects each raw agent output into a compact action-state record before it enters shared history. Across different MAS topologies, PACT consistently improves the performance-cost trade-off, achieving comparable or stronger task performance with substantially fewer tokens. The gains extend to production coding harnesses: PACT lifts OpenHands' resolve rate at -10% tokens-per-resolved, and is resolve-neutral on SWE-agent while halving input tokens. Our code is publicly available at https://github.com/iNLP-Lab/PACT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper analyzes five common inter-agent communication strategies across two MAS topologies in LLM-based multi-agent systems, finding no fixed strategy is universally optimal and that effective messages preserve action-centered information. It proposes PACT, which projects raw agent outputs into compact action-state records as public state updates to reduce token usage. Empirical results claim consistent improvements in the performance-cost trade-off across topologies, with specific gains on production coding harnesses: improved resolve rate on OpenHands at -10% tokens-per-resolved, and resolve-neutral performance on SWE-agent with halved input tokens. Code is released publicly.

Significance. If the results hold, PACT provides a practical protocol for lowering inference costs and context pressure in multi-agent LLM systems without performance loss. The public code release is a strength that supports reproducibility and further testing.

major comments (1)
  1. [Abstract] The claim that projecting raw outputs to compact action-state records 'preserves' all information required by downstream agents is load-bearing for the 'comparable or stronger task performance' result, yet it is supported only by empirical outcomes on the tested topologies and harnesses. No formal completeness argument or mechanism is supplied to guarantee retention of private reasoning or dependency chains for arbitrary tasks.
minor comments (1)
  1. The abstract reports consistent improvements but supplies no experimental details on baselines, number of runs, error bars, or exclusion criteria, limiting assessment of the reported gains on OpenHands and SWE-agent.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting this important point on the abstract's phrasing. We address the concern directly below and agree that revisions are warranted to align language with the empirical nature of our results.

Point-by-point responses
  1. Referee: [Abstract] The claim that projecting raw outputs to compact action-state records 'preserves' all information required by downstream agents is load-bearing for the 'comparable or stronger task performance' result, yet it is supported only by empirical outcomes on the tested topologies and harnesses. No formal completeness argument or mechanism is supplied to guarantee retention of private reasoning or dependency chains for arbitrary tasks.

    Authors: We agree the results are strictly empirical and provide no formal completeness argument or mechanism that would guarantee retention of all private reasoning or dependency chains for arbitrary tasks. The manuscript demonstrates that action-state records suffice for the tested topologies and harnesses, but does not claim universality. We will revise the abstract (and related sections) to replace any implication of guaranteed preservation with explicit reference to observed empirical outcomes, e.g., 'achieving comparable or stronger task performance with substantially fewer tokens across the evaluated settings.' This change removes the load-bearing overclaim while preserving the paper's core empirical contribution.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical evaluation of PACT independent of self-referential definitions

Full rationale

The paper analyzes five communication strategies empirically across MAS topologies, observes that action-centered information is effective, and introduces PACT as a projection method. Performance gains are reported from direct experiments on topologies and production harnesses (OpenHands, SWE-agent) with token and resolve-rate metrics. No equations, fitted parameters, uniqueness theorems, or self-citations are invoked in the provided text to derive the central claims; results are presented as measured outcomes rather than reductions by construction. The assumption that action-state records preserve necessary information is tested empirically but not claimed as proven by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the empirical observation that action-state information is sufficient; no free parameters, mathematical axioms, or new physical entities are introduced.

axioms (1)
  • Domain assumption: action-centered information is the primary content needed by downstream agents in the tested MAS topologies.
    Invoked to justify projecting raw outputs into compact records.
invented entities (1)
  • PACT protocol (no independent evidence)
    Purpose: to treat inter-agent communication as a public state-update problem and project outputs into action-state records.
    New method introduced by the paper; no independent evidence outside the reported experiments.

