pith. machine review for the scientific record.

arxiv: 2604.17503 · v1 · submitted 2026-04-19 · 💻 cs.AI · cs.MA


SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology


Pith reviewed 2026-05-10 05:19 UTC · model grok-4.3

classification 💻 cs.AI cs.MA
keywords multi-agent systems · vision-language models · graph transformers · self-evolving agents · dynamic topology · skill distillation · visual multiagent systems

The pith

SkillGraph jointly evolves agent skills from failures and dynamically predicts collaboration topologies using a multimodal graph transformer, yielding consistent gains on visual multi-agent benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies that fixed topologies and static agents limit visual multi-agent systems. SkillGraph addresses this by having a Multimodal Graph Transformer create content-aware collaboration graphs and a Skill Designer extract heuristics from mistakes to update a skill bank. These updates feed back to refine the topology for each query. The result is a self-improving system that outperforms static setups without changing the underlying models.
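
To make the loop concrete, here is a minimal runnable sketch of the co-evolution cycle the paper describes. Everything in it is a hedged illustration: the function names (predict_topology, run_agents, distill_skill) and data shapes are hypothetical stand-ins, not the authors' released API.

    # Hypothetical sketch of the SkillGraph co-evolution loop.
    # All names and shapes are illustrative stand-ins, not the authors' API.

    def predict_topology(query, image, skill_embeddings, n_agents=3):
        """Stand-in for the MMGT: decide which agent pairs may communicate."""
        # A real MMGT fuses visual tokens, instruction semantics and skill
        # embeddings; this toy version just fully connects the agents.
        return [(i, j) for i in range(n_agents) for j in range(n_agents) if i != j]

    def run_agents(query, image, topology, skill_bank):
        """Stand-in for multi-agent execution; returns (answer, failure_log)."""
        return "falling", f"agents misread the axis on {query!r}"

    def distill_skill(failure_log):
        """Stand-in for the Skill Designer: turn a failure into a heuristic."""
        return {"heuristic": f"guard against: {failure_log}", "embedding": [0.0] * 8}

    skill_bank = [{"heuristic": "read charts left to right", "embedding": [0.1] * 8}]
    for query, image, gold in [("What is the trend?", "chart.png", "rising")]:
        embeddings = [s["embedding"] for s in skill_bank]
        topology = predict_topology(query, image, embeddings)
        answer, failure = run_agents(query, image, topology, skill_bank)
        if answer != gold:                  # failure case feeds the Skill Designer
            skill_bank.append(distill_skill(failure))
            # Updated embeddings re-enter predict_topology on the next query,
            # closing the capability-topology feedback loop.
    print(len(skill_bank))  # 2: one seed skill plus one distilled from the failure

The point the sketch makes explicit is the feedback edge: skills distilled from failures re-enter topology prediction as embeddings, so structure and capability evolve together.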

Core claim

SkillGraph achieves self-evolving multi-agent collaboration by using a Multimodal Graph Transformer to encode visual tokens, semantics, and skills into a query-conditioned graph, while the Skill Designer distills failure cases into an evolving Skill Bank whose embeddings loop back to improve the graph prediction.

What carries the argument

Multimodal Graph Transformer (MMGT) that predicts dynamic collaboration graphs from multimodal inputs, including active skill embeddings, coupled with the Skill Designer, which maintains the self-evolving Skill Bank.
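
Since the MMGT carries the argument, a toy version of query-conditioned edge prediction may help fix ideas. This is a sketch under assumed details (feature dimensions, fusion by concatenation, bilinear edge scoring); the paper's actual architecture is not specified in this review.

    import torch
    import torch.nn as nn

    class ToyGraphPredictor(nn.Module):
        """Tiny query-conditioned edge predictor in the spirit of the MMGT.
        Dimensions, concatenation fusion, and bilinear edge scoring are all
        assumptions made for illustration."""
        def __init__(self, dim=32):
            super().__init__()
            self.fuse = nn.Linear(3 * dim, dim)   # visual + semantic + skill per node
            self.edge = nn.Bilinear(dim, dim, 1)  # pairwise edge score

        def forward(self, visual, text, skills):
            # Each input: (n_agents, dim) node-wise features.
            h = torch.tanh(self.fuse(torch.cat([visual, text, skills], dim=-1)))
            n = h.size(0)
            src = h.unsqueeze(1).expand(n, n, -1).reshape(n * n, -1)
            dst = h.unsqueeze(0).expand(n, n, -1).reshape(n * n, -1)
            logits = self.edge(src, dst).view(n, n)
            return torch.sigmoid(logits)  # soft adjacency; threshold at inference

    n = 4
    model = ToyGraphPredictor()
    adj = model(torch.randn(n, 32), torch.randn(n, 32), torch.randn(n, 32))
    print((adj > 0.5).int())  # the predicted communication topology for this query

Thresholding the soft adjacency yields the per-query communication graph; because the skill features are inputs, any Skill Bank update changes the predicted topology.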

Load-bearing premise

Distilling reasoning heuristics from failure cases yields genuine skill improvements, and those improvements can be encoded as embeddings and used to adapt the topology without introducing errors or instability.
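
The premise is easiest to interrogate in code. Below is a hedged sketch of one failure-to-skill distillation step; call_llm is a stub for whatever model the Skill Designer queries, and the de-duplication guard at the end is one plausible stabilizer, not necessarily the paper's mechanism.

    def call_llm(prompt: str) -> str:
        # Stub for whatever model the Skill Designer queries; returns a
        # canned heuristic here so the sketch runs offline.
        return "Check the axis scale before comparing bar heights."

    def distill(failure_case: dict, skill_bank: list) -> list:
        prompt = (
            "An agent failed the task below. State one reusable heuristic that "
            f"would have prevented the error.\nTask: {failure_case['task']}\n"
            f"Wrong answer: {failure_case['answer']} (gold: {failure_case['gold']})"
        )
        heuristic = call_llm(prompt)
        # The premise's risk lives here: a bad heuristic enters the bank and,
        # via its embedding, also perturbs future topology predictions.
        # Re-testing on the originating failure would be a stronger guard.
        if heuristic not in skill_bank:
            skill_bank.append(heuristic)
        return skill_bank

    bank = distill({"task": "Which bar is taller?", "answer": "A", "gold": "B"}, [])
    print(bank)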

What would settle it

A controlled comparison on the same benchmarks would settle it: if the evolved skills and adapted topologies perform no better than, or worse than, fixed topologies and static agents, the central claimed benefit is falsified.

Figures

Figures reproduced from arXiv: 2604.17503 by Bo Yin, Jiangning Zhang, Ruolin Shen, Xiaobin Hu, Xinlei Yu, Zheng Nie.

Figure 1
Figure 1. Comparison of VMAS paradigms. Prior VMAS uses static topologies and frozen skills. Our SkillGraph enables a co-evolution loop: MMGT predicts dynamic collaboration graphs, while a skill bank self-evolves agent capabilities. view at source ↗
Figure 2
Figure 2. SkillGraph Framework. The system operates in three stages: VMAS Construction: Agents retrieve dynamic skills to initialize policy-aware node features. Topology Design: The Multimodal Graph Transformer (MMGT) fuses visual patches and task semantics to predict a query-conditioned communication topology. Adaptive Skill Evolution: A Skill Designer refines skills using failure logs. Updated skills feed directl… view at source ↗
Figure 3
Figure 3. Ablation study of SkillGraph components. view at source ↗
Figure 4
Figure 4. Performance of SkillGraph across different iteration numbers. view at source ↗
Figure 5
Figure 5. Evolution of the Skill Bank across iterations. view at source ↗
read the original abstract

Scaling vision-language models into Visual Multiagent Systems (VMAS) is hindered by two coupled issues. First, communication topologies are fixed before inference, leaving them blind to visual content and query context; second, agent reasoning abilities remain static during deployment. These issues reinforce each other: a rigid topology fails to leverage richer agent expertise, while static agents lack incentives to specialize for a given query. We address this with SkillGraph, a joint framework that evolves both agent expertise and communication topology. Within this framework, a Multimodal Graph Transformer (MMGT) encodes visual tokens, instruction semantics and active skill embeddings to predict a query-conditioned collaboration graph, replacing hand-crafted routing with dynamic, content-aware information flow. Complementing this, a Skill Designer distills and refines reasoning heuristics from failure cases, constructing a self-evolving multimodal Skill Bank. Crucially, updated skill embeddings are fed back into the MMGT, enabling the topology to adapt alongside capability growth. Experiments show that SkillGraph achieves consistent improvements across four benchmarks, five common MAS structures and four base models. Code is available at https://github.com/niez233/skillgraph.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper introduces SkillGraph, a joint framework for evolving both agent expertise and communication topology in Visual Multiagent Systems (VMAS). It uses a Multimodal Graph Transformer (MMGT) to encode visual tokens, instruction semantics, and active skill embeddings to predict query-conditioned collaboration graphs, replacing fixed topologies with dynamic, content-aware routing. A Skill Designer distills reasoning heuristics from failure cases to build and update a multimodal Skill Bank, with updated skill embeddings fed back into the MMGT to enable co-evolution of capabilities and structure. Experiments report consistent improvements across four benchmarks, five common MAS structures, and four base models, with code released.

Significance. If the empirical results hold under detailed scrutiny, SkillGraph offers a meaningful advance in adaptive multi-agent visual reasoning by directly addressing the coupling between rigid topologies and static agent abilities. The multi-setting evaluation (benchmarks, structures, base models) and code release are clear strengths that support reproducibility and generalizability claims.

minor comments (2)
  1. The abstract would be strengthened by including at least one key quantitative result (e.g., average improvement or specific benchmark scores) to better substantiate the claim of consistent gains.
  2. Clarify in the method description how the Skill Bank embeddings are precisely integrated into the MMGT input (e.g., via concatenation, attention, or a dedicated module) to avoid ambiguity in the feedback loop; two candidate mechanisms are sketched below.
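
On that second point, here are two plausible integration mechanisms, written as a hedged sketch rather than the paper's actual design; the dimensions, mean-pooling, and residual fusion are all assumptions.

    import torch
    import torch.nn as nn

    # Two candidate ways Skill Bank embeddings could enter the MMGT input.
    # Both are illustrative assumptions; the paper may use a different module.
    dim, n_agents, n_skills = 32, 4, 6
    node_feats = torch.randn(n_agents, dim)   # per-agent visual+semantic features
    skill_embs = torch.randn(n_skills, dim)   # current Skill Bank embeddings

    # Option A: concatenate a pooled skill summary onto every node feature.
    pooled = skill_embs.mean(dim=0, keepdim=True).expand(n_agents, -1)
    concat_input = torch.cat([node_feats, pooled], dim=-1)          # (4, 64)

    # Option B: cross-attention, letting each agent node attend to skills.
    attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
    attended, _ = attn(node_feats.unsqueeze(0),   # queries: agent nodes
                       skill_embs.unsqueeze(0),   # keys: skills
                       skill_embs.unsqueeze(0))   # values: skills
    attn_input = node_feats + attended.squeeze(0)  # residual fusion, (4, 32)

    print(concat_input.shape, attn_input.shape)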

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report correctly identifies the core contribution of SkillGraph in jointly evolving agent skills and query-conditioned topologies via the Multimodal Graph Transformer and Skill Designer. No specific major comments were listed in the provided report, so we have no points requiring detailed rebuttal or revision at this stage. We are happy to incorporate any minor suggestions the editor or referee may wish to add.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper describes SkillGraph as an empirical framework that combines a Multimodal Graph Transformer for dynamic topology prediction with a Skill Designer for distilling heuristics from failures, feeding updated embeddings back into the system. No equations, derivations, or parameter-fitting procedures appear in the abstract or the described structure that would allow any claimed prediction or result to reduce by construction to its own inputs. The central claims rest on experimental improvements across benchmarks, structures, and models, supported by released code for independent reproduction. This is a standard descriptive systems paper without self-definitional loops, fitted-input predictions, or load-bearing self-citations that would collapse the argument.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Based on the abstract alone, the framework introduces new components such as the MMGT and the Skill Designer, but without the full text, details on parameters or axioms are unknown. No free parameters are explicitly mentioned.

invented entities (2)
  • Multimodal Graph Transformer (MMGT) no independent evidence
    purpose: To encode visual tokens, instruction semantics, and active skill embeddings to predict a query-conditioned collaboration graph
    Introduced as part of the framework.
  • Skill Bank no independent evidence
    purpose: To store and refine reasoning heuristics from failure cases
    New component for self-evolving skills.

pith-pipeline@v0.9.0 · 5509 in / 1116 out tokens · 44699 ms · 2026-05-10T05:19:50.138134+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

58 extracted references · 40 canonical work pages · 12 internal anchors

  1. Alzubi, S., Provenzano, N., Bingham, J., Chen, W., Vu, T.: Evoskill: Automated skill discovery for multi-agent systems. arXiv preprint arXiv:2603.02766 (2026)
  2. Bai, S., Cai, Y., Chen, R., Chen, K., Chen, X., Cheng, Z., Deng, L., Ding, W., Gao, C., Ge, C., et al.: Qwen3-VL technical report. arXiv preprint arXiv:2511.21631 (2025)
  3. Bai, S., Chen, K., Liu, X., Wang, J., Ge, W., Song, S., Dang, K., Wang, P., Wang, S., Tang, J., Zhong, H., Zhu, Y., Yang, M., Li, Z., Wan, J., Wang, P., Ding, W., Fu, Z., Xu, Y., Ye, J., Zhang, X., Xie, T., Cheng, Z., Zhang, H., Yang, Z., Xu, H., Lin, J.: Qwen2.5-VL technical report (2025). https://arxiv.org/abs/2502.13923
  4. Chan, C.M., Chen, W., Su, Y., Yu, J., Xue, W., Zhang, S., Fu, J., Liu, Z.: ChatEval: Towards better LLM-based evaluators through multi-agent debate. arXiv preprint arXiv:2308.07201 (2023)
  5. Chen, J., Saha, S., Bansal, M.: ReConcile: Round-table conference improves reasoning via consensus among diverse LLMs. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 7066–7085 (2024)
  6. Chen, S., Gai, J., Zhou, R., Zhang, J., Zhu, T., Li, J., Wang, K., Wang, Z., Chen, Z., Kaleb, K., et al.: Skillcraft: Can LLM agents learn to use tools skillfully? arXiv preprint arXiv:2603.00718 (2026)
  7. Chen, T., Li, Y., Solodko, M., Wang, S., Jiang, N., Cui, T., Hao, J., Ko, J., Abdali, S., Zheng, S., et al.: Cua-skill: Develop skills for computer using agent. arXiv preprint arXiv:2601.21123 (2026)
  8. Dang, Y., Qian, C., Luo, X., Fan, J., Xie, Z., Shi, R., Chen, W., Yang, C., Che, X., Tian, Y., et al.: Multi-agent collaboration via evolving orchestration. arXiv preprint arXiv:2505.19591 (2025)
  9. Du, Y., Li, S., Torralba, A., Tenenbaum, J.B., Mordatch, I.: Improving factuality and reasoning in language models through multiagent debate. In: Forty-first International Conference on Machine Learning (2024)
  10. Feng, L., Zheng, L., He, S., Zhang, F., An, B.: Dr. MAS: Stable reinforcement learning for multi-agent LLM systems. arXiv preprint arXiv:2602.08847 (2026)
  11. Gao, H., Liu, Y., He, Y., Dou, L., Du, C., Deng, Z., Hooi, B., Lin, M., Pang, T.: FlowReasoner: Reinforcing query-level meta-agents. arXiv preprint arXiv:2504.15257 (2025)
  12. Holt, S., Luyten, M.R., van der Schaar, M.: L2MAC: Large language model automatic computer for extensive code generation. arXiv preprint arXiv:2310.02003 (2023)
  13. Hong, S., Zhuge, M., Chen, J., Zheng, X., Cheng, Y., Wang, J., Zhang, C., Wang, Z., Yau, S.K.S., Lin, Z., et al.: MetaGPT: Meta programming for a multi-agent collaborative framework. In: The Twelfth International Conference on Learning Representations (2023)
  14. Hu, S., Lu, C., Clune, J.: Automated design of agentic systems. arXiv preprint arXiv:2408.08435 (2024)
  15. Hu, X., Qian, Y., Yu, J., Liu, J., Tang, P., Ji, X., Xu, C., Liu, J., Yan, X., Yu, X., et al.: The landscape of medical agents: A survey. Authorea Preprints (2025)
  16. Hu, Y., Cai, Y., Du, Y., Zhu, X., Liu, X., Yu, Z., Hou, Y., Tang, S., Chen, S.: Self-evolving multi-agent collaboration networks for software development. arXiv preprint arXiv:2410.16946 (2024)
  17. Huang, K., Huang, J.: Audited skill-graph self-improvement for agentic LLMs via verifiable rewards, experience synthesis, and continual memory. arXiv preprint arXiv:2512.23760 (2025)
  18. Jiang, Y., Li, D., Deng, H., Ma, B., Wang, X., Wang, Q., Yu, G.: SoK: Agentic skills -- beyond tool use in LLM agents. arXiv preprint arXiv:2602.20867 (2026)
  19. Ke, Z., Cao, Y., Chen, Z., Yin, Y., He, S., Cheng, Y.: Early warning of cryptocurrency reversal risks via multi-source data. Finance Research Letters, p. 107890 (2025)
  20. Khattab, O., Singhvi, A., Maheshwari, P., Zhang, Z., Santhanam, K., Haq, S., Sharma, A., Joshi, T.T., Moazam, H., Miller, H., et al.: DSPy: Compiling declarative language model calls into state-of-the-art pipelines. In: The Twelfth International Conference on Learning Representations (2023)
  21. Kim, Y., Park, C., Jeong, H., Chan, Y.S., Xu, X., McDuff, D., Lee, H., Ghassemi, M., Breazeal, C., Park, H.W.: MDAgents: An adaptive collaboration of LLMs for medical decision-making. Advances in Neural Information Processing Systems 37, 79410–79452 (2024)
  22. Kuroki, S., Nakamura, T., Akiba, T., Tang, Y.: Agent skill acquisition for large language models via CycleQD. arXiv preprint arXiv:2410.14735 (2024)
  23. Li, B., Zhang, Y., Guo, D., Zhang, R., Li, F., Zhang, H., Zhang, K., Zhang, P., Li, Y., Liu, Z., et al.: LLaVA-OneVision: Easy visual task transfer. arXiv preprint arXiv:2408.03326 (2024)
  24. Li, G., Hammoud, H., Itani, H., Khizbullin, D., Ghanem, B.: CAMEL: Communicative agents for "mind" exploration of large language model society. Advances in Neural Information Processing Systems 36, 51991–52008 (2023)
  25. Li, Y., Dou, Y., Shao, J.J., Lyu, Y., Tsang, I., Yin, H.: Skilltracer: Structural failure attribution and refinement of agentic skills in long-horizon web tasks. In: Workshop on Multi-Agent Learning and Its Opportunities in the Era of Generative AI
  26. Liang, T., He, Z., Jiao, W., Wang, X., Wang, Y., Wang, R., Yang, Y., Shi, S., Tu, Z.: Encouraging divergent thinking in large language models through multi-agent debate. In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 17889–17904 (2024)
  27. Liang, Y., Zhong, R., Xu, H., Jiang, C., Zhong, Y., Fang, R., Gu, J.C., Deng, S., Yao, Y., Wang, M., et al.: SkillNet: Create, evaluate, and connect AI skills. arXiv preprint arXiv:2603.04448 (2026)
  28. Liu, Y., Duan, H., Zhang, Y., Li, B., Zhang, S., Zhao, W., Yuan, Y., Wang, J., He, C., Liu, Z., et al.: MMBench: Is your multi-modal model an all-around player? In: European Conference on Computer Vision, pp. 216–233. Springer (2024)
  29. Liu, Z., Zhang, Y., Li, P., Liu, Y., Yang, D.: Dynamic LLM-agent network: An LLM-agent collaboration framework with agent team optimization. arXiv preprint arXiv:2310.02170 (2023)
  30. Lu, P., Bansal, H., Xia, T., Liu, J., Li, C., Hajishirzi, H., Cheng, H., Chang, K.W., Galley, M., Gao, J.: MathVista: Evaluating mathematical reasoning of foundation models in visual contexts. arXiv preprint arXiv:2310.02255 (2023)
  31. Mathew, M., Bagal, V., Tito, R., Karatzas, D., Valveny, E., Jawahar, C.: InfographicVQA. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1697–1706 (2022)
  32. Pan, P., Chen, L., He, Q., Yuan, K., Wang, H., Zhang, W.: Finscra: An LLM-powered multi-chain reasoning framework for interpretable node classification on text-attributed graphs (2026)
  33. Qian, C., Liu, W., Liu, H., Chen, N., Dang, Y., Li, J., Yang, C., Chen, W., Su, Y., Cong, X., et al.: ChatDev: Communicative agents for software development. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 15174–15186 (2024)
  34. Qian, C., Xie, Z., Wang, Y., Liu, W., Zhu, K., Xia, H., Dang, Y., Du, Z., Chen, W., Yang, C., et al.: Scaling large language model-based multi-agent collaboration. arXiv preprint arXiv:2406.07155 (2024)
  35. Qian, Y., Hu, X., Yu, J., Xin, S., Chen, X., Zhang, J., Jiang, P.T., Liu, J., Li, H.B.: Medmaslab: A unified orchestration framework for benchmarking multimodal medical multi-agent systems. arXiv preprint arXiv:2603.09909 (2026)
  36. Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., Anandkumar, A.: Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291 (2023)
  37. Wang, J., Yan, Q., Wang, Y., Tian, Y., Mishra, S.S., Xu, Z., Gandhi, M., Xu, P., Cheong, L.L.: Reinforcement learning for self-improving agent with skill library. arXiv preprint arXiv:2512.17102 (2025)
  38. Wang, Z.Z., Gandhi, A., Neubig, G., Fried, D.: Inducing programmatic skills for agentic tasks. arXiv preprint arXiv:2504.06821 (2025)
  39. Wang, Z.Z., Mao, J., Fried, D., Neubig, G.: Agent workflow memory. arXiv preprint arXiv:2409.07429 (2024)
  40. Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., Jiang, L., Zhang, X., Zhang, S., Liu, J., et al.: AutoGen: Enabling next-gen LLM applications via multi-agent conversations. In: First Conference on Language Modeling (2024)
  41. Xia, P., Chen, J., Wang, H., Liu, J., Zeng, K., Wang, Y., Han, S., Zhou, Y., Zhao, X., Chen, H., et al.: SkillRL: Evolving agents via recursive skill-augmented reinforcement learning. arXiv preprint arXiv:2602.08234 (2026)
  42. Xue, X., Zhou, Y., Zhang, G., Zhang, Z., Li, Y., Zhang, C., Yin, Z., Torr, P., Ouyang, W., Bai, L.: Comas: Co-evolving multi-agent systems via interaction rewards. arXiv preprint arXiv:2510.08529 (2025)
  43. Yang, Y., Li, J., Pan, Q., Zhan, B., Cai, Y., Du, L., Zhou, J., Chen, K., Chen, Q., Li, X., et al.: Autoskill: Experience-driven lifelong learning via skill self-evolution. arXiv preprint arXiv:2603.01145 (2026)
  44. Yu, X., Chen, Z., He, Y., Fu, T., Yang, C., Xu, C., Ma, Y., Hu, X., Cao, Z., Xu, J., et al.: The latent space: Foundation, evolution, mechanism, ability, and outlook. arXiv preprint arXiv:2604.02029 (2026)
  45. Yu, X., Xu, C., Chen, Z., Yin, B., Yang, C., He, Y., Hu, Y., Zhang, J., Tan, C., Hu, X., et al.: Dual latent memory for visual multi-agent system. arXiv preprint arXiv:2602.00471 (2026)
  46. Yu, X., Xu, C., Chen, Z., Zhang, Y., Lu, S., Yang, C., Zhang, J., Yan, S., Hu, X.: Visual document understanding and reasoning: A multi-agent collaboration framework with agent-wise adaptive test-time scaling. arXiv preprint arXiv:2508.03404 (2025)
  47. Yu, X., Xu, C., Zhang, G., Chen, Z., Zhang, Y., He, Y., Jiang, P.T., Zhang, J., Hu, X., Yan, S.: Vismem: Latent vision memory unlocks potential of vision-language models. arXiv preprint arXiv:2511.11007 (2025)
  48. Yu, X., Xu, C., Zhang, G., He, Y., Chen, Z., Xue, Z., Zhang, J., Liao, Y., Hu, X., Jiang, Y.G., et al.: Visual multi-agent system: Mitigating hallucination snowballing via visual flow. arXiv preprint arXiv:2509.21789 (2025)
  49. Zabounidis, R., Wu, Y., Stepputtis, S., Mitchell, T., Li, Y., Sycara, K.P.: Scalar: Self-supervised composition and learning of skills with LLM planning and RL. In: NeurIPS 2025 Workshop on Bridging Language, Agent, and World Models for Reasoning and Planning
  50. Zhang, G., Yue, Y., Sun, X., Wan, G., Yu, M., Fang, J., Wang, K., Chen, T., Cheng, D.: G-Designer: Architecting multi-agent communication topologies via graph neural networks. arXiv preprint arXiv:2410.11782 (2024)
  51. Zhang, H., Long, Q., Bao, J., Feng, T., Zhang, W., Yue, H., Wang, W.: MemSkill: Learning and evolving memory skills for self-evolving agents. arXiv preprint arXiv:2602.02474 (2026)
  52. Zhang, J., Xiang, J., Yu, Z., Teng, F., Chen, X., Chen, J., Zhuge, M., Cheng, X., Hong, S., Wang, J., et al.: AFlow: Automating agentic workflow generation. arXiv preprint arXiv:2410.10762 (2024)
  53. Zhao, A., Huang, D., Xu, Q., Lin, M., Liu, Y.J., Huang, G.: ExpeL: LLM agents are experiential learners. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 19632–19642 (2024)
  54. Zheng, Y., Zhang, Z., Ma, C., Yu, Y., Zhu, J., Dong, B., Zhu, H.: SkillRouter: Retrieve-and-rerank skill selection for LLM agents at scale. arXiv preprint arXiv:2603.22455 (2026)
  55. Zhou, H., Wan, X., Sun, R., Palangi, H., Iqbal, S., Vulić, I., Korhonen, A., Arık, S.Ö.: Multi-agent design: Optimizing agents with better prompts and topologies. arXiv preprint arXiv:2502.02533 (2025)
  56. Zhou, S., Xu, F.F., Zhu, H., Zhou, X., Lo, R., Sridhar, A., Cheng, X., Ou, T., Bisk, Y., Fried, D., et al.: WebArena: A realistic web environment for building autonomous agents. arXiv preprint arXiv:2307.13854 (2023)
  57. Zhu, J., Wang, W., Chen, Z., Liu, Z., Ye, S., Gu, L., Tian, H., Duan, Y., Su, W., Shao, J., et al.: InternVL3: Exploring advanced training and test-time recipes for open-source multimodal models. arXiv preprint arXiv:2504.10479 (2025)
  58. Zhuge, M., Wang, W., Kirsch, L., Faccio, F., Khizbullin, D., Schmidhuber, J.: GPTSwarm: Language agents as optimizable graphs. In: Forty-first International Conference on Machine Learning (2024)