pith. sign in

arxiv: 2605.17359 · v1 · pith:NSSGX6EXnew · submitted 2026-05-17 · 💻 cs.CL

Learning Transferable Topology Priors for Multi-Agent LLM Collaboration Across Domains

Pith reviewed 2026-05-20 13:32 UTC · model grok-4.3

classification 💻 cs.CL
keywords multi-agent LLMcollaboration topologytransferable graph priorsvariational graph modelsadversarial alignmentmulti-domain reasoningtopology evolutionoffline prior learning
0
0 comments X

The pith

TopoPrior learns reusable collaboration graph priors offline from multiple domains to initialize structures for new multi-agent LLM queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that multi-agent LLM systems can avoid building collaboration topologies from scratch for every new query by instead learning general structural patterns from collections of past graphs across different domains. These patterns are stored in a shared latent space and then adapted to fit the current query before being handed to any existing refinement method. A sympathetic reader would care because repeated per-query searches become expensive in tokens and time as the number of agents and domains grows, and shifting part of the work to offline learning could make the overall process scale better while staying compatible with current approaches.

Core claim

TopoPrior employs a conditional variational graph framework to capture reusable structural regularities across domains in a latent space, then uses a query-conditioned latent adaptation module with adversarial alignment to produce initial collaboration graphs that reduce domain discrepancies while keeping query-relevant variation for downstream topology evolution.

What carries the argument

Conditional variational graph framework that encodes reusable structural regularities from offline multi-domain collaboration graphs, paired with adversarial alignment to adapt those encodings to new queries.

If this is right

  • Plugging the learned initial graphs into several different topology-evolution backbones produces consistent gains on multi-domain reasoning tasks.
  • Online inference uses fewer tokens because the starting collaboration structure is already closer to effective ones.
  • Only a modest number of extra trainable parameters are added beyond the base backbones.
  • The approach works without changing the core search or evolution logic of existing methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the priors hold across wider ranges of agent counts, the same offline learning step could support systems with dozens of agents where per-query search would otherwise become prohibitive.
  • The latent-space transfer idea might extend to other structured outputs in LLM agents, such as debate trees or shared memory layouts, rather than only collaboration graphs.
  • Collecting reference graphs from even more varied sources could further strengthen the priors, suggesting a path toward domain-general multi-agent scaffolding.

Load-bearing premise

The structural patterns found in collaboration graphs from several past domains are similar enough that they can be learned once and then adjusted for a new query without losing useful details or adding mismatches that hurt final performance.

What would settle it

Run the method on held-out domains and compare final task accuracy and token use when starting from the learned priors versus starting from random or per-query scratch graphs; if the prior-initialized versions show no consistent gains, the transfer claim does not hold.

Figures

Figures reproduced from arXiv: 2605.17359 by Chengyu Wang, Jiuheng Wan, Richang Hong, Taolin Zhang, Tingyuan Hu, Xiaofeng He, Zijie Zhou.

Figure 1
Figure 1. Figure 1: Comparison of reasoning paradigms across domains. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of TopoPrior. (1) Transferable Topology Prior Learning captures reusable collaboration patterns from multiple domains through a conditional variational graph framework. (2) Query-Conditioned Latent Adaptation improves cross￾domain robustness by adversarially regularizing the latent space while retaining query-relevant structural information. trainable weights. The query representation hq and the i… view at source ↗
Figure 3
Figure 3. Figure 3: Accuracy gain, inference-time token reduction, and [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: t-SNE visualization of the encoder-induced latent space [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Out-of-domain generalization on unseen domains. [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Hyperparameter analysis of the loss coefficients [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗
read the original abstract

Large language model (LLM)-based multi-agent systems have shown strong potential for complex reasoning by coordinating specialized agents through structured communication. However, existing topology-evolution methods typically construct or optimize a collaboration topology for each query from scratch, leading to substantial online search overhead, high inference-time token consumption, and limited scalability in multi-domain settings. We propose TopoPrior, a framework for learning transferable topology priors for multi-agent LLM collaboration across domains. Rather than repeatedly searching for effective collaboration structures online, TopoPrior learns reusable topology priors from reference collaboration graphs collected offline from multiple domains and uses them to generate query-conditioned initial collaboration graphs for downstream refinement. By shifting part of topology search from per-query online optimization to offline prior learning, TopoPrior amortizes search cost while remaining compatible with existing topology-evolution backbones. Technically, TopoPrior contains two key components. First, a transferable topology prior learning module employs a conditional variational graph framework to capture reusable structural regularities across domains in a latent space. Second, a query-conditioned latent adaptation module introduces adversarial alignment to reduce unnecessary domain discrepancy while preserving query-relevant structural variation. Experiments on multi-domain reasoning benchmarks show that TopoPrior consistently improves several heterogeneous topology-evolution backbones while reducing online inference-time token usage, with only modest additional trainable parameters. These results suggest that transferable topology initialization is an effective and lightweight mechanism for improving the efficiency of multi-agent LLM collaboration across domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes TopoPrior, a framework that learns reusable topology priors from offline multi-domain reference collaboration graphs via a conditional variational graph model to capture structural regularities in a latent space, then applies a query-conditioned latent adaptation module with adversarial alignment to generate initial collaboration graphs for downstream refinement in multi-agent LLM systems. The approach aims to amortize per-query topology search costs, reduce online inference-time token usage, and improve performance while remaining compatible with existing topology-evolution backbones, with experiments claiming consistent gains on multi-domain reasoning benchmarks using only modest additional trainable parameters.

Significance. If the transferability of latent structural regularities holds and the adversarial alignment preserves query-relevant variation without introducing harmful bias, the result would be significant for scaling multi-agent LLM collaboration: it provides a lightweight, amortizable mechanism for topology initialization that addresses the online search overhead of current per-query optimization methods. The compatibility with heterogeneous backbones and focus on cross-domain reuse are practical strengths that could influence efficiency-focused work in LLM agent systems.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Experiments): the claim of 'consistent improvements' and 'reducing online inference-time token usage' is reported without any quantitative values, error bars, ablation results, data split details, or baseline descriptions. This absence is load-bearing because the central claim—that offline prior learning amortizes search cost while improving performance—cannot be evaluated without these elements.
  2. [§3.2] §3.2 (Query-conditioned latent adaptation module): the adversarial alignment is introduced to reduce domain discrepancy while preserving query-relevant variation, yet no ablation on alignment strength, single- versus multi-domain training, or latent-space transfer gaps is provided. Without these, the weakest assumption—that reusable structural regularities survive alignment without collapse or harmful bias—remains untested and directly affects whether the generated initial graphs add benefit or bias.
minor comments (2)
  1. [§3.1] Notation for the conditional variational graph (e.g., latent variable definitions and conditioning) could be clarified with an explicit diagram or expanded equations to improve readability for readers unfamiliar with variational graph models.
  2. [§4] The manuscript would benefit from a table summarizing the additional trainable parameters across backbones and a clearer statement of the exact multi-domain benchmarks used.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the presentation of experimental claims and the need for additional validation of the adversarial alignment component. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Experiments): the claim of 'consistent improvements' and 'reducing online inference-time token usage' is reported without any quantitative values, error bars, ablation results, data split details, or baseline descriptions. This absence is load-bearing because the central claim—that offline prior learning amortizes search cost while improving performance—cannot be evaluated without these elements.

    Authors: We agree that the abstract and the opening of §4 would be clearer with explicit quantitative summaries. The full experimental section (§4) contains tables reporting performance metrics across backbones and domains, token consumption reductions, standard deviations from repeated runs, data split details, and baseline descriptions. We will revise the abstract and §4 to include key numerical highlights (e.g., average accuracy gains and token savings) along with direct pointers to the tables and figures, while preserving the existing results. revision: yes

  2. Referee: [§3.2] §3.2 (Query-conditioned latent adaptation module): the adversarial alignment is introduced to reduce domain discrepancy while preserving query-relevant variation, yet no ablation on alignment strength, single- versus multi-domain training, or latent-space transfer gaps is provided. Without these, the weakest assumption—that reusable structural regularities survive alignment without collapse or harmful bias—remains untested and directly affects whether the generated initial graphs add benefit or bias.

    Authors: We concur that targeted ablations would better substantiate the design choices in the query-conditioned latent adaptation module. The current manuscript presents the overall framework and end-to-end results but omits explicit sensitivity analysis on alignment strength, single- versus multi-domain training comparisons, and metrics for latent-space transfer gaps. We will add these ablations in the revised version, including a hyperparameter sweep on the adversarial loss coefficient, a single-domain training baseline, and quantitative measures (e.g., MMD or reconstruction fidelity) of domain alignment in the latent space. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent offline training and adaptation modules

full rationale

The paper describes a standard transfer-learning setup in which a conditional variational graph model is trained on offline multi-domain reference collaboration graphs to encode structural regularities, after which an adversarial alignment module adapts the latent space for new queries before feeding the result into existing topology-evolution backbones. None of the core components (variational graph encoder, adversarial alignment, or prior generation) are defined in terms of the final downstream performance metric, and the provided abstract and description contain no self-citations, fitted parameters renamed as predictions, or equations that reduce the claimed transferability to a tautology by construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 1 invented entities

Only abstract available so specific free parameters and axioms cannot be audited; the framework implicitly relies on standard VAE assumptions and graph model assumptions not detailed here.

free parameters (2)
  • latent dimension size
    Used in the conditional variational graph framework to capture structural regularities
  • adversarial alignment strength
    Hyperparameter controlling domain discrepancy reduction in the adaptation module
axioms (1)
  • domain assumption Collaboration topologies contain reusable structural regularities across domains that can be captured in a latent space
    Invoked in the transferable topology prior learning module description
invented entities (1)
  • TopoPrior framework no independent evidence
    purpose: To learn and adapt transferable topology priors for multi-agent LLM collaboration
    New named system introduced to combine the two modules

pith-pipeline@v0.9.0 · 5802 in / 1404 out tokens · 41019 ms · 2026-05-20T13:32:19.787780+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 4 internal anchors

  1. [1]

    Leveraging large language models for medical text classification: a hospital readmission prediction case,

    N. Nazyrova, S. Chahed, T. Chausalet, and M. Dwek, “Leveraging large language models for medical text classification: a hospital readmission prediction case,” inICPRS, 2024, pp. 1–7

  2. [2]

    Measuring massive multitask language understanding,

    D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt, “Measuring massive multitask language understanding,” in Proc. Int. Conf. Learn. Represent., 2021

  3. [3]

    C-eval: A multi-level multi- discipline chinese evaluation suite for foundation models,

    Y . Huang, Y . Bai, Z. Zhu, J. Zhang, J. Zhang, T. Su, J. Liu, C. Lv, Y . Zhang, J. Lei, Y . Fu, M. Sun, and J. He, “C-eval: A multi-level multi- discipline chinese evaluation suite for foundation models,” inProc. Adv. Neural Inf. Process. Syst., 2023

  4. [4]

    Text classification and prediction in the legal domain,

    M. Nghiem, P. Baylis, A. Freitas, and S. Ananiadou, “Text classification and prediction in the legal domain,” inLREC, 2022, pp. 4717–4722

  5. [5]

    Efficient prompt optimisation for legal text classification with proxy prompt evaluator,

    H. Lee, K. C. Li, M. Grabmair, and S. Xu, “Efficient prompt optimisation for legal text classification with proxy prompt evaluator,”CoRR, vol. abs/2510.08524, 2025

  6. [6]

    Multi-prompt alignment for multi-source unsupervised domain adaptation,

    H. Chen, X. Han, Z. Wu, and Y . Jiang, “Multi-prompt alignment for multi-source unsupervised domain adaptation,” inProc. Adv. Neural Inf. Process. Syst., 2023

  7. [7]

    Cof-cot: Enhancing large language models with coarse-to-fine chain-of-thought prompting for multi-domain NLU tasks,

    H. Nguyen, Y . Liu, C. Zhang, T. Zhang, and P. S. Yu, “Cof-cot: Enhancing large language models with coarse-to-fine chain-of-thought prompting for multi-domain NLU tasks,” inProc. Conf. Empir. Methods Nat. Lang. Process., 2023, pp. 12 109–12 119

  8. [8]

    Large language model for multi-domain translation: Benchmarking and domain cot fine-tuning,

    T. Hu, P. Zhang, B. Yang, J. Xie, D. F. Wong, and R. Wang, “Large language model for multi-domain translation: Benchmarking and domain cot fine-tuning,” inProc. Conf. Empir. Methods Nat. Lang. Process., 2024, pp. 5726–5746

  9. [9]

    Understanding the dark side of llms’ intrinsic self- correction,

    Q. Zhang, D. Wang, H. Qian, Y . Li, T. Zhang, M. Huang, K. Xu, H. Li, L. Yan, and H. Qiu, “Understanding the dark side of llms’ intrinsic self- correction,” inProc. Annu. Meeting Assoc. Comput. Linguistics, 2025, pp. 27 066–27 101

  10. [10]

    Fine-tuned language models are continual learners,

    T. Scialom, T. Chakrabarty, and S. Muresan, “Fine-tuned language models are continual learners,” inProc. Conf. Empir. Methods Nat. Lang. Process., 2022, pp. 6107–6122

  11. [11]

    Diversity as a reward: Fine-tuning llms on a mixture of domain-undetermined data,

    Z. Ling, D. Chen, L. Yao, Y . Li, and Y . Shen, “Diversity as a reward: Fine-tuning llms on a mixture of domain-undetermined data,”CoRR, vol. abs/2502.04380, 2025

  12. [12]

    A comprehensive survey on source-free domain adaptation,

    J. Li, Z. Yu, Z. Du, L. Zhu, and H. T. Shen, “A comprehensive survey on source-free domain adaptation,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 8, pp. 5743–5762, 2024

  13. [13]

    Boosting multi-domain fine-tuning of large language models through evolving interactions between samples,

    X. Liang, L. Yang, J. Wang, Y . Lu, R. Wu, H. Chen, and J. Hao, “Boosting multi-domain fine-tuning of large language models through evolving interactions between samples,” inProc. Int. Conf. Mach. Learn., 2025

  14. [14]

    Metagpt: Meta programming for A multi-agent collaborative framework,

    S. Hong, M. Zhuge, J. Chen, X. Zheng, Y . Cheng, J. Wang, C. Zhang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou, C. Ran, L. Xiao, C. Wu, and J. Schmidhuber, “Metagpt: Meta programming for A multi-agent collaborative framework,” inProc. Int. Conf. Learn. Represention, 2024

  15. [15]

    A dynamic llm- powered agent network for task-oriented agent collaboration,

    Z. Liu, Y . Zhang, P. Li, Y . Liu, and D. Yang, “A dynamic llm- powered agent network for task-oriented agent collaboration,”CoRR, vol. abs/2310.02170, 2024

  16. [16]

    Cut the crap: An economical communication pipeline for llm-based multi-agent systems,

    G. Zhang, Y . Yue, Z. Li, S. Yun, G. Wan, K. Wang, D. Cheng, J. X. Yu, and T. Chen, “Cut the crap: An economical communication pipeline for llm-based multi-agent systems,” inProc. Int. Conf. Learn. Repre., 2025. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. XX, NO. XX, XXXX 202X 14

  17. [17]

    G- designer: Architecting multi-agent communication topologies via graph neural networks,

    G. Zhang, Y . Yue, X. Sun, M. Yu, K. Wang, T. Chen, and D. Cheng, “G- designer: Architecting multi-agent communication topologies via graph neural networks,” inProc. Int. Conf. Mach. Learn., 2025

  18. [18]

    A multi-agent framework with automated decision rule optimization for cross-domain misinformation detection,

    H. Li, A. Wang, K. Li, Z. Wang, L. Zhang, D. Qiu, Q. Liu, and J. Su, “A multi-agent framework with automated decision rule optimization for cross-domain misinformation detection,” inProc. Conf. Empir. Methods Nat. Lang. Process., Nov. 2025

  19. [19]

    M4LE: A multi-ability multi-range multi-task multi- domain long-context evaluation benchmark for large language models,

    W. Kwan, X. Zeng, Y . Wang, Y . Sun, L. Li, Y . Jiang, L. Shang, Q. Liu, and K. Wong, “M4LE: A multi-ability multi-range multi-task multi- domain long-context evaluation benchmark for large language models,” inProc. Annu. Meeting Assoc. Comput. Linguistics, 2024, pp. 15 568– 15 592

  20. [20]

    M 3cot: A novel benchmark for multi-domain multi-step multi-modal chain-of- thought,

    Q. Chen, L. Qin, J. Zhang, Z. Chen, X. Xu, and W. Che, “M 3cot: A novel benchmark for multi-domain multi-step multi-modal chain-of- thought,” inProc. Annu. Meeting Assoc. Comput. Linguistics, 2024, pp. 8199–8221

  21. [21]

    Polynarrative: A multilingual, multilabel, multi-domain dataset for narrative extraction from news articles,

    N. Nikolaidis, N. Stefanovitch, P. Silvano, D. I. Dimitrov, R. Yangarber, N. Guimar ˜aes, E. Sartori, I. Androutsopoulos, P. Nakov, G. D. S. Martino, and J. Piskorski, “Polynarrative: A multilingual, multilabel, multi-domain dataset for narrative extraction from news articles,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2025, pp. 31 323– 31 345

  22. [22]

    A prompt-driven framework for multi-domain knowledge tracing,

    Z. Liu, S. Huang, T. Guo, M. Hou, and Q. Liang, “A prompt-driven framework for multi-domain knowledge tracing,”Mach. Learn., vol. 114, no. 4, Feb. 2025

  23. [23]

    Lora: Low-rank adaptation of large language models,

    E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, and W. Chen, “Lora: Low-rank adaptation of large language models,” inProc. Int. Conf. Learn. Represention, 2022

  24. [24]

    Scalable multi-domain adaptation of language models using modular experts,

    P. Schafhalter, S. Liao, Y . Zhou, C. Yeh, A. Kandoor, and J. Laudon, “Scalable multi-domain adaptation of language models using modular experts,”CoRR, vol. abs/2410.10181, 2024

  25. [25]

    Dynamic expert specialization: To- wards catastrophic forgetting-free multi-domain moe adaptation,

    J. Li, B. Wang, X. Zhou, and X. Hu, “Dynamic expert specialization: To- wards catastrophic forgetting-free multi-domain moe adaptation,”CoRR, vol. abs/2509.16882, 2025

  26. [26]

    Llm-blender: Ensembling large language models with pairwise ranking and generative fusion,

    D. Jiang, X. Ren, and B. Y . Lin, “Llm-blender: Ensembling large language models with pairwise ranking and generative fusion,” inProc. Annu. Meeting Assoc. Comput. Linguistics, 2023, pp. 14 165–14 178

  27. [27]

    Exploring collaboration mechanisms for LLM agents: A social psychology view,

    J. Zhang, X. Xu, N. Zhang, R. Liu, B. Hooi, and S. Deng, “Exploring collaboration mechanisms for LLM agents: A social psychology view,” inProc. Annu. Meeting Assoc. Comput. Linguistics, 2024, pp. 14 544– 14 607

  28. [28]

    Improving factuality and reasoning in language models through multiagent debate,

    Y . Du, S. Li, A. Torralba, J. B. Tenenbaum, and I. Mordatch, “Improving factuality and reasoning in language models through multiagent debate,” inProc. Int. Conf. Mach. Learn., 2024

  29. [29]

    L2MAC: large language model automatic computer for unbounded code generation,

    S. Holt, M. R. Luyten, and M. van der Schaar, “L2MAC: large language model automatic computer for unbounded code generation,”CoRR, vol. abs/2310.02003, 2023

  30. [30]

    Chatdev: Com- municative agents for software development,

    C. Qian, W. Liu, H. Liu, N. Chen, Y . Dang, J. Li, C. Yang, W. Chen, Y . Su, X. Cong, J. Xu, D. Li, Z. Liu, and M. Sun, “Chatdev: Com- municative agents for software development,” inProc. Annu. Meeting Assoc. Comput. Linguistics, 2024, pp. 15 174–15 186

  31. [31]

    Autogen: Enabling next-gen LLM applications via multi- agent conversations,

    Q. Wu, G. Bansal, J. Zhang, Y . Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Zhang, J. Liu, A. H. Awadallah, R. W. White, D. Burger, and C. Wang, “Autogen: Enabling next-gen LLM applications via multi- agent conversations,” inProc. Conf. Lang. Model., 2024

  32. [32]

    Large language model is a good policy teacher for training reinforcement learning agents,

    Z. Zhou, B. Hu, P. Zhang, C. Zhao, and B. Liu, “Large language model is a good policy teacher for training reinforcement learning agents,” CoRR, vol. abs/2311.13373, 2023

  33. [33]

    Self-organized agents: A LLM multi- agent framework toward ultra large-scale code generation and optimiza- tion,

    Y . Ishibashi and Y . Nishimura, “Self-organized agents: A LLM multi- agent framework toward ultra large-scale code generation and optimiza- tion,”CoRR, vol. abs/2404.02183, 2024

  34. [34]

    Gptswarm: Language agents as optimizable graphs,

    M. Zhuge, W. Wang, L. Kirsch, F. Faccio, D. Khizbullin, and J. Schmid- huber, “Gptswarm: Language agents as optimizable graphs,” inProc. Int. Conf. Mach. Learn., 2024

  35. [35]

    Agentdropout: Dynamic agent elimination for token-efficient and high- performance llm-based multi-agent collaboration,

    Z. Wang, Y . Wang, X. Liu, L. Ding, M. Zhang, J. Liu, and M. Zhang, “Agentdropout: Dynamic agent elimination for token-efficient and high- performance llm-based multi-agent collaboration,” inProc. Annu. Meet- ing Assoc. Comput. Linguistics, 2025, pp. 24 013–24 035

  36. [36]

    Assemble your crew: Au- tomatic multi-agent communication topology design via autoregressive graph generation,

    S. Li, Y . Liu, Q. Wen, C. Zhang, and S. Pan, “Assemble your crew: Au- tomatic multi-agent communication topology design via autoregressive graph generation,” inProc. Assoc. for the Advan. of Arti. Intell., 2026, pp. 23 142–23 150

  37. [37]

    To adapt or to annotate: Challenges and interventions for domain adaptation in open-domain question answering,

    D. Dua, E. Strubell, S. Singh, and P. Verga, “To adapt or to annotate: Challenges and interventions for domain adaptation in open-domain question answering,” inProc. Annu. Meeting Assoc. Comput. Linguistics, 2023, pp. 14 429–14 446

  38. [38]

    On the transferability of causal knowledge for language models,

    G. Dey and Y . K. Lal, “On the transferability of causal knowledge for language models,” inProc. Annu. Meeting Assoc. Comput. Linguistics, 2025, pp. 8–14

  39. [39]

    Variational Graph Auto-Encoders

    T. N. Kipf and M. Welling, “Variational graph auto-encoders,”CoRR, vol. abs/1611.07308, 2016

  40. [40]

    GraphInsight: Unlocking insights in large language models for graph structure under- standing,

    Y . Cao, S. Han, Z. Gao, Z. Ding, X. Xie, and S. K. Zhou, “GraphInsight: Unlocking insights in large language models for graph structure under- standing,” inProc. Annu. Meeting Assoc. Comput. Linguistics, 2025, pp. 12 096–12 134

  41. [41]

    Attribution- aware weight transfer: A warm-start initialization for class-incremental semantic segmentation,

    D. Goswami, R. Schuster, J. van de Weijer, and D. Stricker, “Attribution- aware weight transfer: A warm-start initialization for class-incremental semantic segmentation,” inProc. Winter Conf. Appl. Comput. Vis., 2023, pp. 3194–3203

  42. [42]

    Fast, scalable, warm-start semidefinite programming with spectral bundling and sketching,

    R. Angell and A. McCallum, “Fast, scalable, warm-start semidefinite programming with spectral bundling and sketching,” inProc. Int. Conf. Mach. Learn., 2024, pp. 1579–1615

  43. [43]

    Semi-supervised classification with graph convolutional networks,

    T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” inProc. Int. Conf. Learn. Represention, 2017

  44. [44]

    BERT: pre-training of deep bidirectional transformers for language understanding,

    J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: pre-training of deep bidirectional transformers for language understanding,” inProc. Annu. Conf. North Am. Chapter Assoc. Comput. Linguist., 2019, pp. 4171–4186

  45. [45]

    Unsupervised domain adaptation by backpropagation,

    Y . Ganin and V . S. Lempitsky, “Unsupervised domain adaptation by backpropagation,” inProc. Int. Conf. Mach. Learn., 2015, pp. 1180– 1189

  46. [46]

    On learning invariant representations for domain adaptation,

    H. Zhao, R. T. des Combes, K. Zhang, and G. J. Gordon, “On learning invariant representations for domain adaptation,” inProc. Int. Conf. Mach. Learn., 2019, pp. 7523–7532

  47. [47]

    Pac-bayesian domain adaptation bounds for multiclass learners,

    A. Sicilia, K. Atwell, M. Alikhani, and S. J. Hwang, “Pac-bayesian domain adaptation bounds for multiclass learners,” inProc. Conf. Uncertainty Artif. Intell., 2022, pp. 1824–1834

  48. [48]

    Y . E. Nesterov,Introductory Lectures on Convex Optimization - A Basic Course, ser. Applied Optimization. Springer, 2004, vol. 87

  49. [49]

    Llama: Open and efficient foundation language models,

    H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozi `ere, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, “Llama: Open and efficient foundation language models,” 2023

  50. [50]

    Qwen2.5 Technical Report

    A. Yang, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Li, D. Liu, F. Huang, H. Wei, H. Lin, J. Yang, J. Tu, J. Zhang, J. Yang, J. Yang, J. Zhou, J. Lin, K. Dang, K. Lu, K. Bao, K. Yang, L. Yu, M. Li, M. Xue, P. Zhang, Q. Zhu, R. Men, R. Lin, T. Li, T. Xia, X. Ren, X. Ren, Y . Fan, Y . Su, Y . Zhang, Y . Wan, Y . Liu, Z. Cui, Z. Zhang, and Z. Qiu, “Qwen2...

  51. [51]

    Deepseek-v3 technical report,

    DeepSeek-AI, “Deepseek-v3 technical report,” 2025

  52. [52]

    Chain-of-thought prompting elicits reasoning in large language models,

    J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V . Le, and D. Zhou, “Chain-of-thought prompting elicits reasoning in large language models,” inProc. Adv. Neural Inf. Process. Syst., 2022

  53. [53]

    Retrieval-augmented generation for knowledge-intensive NLP tasks,

    P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W. Yih, T. Rockt ¨aschel, S. Riedel, and D. Kiela, “Retrieval-augmented generation for knowledge-intensive NLP tasks,” inProc. Adv. Neural Inf. Process. Syst., 2020

  54. [54]

    Modula: Mixture of domain- specific and universal lora for multi-task learning,

    Y . Ma, Z. Liang, H. Dai, B. Chen, D. Gao, Z. Ran, Z. Wang, L. Jin, W. Jiang, G. Zhang, X. Cai, and L. Yang, “Modula: Mixture of domain- specific and universal lora for multi-task learning,” inProc. Conf. Empir. Methods Nat. Lang. Process., 2024, pp. 2758–2770

  55. [55]

    Deep Learning using Rectified Linear Units (ReLU)

    A. F. Agarap, “Deep learning using rectified linear units (relu),”CoRR, vol. abs/1803.08375, 2018

  56. [56]

    Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

    J. Chung, C ¸ . G¨ulc ¸ehre, K. Cho, and Y . Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,”CoRR, vol. abs/1412.3555, 2014

  57. [57]

    Visualizing data using t-sne,

    L. van der Maaten and G. Hinton, “Visualizing data using t-sne,”Journal of Machine Learning Research, vol. 9, no. 86, pp. 2579–2605, 2008

  58. [58]

    How to read paintings: Semantic art understanding with multi-modal retrieval,

    N. Garcia and G. V ogiatzis, “How to read paintings: Semantic art understanding with multi-modal retrieval,” inProc. Eur. Conf. Comput. Vis., 2018, pp. 676–691

  59. [59]

    CMNEE: A large-scale document-level event extraction dataset based on open-source chinese military news,

    M. Zhu, Z. Xu, K. Zeng, K. Xiao, M. Wang, W. Ke, and H. Huang, “CMNEE: A large-scale document-level event extraction dataset based on open-source chinese military news,” inProc. Int. Conf. Comput. Linguist., 2024, pp. 3367–3379

  60. [60]

    An astronomical question answering dataset for evaluating large language models,

    J. Li, F. Zhao, P. Chen, J. Xie, X. Zhang, H. Li, M. Chen, Y . Wang, and M. Zhu., “An astronomical question answering dataset for evaluating large language models,”Nature Scientific Data, 2025

  61. [61]

    Trec 2007 genomics track overview,

    W. R. Hersh, A. M. Cohen, J. Yang, R. T. Bhupatiraju, P. M. Roberts, and M. A. Hearst, “Trec 2007 genomics track overview,” inText Retrieval Conference, 2007. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. XX, NO. XX, XXXX 202X 15 APPENDIXA PROOFS OFTHEORETICALRESULTS In this appendix, we provide proofs and supporting state- ments for the analy...