Learning Transferable Topology Priors for Multi-Agent LLM Collaboration Across Domains

Chengyu Wang; Jiuheng Wan; Richang Hong; Taolin Zhang; Tingyuan Hu; Xiaofeng He; Zijie Zhou

arXiv: 2605.17359 · v1 · pith:NSSGX6EX · submitted 2026-05-17 · 💻 cs.CL


Taolin Zhang, Zijie Zhou, Jiuheng Wan, Tingyuan Hu, Chengyu Wang, Xiaofeng He, Richang Hong

Pith reviewed 2026-05-20 13:32 UTC · model grok-4.3

classification 💻 cs.CL

keywords: multi-agent LLM · collaboration topology · transferable graph priors · variational graph models · adversarial alignment · multi-domain reasoning · topology evolution · offline prior learning


The pith

TopoPrior learns reusable collaboration graph priors offline from multiple domains to initialize structures for new multi-agent LLM queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that multi-agent LLM systems can avoid building collaboration topologies from scratch for every new query by instead learning general structural patterns from collections of past graphs across different domains. These patterns are stored in a shared latent space and then adapted to fit the current query before being handed to any existing refinement method. A sympathetic reader would care because repeated per-query searches become expensive in tokens and time as the number of agents and domains grows, and shifting part of the work to offline learning could make the overall process scale better while staying compatible with current approaches.

Core claim

TopoPrior employs a conditional variational graph framework to capture reusable structural regularities across domains in a latent space, then uses a query-conditioned latent adaptation module with adversarial alignment to produce initial collaboration graphs that reduce domain discrepancies while keeping query-relevant variation for downstream topology evolution.
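
The following is a rough illustration of what a "conditional variational graph framework" can look like. It follows the variational graph auto-encoder recipe (Kipf and Welling, 2016). A one-layer GCN encoder produces a Gaussian latent code for each agent, and an inner-product decoder scores every agent pair. The loss is reconstruction cross-entropy plus a KL term. Several details are assumptions for illustration only: conditioning by concatenating a query embedding `h_q` to each agent's features, the toy graph, the untrained random weights, and every function name. The paper's actual encoder, decoder, and conditioning may differ. Plain Python lists keep the sketch dependency-free.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def normalize_adjacency(adj: Matrix) -> Matrix:
    """GCN-style symmetric normalization D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def encode(adj: Matrix, feats: Matrix, h_q: list[float], w_mu: Matrix, w_logvar: Matrix):
    """Query-conditioned one-layer GCN encoder returning per-agent (mu, logvar)."""
    x = [row + h_q for row in feats]            # append the query embedding to every agent
    ax = matmul(normalize_adjacency(adj), x)    # one round of neighbourhood aggregation
    return matmul(ax, w_mu), matmul(ax, w_logvar)


def reparameterize(mu: Matrix, logvar: Matrix, rng: random.Random) -> Matrix:
    # z = mu + sigma * eps with eps ~ N(0, I)
    return [[m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mr, lr)]
            for mr, lr in zip(mu, logvar)]


def decode(z: Matrix) -> Matrix:
    """Inner-product decoder: P(edge i-j) = sigmoid(z_i . z_j)."""
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in z] for zi in z]


def negative_elbo(adj: Matrix, probs: Matrix, mu: Matrix, logvar: Matrix) -> float:
    eps = 1e-9
    n = len(adj)
    # Binary cross-entropy over all ordered agent pairs (self-pairs excluded).
    recon = -sum(adj[i][j] * math.log(probs[i][j] + eps)
                 + (1 - adj[i][j]) * math.log(1 - probs[i][j] + eps)
                 for i in range(n) for j in range(n) if i != j)
    # KL(q(z | A, X, h_q) || N(0, I)), summed over agents and latent dimensions.
    kl = -0.5 * sum(1 + lv - m * m - math.exp(lv)
                    for mr, lr in zip(mu, logvar) for m, lv in zip(mr, lr))
    return recon + kl


# Hypothetical 3-agent reference graph, agent-role features and query embedding.
rng = random.Random(0)
adj = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
h_q = [0.2, -0.1]
in_dim, latent_dim = 4, 2
w_mu = [[rng.gauss(0.0, 0.1) for _ in range(latent_dim)] for _ in range(in_dim)]
w_logvar = [[rng.gauss(0.0, 0.1) for _ in range(latent_dim)] for _ in range(in_dim)]

mu, logvar = encode(adj, feats, h_q, w_mu, w_logvar)
z = reparameterize(mu, logvar, rng)
probs = decode(z)
loss = negative_elbo(adj, probs, mu, logvar)
```

In a trained version, the weights would be fit by minimizing `negative_elbo` over many reference graphs from several domains. The resulting edge probabilities would then serve as the query-conditioned prior.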

What carries the argument

Conditional variational graph framework that encodes reusable structural regularities from offline multi-domain collaboration graphs, paired with adversarial alignment to adapt those encodings to new queries.
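
One common way to implement adversarial alignment is a gradient reversal layer (Ganin and Lempitsky, 2015). A domain discriminator is trained to predict which domain a latent code came from. Meanwhile, the encoder receives the negated, λ-scaled discriminator gradient, so it learns codes the discriminator cannot separate. Whether TopoPrior uses exactly this mechanism is an assumption here. The sketch also makes three simplifications: a binary logistic discriminator, the latent code `z` standing in for the encoder parameters being updated, and made-up numbers for a single step.

```python
import math


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def domain_loss_and_grads(z, y, w, b):
    """BCE loss of a logistic domain classifier p = sigmoid(w.z + b) and its gradients."""
    p = sigmoid(sum(zi * wi for zi, wi in zip(z, w)) + b)
    eps = 1e-12
    loss = -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    grad_z = [(p - y) * wi for wi in w]   # dL/dz
    grad_w = [(p - y) * zi for zi in z]   # dL/dw
    grad_b = p - y                        # dL/db
    return loss, grad_z, grad_w, grad_b


def gradient_reversal(grad, lam):
    """Backward rule of a gradient reversal layer: forward is identity, backward is -lam * grad."""
    return [-lam * g for g in grad]


# One hypothetical alignment step; z stands in for the encoder output (and, via the
# chain rule, for the encoder parameters that produce it).
z = [0.8, -0.3]
y = 1                            # domain label of this reference graph
w, b = [1.5, 0.5], 0.0           # discriminator parameters
lam, lr = 0.5, 0.1               # alignment strength and step size
task_grad_z = [0.05, -0.02]      # hypothetical gradient of the reconstruction + KL terms

loss, g_z, g_w, g_b = domain_loss_and_grads(z, y, w, b)

# The discriminator descends its own loss, getting better at telling domains apart ...
w_new = [wi - lr * gi for wi, gi in zip(w, g_w)]
b_new = b - lr * g_b

# ... while the encoder receives the task gradient plus the reversed domain gradient,
# i.e. it ascends the discriminator loss and drifts toward domain-invariant codes.
enc_grad_z = [t + r for t, r in zip(task_grad_z, gradient_reversal(g_z, lam))]
z_new = [zi - lr * gi for zi, gi in zip(z, enc_grad_z)]
```

The coefficient `lam` plays the role of the adversarial alignment strength listed among the free parameters. At `lam = 0` the encoder ignores the discriminator entirely. Larger values push harder toward domain-invariant codes, at the risk of also erasing query-relevant structure.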

If this is right

Plugging the learned initial graphs into several different topology-evolution backbones produces consistent gains on multi-domain reasoning tasks.
Online inference uses fewer tokens because the starting collaboration structure is already closer to an effective one.
Only a modest number of extra trainable parameters are added beyond the base backbones.
The approach works without changing the core search or evolution logic of existing methods.
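
The amortization argument reduces to simple break-even arithmetic. Assume, hypothetically, three costs:

- a one-off offline cost `C_OFFLINE` for collecting reference graphs and learning the prior;
- a per-query cost `C_SCRATCH` for from-scratch topology search;
- a smaller per-query cost `C_PRIOR` when refinement starts from the learned prior.

The prior pays off once `C_OFFLINE + N * C_PRIOR < N * C_SCRATCH`. The numbers below are placeholders, not figures reported in the paper.

```python
import math


def total_cost_scratch(n_queries: int, c_scratch: float) -> float:
    """Every query pays for a full per-query topology search."""
    return n_queries * c_scratch


def total_cost_prior(n_queries: int, c_offline: float, c_prior: float) -> float:
    """One-off offline prior learning, then cheaper prior-initialized refinement per query."""
    return c_offline + n_queries * c_prior


def break_even_queries(c_offline: float, c_scratch: float, c_prior: float) -> float:
    """Smallest N with c_offline + N * c_prior < N * c_scratch (inf if never)."""
    saving = c_scratch - c_prior
    if saving <= 0:
        return math.inf  # the prior never pays for itself
    return math.floor(c_offline / saving) + 1


# Placeholder costs in arbitrary units (e.g. tokens); not figures from the paper.
C_OFFLINE, C_SCRATCH, C_PRIOR = 2_000_000, 12_000, 8_000
n_star = break_even_queries(C_OFFLINE, C_SCRATCH, C_PRIOR)
```

With these placeholders, the offline step pays for itself once more than 500 queries are served. If prior initialization does not actually shorten online refinement (`C_PRIOR >= C_SCRATCH`), it never pays off.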

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the priors hold across wider ranges of agent counts, the same offline learning step could support systems with dozens of agents where per-query search would otherwise become prohibitive.
The latent-space transfer idea might extend to other structured outputs in LLM agents, such as debate trees or shared memory layouts, rather than only collaboration graphs.
Collecting reference graphs from even more varied sources could further strengthen the priors, suggesting a path toward domain-general multi-agent scaffolding.

Load-bearing premise

The structural patterns found in collaboration graphs from several past domains are similar enough that they can be learned once and then adjusted for a new query without losing useful details or adding mismatches that hurt final performance.

What would settle it

Run the method on held-out domains and compare final task accuracy and token use when starting from the learned priors versus starting from random or per-query scratch graphs; if the prior-initialized versions show no consistent gains, the transfer claim does not hold.
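
The test described above amounts to a paired, per-domain comparison. The following is a minimal sketch, assuming each held-out domain yields one final accuracy and one online token count per condition. The `RunResult` type, the domain names, and all numbers are hypothetical placeholders. The exact sign test is one simple choice of consistency check, not the paper's protocol.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class RunResult:
    accuracy: float  # final task accuracy in [0, 1]
    tokens: int      # online inference-time tokens


def sign_test_p_value(wins: int, losses: int) -> float:
    """Exact two-sided sign test; ties should already be excluded."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def compare(scratch: dict[str, RunResult], prior: dict[str, RunResult]) -> dict[str, float]:
    """Per-domain paired comparison of prior-initialized vs. scratch runs."""
    domains = sorted(scratch)
    gains = [prior[d].accuracy - scratch[d].accuracy for d in domains]
    token_cuts = [1 - prior[d].tokens / scratch[d].tokens for d in domains]
    wins = sum(g > 0 for g in gains)
    losses = sum(g < 0 for g in gains)
    return {
        "mean_accuracy_gain": sum(gains) / len(gains),
        "mean_token_reduction": sum(token_cuts) / len(token_cuts),
        "wins": wins,
        "losses": losses,
        "sign_test_p": sign_test_p_value(wins, losses),
    }


# Placeholder numbers for three held-out domains; not results from the paper.
scratch = {"domain_a": RunResult(0.60, 10_000), "domain_b": RunResult(0.70, 12_000),
           "domain_c": RunResult(0.50, 9_000)}
prior = {"domain_a": RunResult(0.64, 8_000), "domain_b": RunResult(0.71, 9_000),
         "domain_c": RunResult(0.49, 8_100)}
summary = compare(scratch, prior)
```

With only three domains, the sign test cannot fall below p = 0.25 even if every domain improves. A convincing version of this check therefore needs more held-out domains or repeated runs per domain.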

Figures

Figures reproduced from arXiv: 2605.17359 by Chengyu Wang, Jiuheng Wan, Richang Hong, Taolin Zhang, Tingyuan Hu, Xiaofeng He, Zijie Zhou.

**Figure 2.** Overview of TopoPrior. (1) Transferable Topology Prior Learning captures reusable collaboration patterns from multiple domains through a conditional variational graph framework. (2) Query-Conditioned Latent Adaptation improves cross-domain robustness by adversarially regularizing the latent space while retaining query-relevant structural information.

**Figure 3.** Accuracy gain, inference-time token reduction, and …

**Figure 4.** t-SNE visualization of the encoder-induced latent space

**Figure 5.** Out-of-domain generalization on unseen domains.

**Figure 6.** Hyperparameter analysis of the loss coefficients

Original abstract

Large language model (LLM)-based multi-agent systems have shown strong potential for complex reasoning by coordinating specialized agents through structured communication. However, existing topology-evolution methods typically construct or optimize a collaboration topology for each query from scratch, leading to substantial online search overhead, high inference-time token consumption, and limited scalability in multi-domain settings. We propose TopoPrior, a framework for learning transferable topology priors for multi-agent LLM collaboration across domains. Rather than repeatedly searching for effective collaboration structures online, TopoPrior learns reusable topology priors from reference collaboration graphs collected offline from multiple domains and uses them to generate query-conditioned initial collaboration graphs for downstream refinement. By shifting part of topology search from per-query online optimization to offline prior learning, TopoPrior amortizes search cost while remaining compatible with existing topology-evolution backbones. Technically, TopoPrior contains two key components. First, a transferable topology prior learning module employs a conditional variational graph framework to capture reusable structural regularities across domains in a latent space. Second, a query-conditioned latent adaptation module introduces adversarial alignment to reduce unnecessary domain discrepancy while preserving query-relevant structural variation. Experiments on multi-domain reasoning benchmarks show that TopoPrior consistently improves several heterogeneous topology-evolution backbones while reducing online inference-time token usage, with only modest additional trainable parameters. These results suggest that transferable topology initialization is an effective and lightweight mechanism for improving the efficiency of multi-agent LLM collaboration across domains.
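
The abstract's compatibility claim implies a narrow interface: the prior only supplies a query-conditioned starting graph, and the backbone's own refinement runs unchanged. The sketch below shows that contract. All names in it are hypothetical, chosen for illustration rather than taken from an API in the paper: `TopologyPrior`, `Backbone`, `edge_probabilities`, `refine`, the 0.5 threshold, and the toy classes.

```python
from typing import Protocol

Graph = list[list[int]]  # adjacency matrix over agents


class TopologyPrior(Protocol):
    def edge_probabilities(self, query: str) -> list[list[float]]: ...


class Backbone(Protocol):
    def refine(self, query: str, init: Graph) -> Graph: ...


def threshold(probs: list[list[float]], tau: float = 0.5) -> Graph:
    """Turn decoded edge probabilities into a directed initial graph (no self-loops)."""
    n = len(probs)
    return [[int(i != j and probs[i][j] >= tau) for j in range(n)] for i in range(n)]


def solve(query: str, prior: TopologyPrior, backbone: Backbone, tau: float = 0.5) -> Graph:
    init = threshold(prior.edge_probabilities(query), tau)  # offline-learned starting point
    return backbone.refine(query, init)                     # unchanged search/evolution logic


# Toy stand-ins, purely illustrative.
class ConstantPrior:
    def edge_probabilities(self, query: str) -> list[list[float]]:
        return [[0.0, 0.9, 0.2], [0.1, 0.0, 0.7], [0.6, 0.3, 0.0]]


class PruneIncomingToFirstAgent:
    """Hypothetical backbone that simply drops every edge into agent 0."""
    def refine(self, query: str, init: Graph) -> Graph:
        return [[0 if j == 0 else e for j, e in enumerate(row)] for row in init]


final = solve("example query", ConstantPrior(), PruneIncomingToFirstAgent())
```

`solve` only touches the backbone through `refine`, so any topology-evolution method that accepts an initial graph could be dropped in. That is the sense in which the search logic stays unchanged.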

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

TopoPrior shifts topology search to offline prior learning via variational graphs and adversarial adaptation, but the transferability assumption needs stronger checks.


The punchline here is that TopoPrior tries to amortize the cost of finding good collaboration topologies in multi-agent LLM systems by learning reusable priors from offline multi-domain graphs. If the priors transfer well, this could make scaling such systems more practical. The new element is the combination of two modules. A conditional variational graph framework models structural regularities in a latent space, and an adversarial alignment module adapts those priors to specific queries. The setup is meant to give existing topology-evolution methods a good starting point at little extra cost. The paper does a reasonable job of explaining how this remains compatible with current backbones while adding only modest trainable parameters. The high-level experimental claims of better performance and reduced token usage on multi-domain benchmarks are encouraging on the surface.

The main soft spot is the reliance on transferable structural patterns across domains. The approach assumes that the latent space captures these patterns and that adversarial adaptation can refine them without losing query-specific information. The stress-test note flags this correctly: the abstract provides no ablations on transfer gaps or alignment effects. It is therefore unclear whether the method avoids introducing bias when domains differ substantially. If the full paper has those checks and the numbers back them up, the approach is strengthened; otherwise, it risks being an unproven efficiency tweak.

This paper is aimed at people working on multi-agent LLM architectures for complex, cross-domain tasks. A reader looking for ways to reduce online optimization overhead would get value from the framing and the proposed components. The paper shows clear thinking about the problem, even if the evidence is not yet fully detailed. I would recommend sending it to peer review: the idea is solid enough to warrant a closer look at the experiments and any additional results in the full manuscript.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes TopoPrior, a framework that learns reusable topology priors from offline multi-domain reference collaboration graphs via a conditional variational graph model to capture structural regularities in a latent space, then applies a query-conditioned latent adaptation module with adversarial alignment to generate initial collaboration graphs for downstream refinement in multi-agent LLM systems. The approach aims to amortize per-query topology search costs, reduce online inference-time token usage, and improve performance while remaining compatible with existing topology-evolution backbones, with experiments claiming consistent gains on multi-domain reasoning benchmarks using only modest additional trainable parameters.

Significance. If the transferability of latent structural regularities holds and the adversarial alignment preserves query-relevant variation without introducing harmful bias, the result would be significant for scaling multi-agent LLM collaboration: it provides a lightweight, amortizable mechanism for topology initialization that addresses the online search overhead of current per-query optimization methods. The compatibility with heterogeneous backbones and focus on cross-domain reuse are practical strengths that could influence efficiency-focused work in LLM agent systems.

major comments (2)

[Abstract and §4 (Experiments)] The claims of 'consistent improvements' and 'reducing online inference-time token usage' are reported without any quantitative values, error bars, ablation results, data-split details, or baseline descriptions. This absence is load-bearing. The central claim (that offline prior learning amortizes search cost while improving performance) cannot be evaluated without these elements.
[§3.2 (Query-conditioned latent adaptation module)] The adversarial alignment is introduced to reduce domain discrepancy while preserving query-relevant variation. However, no ablation is provided on alignment strength, single- versus multi-domain training, or latent-space transfer gaps. Without these, the weakest assumption remains untested: that reusable structural regularities survive alignment without collapse or harmful bias. This assumption directly determines whether the generated initial graphs add benefit or bias.

minor comments (2)

[§3.1] Notation for the conditional variational graph (e.g., latent variable definitions and conditioning) could be clarified with an explicit diagram or expanded equations to improve readability for readers unfamiliar with variational graph models.
[§4] The manuscript would benefit from a table summarizing the additional trainable parameters across backbones and a clearer statement of the exact multi-domain benchmarks used.
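
A parameter table of the kind requested can be derived mechanically from layer shapes. The sketch assumes a hypothetical architecture with three parts. The encoder is a one-layer GCN with separate mean and log-variance heads, whose input is the agent features concatenated with the query embedding. The decoder is a weight-free inner product. The domain discriminator is a two-layer MLP. All dimensions are placeholders.

```python
def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weights plus optional bias of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


def added_params(feat_dim: int, query_dim: int, latent_dim: int,
                 disc_hidden: int, n_domains: int) -> dict[str, int]:
    enc_in = feat_dim + query_dim  # agent features concatenated with the query embedding
    counts = {
        "encoder_mu": linear_params(enc_in, latent_dim),
        "encoder_logvar": linear_params(enc_in, latent_dim),
        "decoder": 0,  # inner-product decoder has no weights
        "discriminator": linear_params(latent_dim, disc_hidden)
        + linear_params(disc_hidden, n_domains),
    }
    counts["total"] = sum(counts.values())
    return counts


# Placeholder dimensions, not the paper's configuration.
table = added_params(feat_dim=384, query_dim=384, latent_dim=16, disc_hidden=64, n_domains=4)
```

Under these assumptions, the count grows with (feature + query dimension) × latent dimension and does not depend on the size of the LLM backbones. This is one way such an add-on can stay small.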

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the presentation of experimental claims and the need for additional validation of the adversarial alignment component. We address each major comment below and outline the revisions we will make to strengthen the manuscript.


Referee: [Abstract and §4 (Experiments)] The claims of 'consistent improvements' and 'reducing online inference-time token usage' are reported without any quantitative values, error bars, ablation results, data-split details, or baseline descriptions. This absence is load-bearing. The central claim (that offline prior learning amortizes search cost while improving performance) cannot be evaluated without these elements.

Authors: We agree that the abstract and the opening of §4 would be clearer with explicit quantitative summaries. The full experimental section (§4) contains tables reporting performance metrics across backbones and domains, token consumption reductions, standard deviations from repeated runs, data split details, and baseline descriptions. We will revise the abstract and §4 to include key numerical highlights (e.g., average accuracy gains and token savings) along with direct pointers to the tables and figures, while preserving the existing results.
Revision: yes.
Referee: [§3.2 (Query-conditioned latent adaptation module)] The adversarial alignment is introduced to reduce domain discrepancy while preserving query-relevant variation. However, no ablation is provided on alignment strength, single- versus multi-domain training, or latent-space transfer gaps. Without these, the weakest assumption remains untested: that reusable structural regularities survive alignment without collapse or harmful bias. This assumption directly determines whether the generated initial graphs add benefit or bias.

Authors: We concur that targeted ablations would better substantiate the design choices in the query-conditioned latent adaptation module. The current manuscript presents the overall framework and end-to-end results but omits explicit sensitivity analysis on alignment strength, single- versus multi-domain training comparisons, and metrics for latent-space transfer gaps. We will add these ablations in the revised version, including a hyperparameter sweep on the adversarial loss coefficient, a single-domain training baseline, and quantitative measures (e.g., MMD or reconstruction fidelity) of domain alignment in the latent space.
Revision: yes.
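
The maximum mean discrepancy (MMD) proposed above can be estimated directly from latent codes drawn from two domains. The sketch below implements the biased (V-statistic) estimator of squared MMD with a Gaussian RBF kernel. The bandwidth `sigma = 1.0` is an arbitrary choice; a median-distance heuristic is a common alternative. The latent codes are made-up two-dimensional points.

```python
import math


def rbf(x, y, sigma):
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))


def mmd2_biased(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""
    def mean_k(a, b):
        return sum(rbf(p, q, sigma) for p in a for q in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)


# Made-up latent codes from a source domain and two versions of a target domain.
domain_a = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
domain_b_aligned = [[0.05, 0.05], [0.15, -0.05], [-0.05, 0.0]]
domain_b_shifted = [[2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
```

Comparing this quantity before and after adversarial alignment, and across values of the alignment coefficient, would give the kind of latent-space transfer-gap measure the referee asks for. A low MMD alone does not show that query-relevant structure was preserved.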

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent offline training and adaptation modules

full rationale

The paper describes a standard transfer-learning setup. A conditional variational graph model is trained on offline multi-domain reference collaboration graphs to encode structural regularities. An adversarial alignment module then adapts the latent space for new queries, and the result is fed into existing topology-evolution backbones. None of the core components (variational graph encoder, adversarial alignment, or prior generation) is defined in terms of the final downstream performance metric. The provided abstract and description also contain no self-citations, no fitted parameters renamed as predictions, and no equations that reduce the claimed transferability to a tautology by construction. The derivation chain is therefore self-contained, and its claims are checked against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

Only the abstract was available, so specific free parameters and axioms cannot be audited; the framework implicitly relies on standard VAE and graph-model assumptions that are not detailed here.

free parameters (2)

latent dimension size
Used in the conditional variational graph framework to capture structural regularities
adversarial alignment strength
Hyperparameter controlling domain discrepancy reduction in the adaptation module

axioms (1)

domain assumption: Collaboration topologies contain reusable structural regularities across domains that can be captured in a latent space
Invoked in the transferable topology prior learning module description

invented entities (1)

TopoPrior framework (no independent evidence)
purpose: To learn and adapt transferable topology priors for multi-agent LLM collaboration
New named system introduced to combine the two modules

pith-pipeline@v0.9.0 · 5802 in / 1404 out tokens · 41019 ms · 2026-05-20T13:32:19.787780+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon; each cited theorem is in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction · reality_from_one_distinction · unclear
Paper passage: "conditional variational graph framework to capture reusable structural regularities across domains in a latent space"

IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · unclear
Paper passage: "query-conditioned latent adaptation module introduces adversarial alignment"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 4 internal anchors

[53]

Retrieval-augmented generation for knowledge-intensive NLP tasks,

P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. K ¨uttler, M. Lewis, W. Yih, T. Rockt ¨aschel, S. Riedel, and D. Kiela, “Retrieval-augmented generation for knowledge-intensive NLP tasks,” inProc. Adv. Neural Inf. Process. Syst., 2020

work page 2020
[54]

Modula: Mixture of domain- specific and universal lora for multi-task learning,

Y . Ma, Z. Liang, H. Dai, B. Chen, D. Gao, Z. Ran, Z. Wang, L. Jin, W. Jiang, G. Zhang, X. Cai, and L. Yang, “Modula: Mixture of domain- specific and universal lora for multi-task learning,” inProc. Conf. Empir. Methods Nat. Lang. Process., 2024, pp. 2758–2770

work page 2024
[55]

Deep Learning using Rectified Linear Units (ReLU)

A. F. Agarap, “Deep learning using rectified linear units (relu),”CoRR, vol. abs/1803.08375, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[56]

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

J. Chung, C ¸ . G¨ulc ¸ehre, K. Cho, and Y . Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,”CoRR, vol. abs/1412.3555, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[57]

Visualizing data using t-sne,

L. van der Maaten and G. Hinton, “Visualizing data using t-sne,”Journal of Machine Learning Research, vol. 9, no. 86, pp. 2579–2605, 2008

work page 2008
[58]

How to read paintings: Semantic art understanding with multi-modal retrieval,

N. Garcia and G. V ogiatzis, “How to read paintings: Semantic art understanding with multi-modal retrieval,” inProc. Eur. Conf. Comput. Vis., 2018, pp. 676–691

work page 2018
[59]

CMNEE: A large-scale document-level event extraction dataset based on open-source chinese military news,

M. Zhu, Z. Xu, K. Zeng, K. Xiao, M. Wang, W. Ke, and H. Huang, “CMNEE: A large-scale document-level event extraction dataset based on open-source chinese military news,” inProc. Int. Conf. Comput. Linguist., 2024, pp. 3367–3379

work page 2024
[60]

An astronomical question answering dataset for evaluating large language models,

J. Li, F. Zhao, P. Chen, J. Xie, X. Zhang, H. Li, M. Chen, Y . Wang, and M. Zhu., “An astronomical question answering dataset for evaluating large language models,”Nature Scientific Data, 2025

work page 2025
[61]

Trec 2007 genomics track overview,

W. R. Hersh, A. M. Cohen, J. Yang, R. T. Bhupatiraju, P. M. Roberts, and M. A. Hearst, “Trec 2007 genomics track overview,” inText Retrieval Conference, 2007. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. XX, NO. XX, XXXX 202X 15 APPENDIXA PROOFS OFTHEORETICALRESULTS In this appendix, we provide proofs and supporting state- ments for the analy...

work page 2007

[1] N. Nazyrova, S. Chahed, T. Chausalet, and M. Dwek, “Leveraging large language models for medical text classification: a hospital readmission prediction case,” in ICPRS, 2024, pp. 1–7.

[2] D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt, “Measuring massive multitask language understanding,” in Proc. Int. Conf. Learn. Represent., 2021.

[3] Y. Huang, Y. Bai, Z. Zhu, J. Zhang, J. Zhang, T. Su, J. Liu, C. Lv, Y. Zhang, J. Lei, Y. Fu, M. Sun, and J. He, “C-Eval: A multi-level multi-discipline Chinese evaluation suite for foundation models,” in Proc. Adv. Neural Inf. Process. Syst., 2023.

[4] M. Nghiem, P. Baylis, A. Freitas, and S. Ananiadou, “Text classification and prediction in the legal domain,” in LREC, 2022, pp. 4717–4722.

[5] H. Lee, K. C. Li, M. Grabmair, and S. Xu, “Efficient prompt optimisation for legal text classification with proxy prompt evaluator,” CoRR, vol. abs/2510.08524, 2025.

[6] H. Chen, X. Han, Z. Wu, and Y. Jiang, “Multi-prompt alignment for multi-source unsupervised domain adaptation,” in Proc. Adv. Neural Inf. Process. Syst., 2023.

[7] H. Nguyen, Y. Liu, C. Zhang, T. Zhang, and P. S. Yu, “CoF-CoT: Enhancing large language models with coarse-to-fine chain-of-thought prompting for multi-domain NLU tasks,” in Proc. Conf. Empir. Methods Nat. Lang. Process., 2023, pp. 12109–12119.

[8] T. Hu, P. Zhang, B. Yang, J. Xie, D. F. Wong, and R. Wang, “Large language model for multi-domain translation: Benchmarking and domain CoT fine-tuning,” in Proc. Conf. Empir. Methods Nat. Lang. Process., 2024, pp. 5726–5746.

[9] Q. Zhang, D. Wang, H. Qian, Y. Li, T. Zhang, M. Huang, K. Xu, H. Li, L. Yan, and H. Qiu, “Understanding the dark side of LLMs’ intrinsic self-correction,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2025, pp. 27066–27101.

[10] T. Scialom, T. Chakrabarty, and S. Muresan, “Fine-tuned language models are continual learners,” in Proc. Conf. Empir. Methods Nat. Lang. Process., 2022, pp. 6107–6122.

[11] Z. Ling, D. Chen, L. Yao, Y. Li, and Y. Shen, “Diversity as a reward: Fine-tuning LLMs on a mixture of domain-undetermined data,” CoRR, vol. abs/2502.04380, 2025.

[12] J. Li, Z. Yu, Z. Du, L. Zhu, and H. T. Shen, “A comprehensive survey on source-free domain adaptation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 8, pp. 5743–5762, 2024.

[13] X. Liang, L. Yang, J. Wang, Y. Lu, R. Wu, H. Chen, and J. Hao, “Boosting multi-domain fine-tuning of large language models through evolving interactions between samples,” in Proc. Int. Conf. Mach. Learn., 2025.

[14] S. Hong, M. Zhuge, J. Chen, X. Zheng, Y. Cheng, J. Wang, C. Zhang, Z. Wang, S. K. S. Yau, Z. Lin, L. Zhou, C. Ran, L. Xiao, C. Wu, and J. Schmidhuber, “MetaGPT: Meta programming for A multi-agent collaborative framework,” in Proc. Int. Conf. Learn. Represent., 2024.

[15] Z. Liu, Y. Zhang, P. Li, Y. Liu, and D. Yang, “A dynamic LLM-powered agent network for task-oriented agent collaboration,” CoRR, vol. abs/2310.02170, 2024.

[16] G. Zhang, Y. Yue, Z. Li, S. Yun, G. Wan, K. Wang, D. Cheng, J. X. Yu, and T. Chen, “Cut the crap: An economical communication pipeline for LLM-based multi-agent systems,” in Proc. Int. Conf. Learn. Represent., 2025.

[17] G. Zhang, Y. Yue, X. Sun, M. Yu, K. Wang, T. Chen, and D. Cheng, “G-Designer: Architecting multi-agent communication topologies via graph neural networks,” in Proc. Int. Conf. Mach. Learn., 2025.

[18] H. Li, A. Wang, K. Li, Z. Wang, L. Zhang, D. Qiu, Q. Liu, and J. Su, “A multi-agent framework with automated decision rule optimization for cross-domain misinformation detection,” in Proc. Conf. Empir. Methods Nat. Lang. Process., Nov. 2025.

[19] W. Kwan, X. Zeng, Y. Wang, Y. Sun, L. Li, Y. Jiang, L. Shang, Q. Liu, and K. Wong, “M4LE: A multi-ability multi-range multi-task multi-domain long-context evaluation benchmark for large language models,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2024, pp. 15568–15592.

[20] Q. Chen, L. Qin, J. Zhang, Z. Chen, X. Xu, and W. Che, “M³CoT: A novel benchmark for multi-domain multi-step multi-modal chain-of-thought,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2024, pp. 8199–8221.

[21] N. Nikolaidis, N. Stefanovitch, P. Silvano, D. I. Dimitrov, R. Yangarber, N. Guimarães, E. Sartori, I. Androutsopoulos, P. Nakov, G. D. S. Martino, and J. Piskorski, “Polynarrative: A multilingual, multilabel, multi-domain dataset for narrative extraction from news articles,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2025, pp. 31323–31345.

[22] Z. Liu, S. Huang, T. Guo, M. Hou, and Q. Liang, “A prompt-driven framework for multi-domain knowledge tracing,” Mach. Learn., vol. 114, no. 4, Feb. 2025.

[23] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” in Proc. Int. Conf. Learn. Represent., 2022.

[24] P. Schafhalter, S. Liao, Y. Zhou, C. Yeh, A. Kandoor, and J. Laudon, “Scalable multi-domain adaptation of language models using modular experts,” CoRR, vol. abs/2410.10181, 2024.

[25] J. Li, B. Wang, X. Zhou, and X. Hu, “Dynamic expert specialization: Towards catastrophic forgetting-free multi-domain MoE adaptation,” CoRR, vol. abs/2509.16882, 2025.

[26] D. Jiang, X. Ren, and B. Y. Lin, “LLM-Blender: Ensembling large language models with pairwise ranking and generative fusion,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2023, pp. 14165–14178.

[27] J. Zhang, X. Xu, N. Zhang, R. Liu, B. Hooi, and S. Deng, “Exploring collaboration mechanisms for LLM agents: A social psychology view,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2024, pp. 14544–14607.

[28] Y. Du, S. Li, A. Torralba, J. B. Tenenbaum, and I. Mordatch, “Improving factuality and reasoning in language models through multiagent debate,” in Proc. Int. Conf. Mach. Learn., 2024.

[29] S. Holt, M. R. Luyten, and M. van der Schaar, “L2MAC: Large language model automatic computer for unbounded code generation,” CoRR, vol. abs/2310.02003, 2023.

[30] C. Qian, W. Liu, H. Liu, N. Chen, Y. Dang, J. Li, C. Yang, W. Chen, Y. Su, X. Cong, J. Xu, D. Li, Z. Liu, and M. Sun, “ChatDev: Communicative agents for software development,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2024, pp. 15174–15186.

[31] Q. Wu, G. Bansal, J. Zhang, Y. Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Zhang, J. Liu, A. H. Awadallah, R. W. White, D. Burger, and C. Wang, “AutoGen: Enabling next-gen LLM applications via multi-agent conversations,” in Proc. Conf. Lang. Model., 2024.

[32] Z. Zhou, B. Hu, P. Zhang, C. Zhao, and B. Liu, “Large language model is a good policy teacher for training reinforcement learning agents,” CoRR, vol. abs/2311.13373, 2023.

[33] Y. Ishibashi and Y. Nishimura, “Self-organized agents: A LLM multi-agent framework toward ultra large-scale code generation and optimization,” CoRR, vol. abs/2404.02183, 2024.

[34] M. Zhuge, W. Wang, L. Kirsch, F. Faccio, D. Khizbullin, and J. Schmidhuber, “GPTSwarm: Language agents as optimizable graphs,” in Proc. Int. Conf. Mach. Learn., 2024.

[35] Z. Wang, Y. Wang, X. Liu, L. Ding, M. Zhang, J. Liu, and M. Zhang, “AgentDropout: Dynamic agent elimination for token-efficient and high-performance LLM-based multi-agent collaboration,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2025, pp. 24013–24035.

[36] S. Li, Y. Liu, Q. Wen, C. Zhang, and S. Pan, “Assemble your crew: Automatic multi-agent communication topology design via autoregressive graph generation,” in Proc. AAAI Conf. Artif. Intell., 2026, pp. 23142–23150.

[37] D. Dua, E. Strubell, S. Singh, and P. Verga, “To adapt or to annotate: Challenges and interventions for domain adaptation in open-domain question answering,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2023, pp. 14429–14446.

[38] G. Dey and Y. K. Lal, “On the transferability of causal knowledge for language models,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2025, pp. 8–14.

[39] T. N. Kipf and M. Welling, “Variational graph auto-encoders,” CoRR, vol. abs/1611.07308, 2016.

[40] Y. Cao, S. Han, Z. Gao, Z. Ding, X. Xie, and S. K. Zhou, “GraphInsight: Unlocking insights in large language models for graph structure understanding,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2025, pp. 12096–12134.

[41] D. Goswami, R. Schuster, J. van de Weijer, and D. Stricker, “Attribution-aware weight transfer: A warm-start initialization for class-incremental semantic segmentation,” in Proc. Winter Conf. Appl. Comput. Vis., 2023, pp. 3194–3203.

[42] R. Angell and A. McCallum, “Fast, scalable, warm-start semidefinite programming with spectral bundling and sketching,” in Proc. Int. Conf. Mach. Learn., 2024, pp. 1579–1615.

[43] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proc. Int. Conf. Learn. Represent., 2017.

[44] J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. Annu. Conf. North Am. Chapter Assoc. Comput. Linguist., 2019, pp. 4171–4186.

[45] Y. Ganin and V. S. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in Proc. Int. Conf. Mach. Learn., 2015, pp. 1180–1189.

[46] H. Zhao, R. T. des Combes, K. Zhang, and G. J. Gordon, “On learning invariant representations for domain adaptation,” in Proc. Int. Conf. Mach. Learn., 2019, pp. 7523–7532.

[47] A. Sicilia, K. Atwell, M. Alikhani, and S. J. Hwang, “PAC-Bayesian domain adaptation bounds for multiclass learners,” in Proc. Conf. Uncertainty Artif. Intell., 2022, pp. 1824–1834.

[48] Y. E. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, ser. Applied Optimization. Springer, 2004, vol. 87.

[49] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, “LLaMA: Open and efficient foundation language models,” 2023.

[50] A. Yang, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Li, D. Liu, F. Huang, H. Wei, H. Lin, J. Yang, J. Tu, J. Zhang, J. Yang, J. Yang, J. Zhou, J. Lin, K. Dang, K. Lu, K. Bao, K. Yang, L. Yu, M. Li, M. Xue, P. Zhang, Q. Zhu, R. Men, R. Lin, T. Li, T. Xia, X. Ren, X. Ren, Y. Fan, Y. Su, Y. Zhang, Y. Wan, Y. Liu, Z. Cui, Z. Zhang, and Z. Qiu, “Qwen2.5 technical report,” arXiv, 2024.

[51] DeepSeek-AI, “DeepSeek-V3 technical report,” 2025.

[52] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, and D. Zhou, “Chain-of-thought prompting elicits reasoning in large language models,” in Proc. Adv. Neural Inf. Process. Syst., 2022.

[53] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, S. Riedel, and D. Kiela, “Retrieval-augmented generation for knowledge-intensive NLP tasks,” in Proc. Adv. Neural Inf. Process. Syst., 2020.

[54] Y. Ma, Z. Liang, H. Dai, B. Chen, D. Gao, Z. Ran, Z. Wang, L. Jin, W. Jiang, G. Zhang, X. Cai, and L. Yang, “Modula: Mixture of domain-specific and universal LoRA for multi-task learning,” in Proc. Conf. Empir. Methods Nat. Lang. Process., 2024, pp. 2758–2770.

[55] A. F. Agarap, “Deep learning using rectified linear units (ReLU),” CoRR, vol. abs/1803.08375, 2018.

[56] J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” CoRR, vol. abs/1412.3555, 2014.

[57] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 86, pp. 2579–2605, 2008.

[58] N. Garcia and G. Vogiatzis, “How to read paintings: Semantic art understanding with multi-modal retrieval,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 676–691.

[59] M. Zhu, Z. Xu, K. Zeng, K. Xiao, M. Wang, W. Ke, and H. Huang, “CMNEE: A large-scale document-level event extraction dataset based on open-source Chinese military news,” in Proc. Int. Conf. Comput. Linguist., 2024, pp. 3367–3379.

[60] J. Li, F. Zhao, P. Chen, J. Xie, X. Zhang, H. Li, M. Chen, Y. Wang, and M. Zhu, “An astronomical question answering dataset for evaluating large language models,” Nature Scientific Data, 2025.

[61] W. R. Hersh, A. M. Cohen, J. Yang, R. T. Bhupatiraju, P. M. Roberts, and M. A. Hearst, “TREC 2007 genomics track overview,” in Text Retrieval Conference, 2007.

APPENDIX A PROOFS OF THEORETICAL RESULTS

In this appendix, we provide proofs and supporting statements for the analy...