Learning Transferable Topology Priors for Multi-Agent LLM Collaboration Across Domains
Pith reviewed 2026-05-20 13:32 UTC · model grok-4.3
The pith
TopoPrior learns reusable collaboration graph priors offline from multiple domains to initialize structures for new multi-agent LLM queries.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TopoPrior employs a conditional variational graph framework to capture reusable structural regularities across domains in a latent space, then uses a query-conditioned latent adaptation module with adversarial alignment to produce initial collaboration graphs that reduce domain discrepancies while keeping query-relevant variation for downstream topology evolution.
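The summary above does not give the training loss. One way to write down the two components, offered as a sketch under assumptions rather than the paper's stated objective, is a conditional ELBO for the topology prior plus a domain-adversarial term. All symbols are notation introduced here: $G$ is a reference collaboration graph, $c$ the query conditioning, $d$ the domain label, $z$ the latent code, $q_\phi$ the encoder, $p_\theta$ the decoder, $D_\psi$ a domain discriminator, and $\beta$, $\lambda$ weighting coefficients.

```latex
\mathcal{L}_{\text{prior}}(\theta,\phi)
  = -\,\mathbb{E}_{q_\phi(z \mid G, c)}\big[\log p_\theta(G \mid z, c)\big]
    + \beta\,\mathrm{KL}\big(q_\phi(z \mid G, c)\,\|\,p(z \mid c)\big)

\min_{\theta,\phi}\;\max_{\psi}\;
  \mathcal{L}_{\text{prior}}(\theta,\phi)
  + \lambda\,\mathbb{E}_{(G,c,d)}\,\mathbb{E}_{q_\phi(z \mid G, c)}\big[\log D_\psi(d \mid z)\big]
```

In this formulation the discriminator is trained to recover the domain from $z$ while the encoder is trained to make that hard, so $\lambda$ plays the role of the alignment strength listed among the free parameters further down.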
What carries the argument
Conditional variational graph framework that encodes reusable structural regularities from offline multi-domain collaboration graphs, paired with adversarial alignment to adapt those encodings to new queries.
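A minimal sketch of what the generative half of such a framework could look like, assuming a VGAE-style inner-product decoder with the query embedding added to each node latent before decoding. The function names, the additive conditioning, and the dimensions are illustrative assumptions, not the paper's architecture.

```python
import math
import random


def sample_latents(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps for each agent node."""
    return [
        [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu_i, lv_i)]
        for mu_i, lv_i in zip(mu, log_var)
    ]


def decode_edges(z, query_embedding):
    """Inner-product decoder: p(edge i-j) = sigmoid(<z_i + q, z_j + q>).

    Adding the query embedding q to every node latent is a hypothetical
    conditioning choice made only for this illustration.
    """
    h = [[zi + qi for zi, qi in zip(z_i, query_embedding)] for z_i in z]
    n = len(h)
    probs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-loops in the collaboration graph
                score = sum(a * b for a, b in zip(h[i], h[j]))
                probs[i][j] = 1.0 / (1.0 + math.exp(-score))
    return probs


def initial_graph(probs, threshold=0.5):
    """Threshold edge probabilities into a binary adjacency matrix."""
    return [[1 if p > threshold else 0 for p in row] for row in probs]


# Toy inputs: 4 agents, 3 latent dimensions, all values made up.
rng = random.Random(0)
num_agents, latent_dim = 4, 3
mu = [[rng.uniform(-1, 1) for _ in range(latent_dim)] for _ in range(num_agents)]
log_var = [[-2.0] * latent_dim for _ in range(num_agents)]
query = [0.1, -0.2, 0.3]

z = sample_latents(mu, log_var, rng)
probs = decode_edges(z, query)
adjacency = initial_graph(probs)
```

An inner-product decoder yields symmetric edge probabilities, so a directed collaboration graph would need a different decoder; that choice is left open here.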
If this is right
- Plugging the learned initial graphs into several different topology-evolution backbones produces consistent gains on multi-domain reasoning tasks.
- Online inference uses fewer tokens because the starting collaboration structure is already closer to effective ones.
- Only a modest number of extra trainable parameters are added beyond the base backbones.
- The approach works without changing the core search or evolution logic of existing methods.
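The compatibility and token-saving points above can be illustrated with a toy refinement loop. The sketch assumes a hypothetical backbone that accepts any initial adjacency matrix and applies the best single edge flip per step; each loss evaluation stands in for an LLM call. The hidden target graph and the Hamming loss are stand-ins invented for this example, not the paper's evolution objective.

```python
def hamming(a, b):
    """Number of differing entries between two adjacency matrices."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def refine(initial, loss_fn, max_steps=50):
    """Single-edge-flip descent; returns (graph, number of loss evaluations)."""
    graph = [row[:] for row in initial]
    best = loss_fn(graph)
    evaluations = 1
    n = len(graph)
    for _ in range(max_steps):
        if best == 0:
            break
        best_move = None
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                graph[i][j] ^= 1              # tentatively flip edge i -> j
                candidate = loss_fn(graph)
                evaluations += 1
                graph[i][j] ^= 1              # undo the tentative flip
                if candidate < best:
                    best, best_move = candidate, (i, j)
        if best_move is None:
            break                             # no flip improves the loss
        i, j = best_move
        graph[i][j] ^= 1                      # commit the improving flip
    return graph, evaluations


# Hypothetical target: a 4-agent chain 0 -> 1 -> 2 -> 3.
target = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
cold_start = [[0] * 4 for _ in range(4)]      # empty graph, three edges missing
warm_start = [row[:] for row in target]
warm_start[2][3] = 0                          # prior-like start, one edge missing


def loss(graph):
    return hamming(graph, target)


cold_graph, cold_evals = refine(cold_start, loss)
warm_graph, warm_evals = refine(warm_start, loss)
```

In this toy setting the cold start needs 37 loss evaluations and the warm start 13, because cost grows with the initial distance to a good graph. The search routine itself is unchanged between the two runs; only its starting point differs. Real backbones need not scale this way.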
Where Pith is reading between the lines
- If the priors hold across wider ranges of agent counts, the same offline learning step could support systems with dozens of agents where per-query search would otherwise become prohibitive.
- The latent-space transfer idea might extend to other structured outputs in LLM agents, such as debate trees or shared memory layouts, rather than only collaboration graphs.
- Collecting reference graphs from even more varied sources could further strengthen the priors, suggesting a path toward domain-general multi-agent scaffolding.
Load-bearing premise
The structural patterns found in collaboration graphs from several past domains are similar enough that they can be learned once and then adjusted for a new query without losing useful details or adding mismatches that hurt final performance.
What would settle it
Run the method on held-out domains and compare final task accuracy and token use when starting from the learned priors versus starting from random or per-query scratch graphs; if the prior-initialized versions show no consistent gains, the transfer claim does not hold.
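A sketch of the bookkeeping such a test would need, assuming per-domain accuracy and token counts have already been measured for both initializations. The record layout, the function name, and the numbers in the usage lines are placeholders invented for illustration; they are not results from the paper.

```python
from statistics import mean


def compare_initializations(results):
    """Summarize prior-initialized vs. baseline runs on held-out domains.

    `results` maps a domain name to a dict with keys 'prior' and 'baseline',
    each an (accuracy, tokens) tuple.
    """
    acc_deltas = {d: r["prior"][0] - r["baseline"][0] for d, r in results.items()}
    token_ratios = {d: r["prior"][1] / r["baseline"][1] for d, r in results.items()}
    return {
        "mean_accuracy_gain": mean(acc_deltas.values()),
        "mean_token_ratio": mean(token_ratios.values()),
        "domains_improved": sorted(d for d, v in acc_deltas.items() if v > 0),
        "consistent_gain": all(v > 0 for v in acc_deltas.values()),
    }


# Made-up placeholder measurements, NOT taken from the paper.
placeholder = {
    "held_out_a": {"prior": (0.75, 800), "baseline": (0.5, 1000)},
    "held_out_b": {"prior": (0.5, 900), "baseline": (0.5, 1000)},
}
summary = compare_initializations(placeholder)
```

With these placeholder inputs `consistent_gain` is `False`, since one domain shows no accuracy change. That is the pattern under which the transfer claim would fail the test described above.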
Original abstract
Large language model (LLM)-based multi-agent systems have shown strong potential for complex reasoning by coordinating specialized agents through structured communication. However, existing topology-evolution methods typically construct or optimize a collaboration topology for each query from scratch, leading to substantial online search overhead, high inference-time token consumption, and limited scalability in multi-domain settings. We propose TopoPrior, a framework for learning transferable topology priors for multi-agent LLM collaboration across domains. Rather than repeatedly searching for effective collaboration structures online, TopoPrior learns reusable topology priors from reference collaboration graphs collected offline from multiple domains and uses them to generate query-conditioned initial collaboration graphs for downstream refinement. By shifting part of topology search from per-query online optimization to offline prior learning, TopoPrior amortizes search cost while remaining compatible with existing topology-evolution backbones. Technically, TopoPrior contains two key components. First, a transferable topology prior learning module employs a conditional variational graph framework to capture reusable structural regularities across domains in a latent space. Second, a query-conditioned latent adaptation module introduces adversarial alignment to reduce unnecessary domain discrepancy while preserving query-relevant structural variation. Experiments on multi-domain reasoning benchmarks show that TopoPrior consistently improves several heterogeneous topology-evolution backbones while reducing online inference-time token usage, with only modest additional trainable parameters. These results suggest that transferable topology initialization is an effective and lightweight mechanism for improving the efficiency of multi-agent LLM collaboration across domains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes TopoPrior, a framework that learns reusable topology priors from offline multi-domain reference collaboration graphs via a conditional variational graph model to capture structural regularities in a latent space, then applies a query-conditioned latent adaptation module with adversarial alignment to generate initial collaboration graphs for downstream refinement in multi-agent LLM systems. The approach aims to amortize per-query topology search costs, reduce online inference-time token usage, and improve performance while remaining compatible with existing topology-evolution backbones, with experiments claiming consistent gains on multi-domain reasoning benchmarks using only modest additional trainable parameters.
Significance. If the transferability of latent structural regularities holds and the adversarial alignment preserves query-relevant variation without introducing harmful bias, the result would be significant for scaling multi-agent LLM collaboration: it provides a lightweight, amortizable mechanism for topology initialization that addresses the online search overhead of current per-query optimization methods. The compatibility with heterogeneous backbones and focus on cross-domain reuse are practical strengths that could influence efficiency-focused work in LLM agent systems.
major comments (2)
- [Abstract and §4 (Experiments)] The claims of 'consistent improvements' and 'reducing online inference-time token usage' are reported without any quantitative values, error bars, ablation results, data split details, or baseline descriptions. This absence is load-bearing because the central claim, that offline prior learning amortizes search cost while improving performance, cannot be evaluated without these elements.
- [§3.2 (Query-conditioned latent adaptation module)] The adversarial alignment is introduced to reduce domain discrepancy while preserving query-relevant variation, yet no ablation on alignment strength, single- versus multi-domain training, or latent-space transfer gaps is provided. Without these, the weakest assumption, that reusable structural regularities survive alignment without collapse or harmful bias, remains untested, and this directly affects whether the generated initial graphs add benefit or bias.
minor comments (2)
- [§3.1] Notation for the conditional variational graph (e.g., latent variable definitions and conditioning) could be clarified with an explicit diagram or expanded equations to improve readability for readers unfamiliar with variational graph models.
- [§4] The manuscript would benefit from a table summarizing the additional trainable parameters across backbones and a clearer statement of the exact multi-domain benchmarks used.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the presentation of experimental claims and the need for additional validation of the adversarial alignment component. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
- Referee: [Abstract and §4 (Experiments)] The claims of 'consistent improvements' and 'reducing online inference-time token usage' are reported without any quantitative values, error bars, ablation results, data split details, or baseline descriptions. This absence is load-bearing because the central claim, that offline prior learning amortizes search cost while improving performance, cannot be evaluated without these elements.
  Authors: We agree that the abstract and the opening of §4 would be clearer with explicit quantitative summaries. The full experimental section (§4) contains tables reporting performance metrics across backbones and domains, token consumption reductions, standard deviations from repeated runs, data split details, and baseline descriptions. We will revise the abstract and §4 to include key numerical highlights (e.g., average accuracy gains and token savings) along with direct pointers to the tables and figures, while preserving the existing results.
  Revision: yes
- Referee: [§3.2 (Query-conditioned latent adaptation module)] The adversarial alignment is introduced to reduce domain discrepancy while preserving query-relevant variation, yet no ablation on alignment strength, single- versus multi-domain training, or latent-space transfer gaps is provided. Without these, the weakest assumption, that reusable structural regularities survive alignment without collapse or harmful bias, remains untested, and this directly affects whether the generated initial graphs add benefit or bias.
  Authors: We concur that targeted ablations would better substantiate the design choices in the query-conditioned latent adaptation module. The current manuscript presents the overall framework and end-to-end results but omits explicit sensitivity analysis on alignment strength, single- versus multi-domain training comparisons, and metrics for latent-space transfer gaps. We will add these ablations in the revised version, including a hyperparameter sweep on the adversarial loss coefficient, a single-domain training baseline, and quantitative measures (e.g., MMD or reconstruction fidelity) of domain alignment in the latent space.
  Revision: yes
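The rebuttal names MMD as one candidate measure of latent-space domain alignment. A minimal sketch of a biased squared-MMD estimate with a Gaussian kernel follows, assuming latent codes are available as plain lists of floats. The bandwidth and the toy latent vectors are arbitrary choices for illustration, not values from the paper.

```python
import math


def rbf_kernel(x, y, bandwidth):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy latent codes for two hypothetical domains (made-up values).
domain_a = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
domain_b = [[2.0, 2.0], [2.1, 1.9], [1.9, 2.1]]

gap_before_alignment = mmd_squared(domain_a, domain_b)
gap_identical = mmd_squared(domain_a, domain_a)
```

A value near zero indicates that the two sets of latents are hard to distinguish under the chosen kernel. Tracking this quantity across a sweep of the adversarial loss coefficient would be one way to report the alignment ablation the referee asks for.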
Circularity Check
No significant circularity; derivation relies on independent offline training and adaptation modules
The paper describes a standard transfer-learning setup in which a conditional variational graph model is trained on offline multi-domain reference collaboration graphs to encode structural regularities, after which an adversarial alignment module adapts the latent space for new queries before feeding the result into existing topology-evolution backbones. None of the core components (variational graph encoder, adversarial alignment, or prior generation) are defined in terms of the final downstream performance metric, and the provided abstract and description contain no self-citations, fitted parameters renamed as predictions, or equations that reduce the claimed transferability to a tautology by construction. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- latent dimension size
- adversarial alignment strength
axioms (1)
- Domain assumption: Collaboration topologies contain reusable structural regularities across domains that can be captured in a latent space.
invented entities (1)
- TopoPrior framework (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: conditional variational graph framework to capture reusable structural regularities across domains in a latent space
- IndisputableMonolith/Cost/FunctionalEquation, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: query-conditioned latent adaptation module introduces adversarial alignment
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.