Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
Pith reviewed 2026-05-17 22:12 UTC · model grok-4.3
The pith
Model merging combines trained models without new data or heavy retraining, and this survey organizes the methods into a fresh taxonomy while mapping their uses in language models and many other settings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Model merging is an efficient empowerment technique that requires neither raw training data nor expensive computation. The survey proposes a new taxonomy intended to classify existing methods exhaustively, reviews the associated theories, shows applications across large language models, multimodal large language models, and more than ten machine learning subfields, and highlights remaining challenges and future directions.
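To make the "no data, no retraining" claim concrete, here is a minimal sketch of the simplest merging operation, (weighted) parameter averaging across models that share an architecture. The dict-of-lists parameter representation and the name `average_merge` are assumptions chosen for illustration; they are not taken from the survey.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def average_merge(models: Sequence[Params],
                  weights: Optional[Sequence[float]] = None) -> Params:
    """Merge same-architecture models by (weighted) parameter averaging."""
    if not models:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * len(models)  # uniform averaging by default
    if len(weights) != len(models):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")

    merged: Params = {}
    for name, ref in models[0].items():
        # Every model must expose the same parameter with the same size.
        for m in models:
            if name not in m or len(m[name]) != len(ref):
                raise ValueError(f"parameter mismatch for {name!r}")
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models)) / total
            for i in range(len(ref))
        ]
    return merged


model_a = {"w": [1.0, 2.0]}
model_b = {"w": [3.0, 4.0]}
print(average_merge([model_a, model_b]))  # {'w': [2.0, 3.0]}
```

Only the stored parameters are touched: no training examples and no gradient steps are involved, which is the property the core claim emphasizes.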
What carries the argument
The new taxonomic approach that exhaustively classifies existing model merging methods and serves as the organizing structure for the review of techniques, applications, and open problems.
If this is right
- Merging supports continual learning by letting models gain new abilities without erasing prior ones.
- Multi-task learning can use merging to handle several tasks with one combined model rather than separate training runs.
- Few-shot learning gains from merging as a route to quick adaptation using limited examples.
- The same methods apply across more than ten machine learning subfields, indicating wide practical reach.
- Open challenges in scaling and compatibility point to concrete next steps for theory and practice.
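The multi-task point above can be illustrated with task arithmetic, a merging operation named elsewhere on this page. The sketch below assumes every fine-tuned model starts from the same pretrained checkpoint; the parameter representation, the function names, and the `scale` coefficient are illustrative assumptions rather than the survey's notation.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(pretrained: Params, finetuned: Params) -> Params:
    """Difference between fine-tuned and pretrained weights for one task."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def task_arithmetic_merge(pretrained: Params,
                          finetuned_models: Sequence[Params],
                          scale: float = 1.0) -> Params:
    """Add the scaled sum of task vectors back onto the pretrained weights."""
    merged = {name: list(values) for name, values in pretrained.items()}
    for finetuned in finetuned_models:
        delta = task_vector(pretrained, finetuned)
        for name in merged:
            merged[name] = [m + scale * d
                            for m, d in zip(merged[name], delta[name])]
    return merged


base = {"w": [0.0, 0.0]}
task_a = {"w": [1.0, 0.0]}  # toy model fine-tuned on a first task
task_b = {"w": [0.0, 2.0]}  # toy model fine-tuned on a second task
print(task_arithmetic_merge(base, [task_a, task_b], scale=0.5))  # {'w': [0.5, 1.0]}
```

In practice the scaling coefficient is usually tuned on held-out data, and interference between task vectors is one of the open problems the list above points to.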
Where Pith is reading between the lines
- The taxonomy could serve as a checklist for spotting which merging strategies remain under-tested in new domains.
- Links to related ideas such as model editing may suggest hybrid methods that combine merging with other lightweight updates.
- Applying the taxonomy to papers published after the survey would test how durable the classification remains.
Load-bearing premise
The proposed taxonomy covers every current model merging method and the reviewed literature accurately represents the state of the field without major omissions.
What would settle it
A search that turns up several well-known model merging papers that fall outside the new taxonomy or were missed in the review would show the survey is incomplete.
Original abstract
Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. This survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Secondly, we discuss the application of model merging techniques in large language models, multimodal large language models, and more than ten machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions. A comprehensive list of papers about model merging is available at https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a survey on model merging techniques. It claims to fill a literature gap by providing a comprehensive overview of methods and theories, introducing a new taxonomic approach that exhaustively classifies existing techniques, reviewing applications in LLMs, MLLMs, and over ten ML subfields (e.g., continual learning, multi-task learning, few-shot learning), discussing challenges, and outlining future directions. It includes a linked GitHub repository curating cited papers.
Significance. If the taxonomy proves exhaustive and the summaries of methods/theories/applications are accurate without major omissions, the survey would be a useful organizational contribution in a fast-growing area. The public, updatable GitHub list and cross-domain application coverage add practical value for researchers working on efficient model adaptation without retraining.
major comments (1)
- [Taxonomy section] Taxonomy section: the assertion that the proposed taxonomy 'exhaustively discusses existing model merging methods' is central to the survey's contribution but lacks explicit justification or a completeness argument (e.g., search strategy, cutoff date, or handling of edge cases like merging in non-transformer architectures). This risks undercutting the claim of systematic coverage.
minor comments (2)
- [Abstract] Abstract: the phrase 'more than ten machine learning subfields' is vague; enumerating the subfields or providing a table of coverage would improve reader orientation.
- [Introduction / Conclusion] GitHub repository: confirm that the linked resource (https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications) is actively maintained and includes all cited works with DOIs or arXiv IDs for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our survey. We address the major comment below and will revise the manuscript to incorporate the suggested clarifications.
Point-by-point responses
- Referee: Taxonomy section: the assertion that the proposed taxonomy 'exhaustively discusses existing model merging methods' is central to the survey's contribution but lacks explicit justification or a completeness argument (e.g., search strategy, cutoff date, or handling of edge cases like merging in non-transformer architectures). This risks undercutting the claim of systematic coverage.
  Authors: We agree that an explicit justification for the taxonomy's coverage would strengthen the manuscript. In the revised version, we will add a dedicated paragraph (or subsection) in the Taxonomy section describing our literature review process. This will include the search strategy (keywords such as 'model merging', 'weight interpolation', 'task arithmetic' on arXiv and Google Scholar), the inclusion criteria, and a cutoff date (papers up to July 2024). For edge cases such as non-transformer architectures, we will clarify that the taxonomy is intentionally architecture-agnostic and derived from core merging operations rather than model-specific details; however, the bulk of published work applies these methods to transformers. We will briefly note any existing examples involving CNNs or other architectures and acknowledge that coverage of non-transformer cases remains limited in the current literature, marking it as an opportunity for future work.
  Revision: yes
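The inclusion rule described in the rebuttal (keyword match plus a cutoff date) could be written down as an explicit filter, which would make the completeness argument checkable. The sketch below is one possible reading: the `Paper` record, the function names, and the choice of 31 July 2024 as the exact cutoff day are assumptions, and the sample records are placeholders rather than real papers.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Sequence

# Keywords quoted in the rebuttal.
KEYWORDS = ("model merging", "weight interpolation", "task arithmetic")
# "Papers up to July 2024"; the exact day is an assumption.
CUTOFF = date(2024, 7, 31)


@dataclass
class Paper:
    """Hypothetical bibliographic record used only for this sketch."""
    title: str
    abstract: str
    published: date


def is_included(paper: Paper,
                keywords: Sequence[str] = KEYWORDS,
                cutoff: date = CUTOFF) -> bool:
    """True if the paper matches any keyword and is dated on or before the cutoff."""
    text = f"{paper.title} {paper.abstract}".lower()
    return paper.published <= cutoff and any(k in text for k in keywords)


def select_papers(papers: Iterable[Paper]) -> List[Paper]:
    """Apply the inclusion rule to a candidate list."""
    return [p for p in papers if is_included(p)]


candidates = [
    Paper("Placeholder A on model merging", "", date(2024, 5, 1)),
    Paper("Placeholder B on Task Arithmetic", "", date(2024, 9, 1)),
    Paper("Placeholder C", "an unrelated topic", date(2023, 1, 1)),
]
print([p.title for p in select_papers(candidates)])
```

Publishing the rule in this form would let a reader rerun the search and see which papers fall outside it, which is the kind of evidence the referee asks for.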
Circularity Check
No significant circularity in this literature survey
Full rationale
This paper is a survey that reviews existing model merging techniques, proposes an organizational taxonomy, covers applications across domains, and lists future directions. No derivations, predictions, fitted parameters, or mathematical claims are present that could reduce to self-definition or self-citation. The taxonomy is an explicit organizational contribution rather than a derived result, and the GitHub repository serves as a supporting reference list without creating load-bearing circularity. The work is self-contained as a review with no internal equations or predictive steps to inspect for equivalence to inputs.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: We first propose a new taxonomic approach that exhaustively discusses existing model merging methods... pre-merging and during-merging phases (§2)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 17 Pith papers
- Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
  DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
- Differentially Private Model Merging
  Post-processing via random selection or linear combination generates differentially private models for arbitrary privacy parameters from pre-trained models on the same dataset.
- Understanding and Enforcing Weight Disentanglement in Task Arithmetic
  Task-Feature Specialization explains weight disentanglement in task arithmetic and leads to orthogonality, which OrthoReg enforces to enhance performance of model composition methods.
- From OSS to Open Source AI: an Exploratory Study of Collaborative Development Paradigm Divergence
  Open source AI shows lower collaboration intensity, reduced direct contributions, and a shift toward adaptive use rather than joint improvement compared to traditional OSS.
- MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification
  MetaMoE unifies domain-specialized experts into a single MoE via diversity-aware public proxy selection that approximates private data distributions for router training and expert alignment.
- Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem
  Treating retention as the dominant task and using constructive gradient synthesis like SAGO allows LLM unlearning to achieve higher general performance recovery without weakening the forgetting effect.
- Zero-Shot Synthetic-to-Real Handwritten Text Recognition via Task Analogies
  A method learns synthetic-to-real parameter corrections from source languages and transfers them to target languages without any real target data, improving HTR across five languages and six models.
- Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning
  Adaptive regularization guided by training-time safety risk signals from judges or activations prevents safety degradation in fine-tuned language models while preserving utility.
- AP-BMM: Approximating Capability-Cost Pareto Sets of LLMs via Asynchronous Prior-Guided Bayesian Model Merging
  AP-BMM approximates Pareto sets of layer-wise merged LLMs for accuracy-cost trade-offs via prior-guided asynchronous Bayesian optimization and reranking.
- TRINITY: An Evolved LLM Coordinator
  A compact 0.6B-parameter coordinator with a 10K-parameter head uses evolutionary strategy to dynamically delegate roles to LLMs, achieving SOTA results such as 86.2% on LiveCodeBench.
- Muon is Scalable for LLM Training
  Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.
- ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging
  ORBIT preserves foundational language capabilities during generative retrieval fine-tuning by using origin-regulated weight averaging to constrain parameter drift beyond a distance threshold.
- Black-Box Optimization of Mixed Binary-Continuous Variables: Challenges and Opportunities in Evolutionary Model Merging
  Data flow space model merging is formalized as a mixed binary-continuous black-box optimization problem, where a structured approach respecting variable dependencies achieves 6.7% higher accuracy and 51.4% smaller sea...
- Can Continual Pre-training Bridge the Performance Gap between General-purpose and Specialized Language Models in the Medical Domain?
  Continual pre-training on a German medical corpus lets 7B models close much of the performance gap with 24B general models on medical benchmarks, though merging introduces some language mixing and verbosity.
- Domain-Adaptive Model Merging Across Disconnected Modes
  DMM merges highly divergent domain-specific models without data sharing by synthesizing pseudo-data from normalization statistics and distilling knowledge, achieving state-of-the-art performance on unimodal and multim...
- Retrofit: Continual Learning with Controlled Forgetting for Binary Security Detection and Analysis
  RETROFIT enables continual learning for malware detection and binary summarization by retrospective-free parameter merging with low-rank sparse updates and confidence-guided arbitration, improving retention and genera...
- World Simulation with Video Foundation Models for Physical AI
  Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.