{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2024:S4H6UNB3RG2NF7OXXPN7QU2DJV","short_pith_number":"pith:S4H6UNB3","schema_version":"1.0","canonical_sha256":"970fea343b89b4d2fdd7bbdbf853434d56243a9020826c011b83d010fd033e7a","source":{"kind":"arxiv","id":"2408.07666","version":5},"attestation_state":"computed","paper":{"title":"Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Model merging combines trained models without new data or heavy retraining, and this survey organizes the methods into a fresh taxonomy while mapping their uses in language models and many other settings.","cross_cats":["cs.AI","cs.CL","cs.CV"],"primary_cat":"cs.LG","authors_text":"Dacheng Tao, Enneng Yang, Guibing Guo, Jie Zhang, Li Shen, Xiaochun Cao, Xingwei Wang","submitted_at":"2024-08-14T16:58:48Z","abstract_excerpt":"Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. This survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, an"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2408.07666","kind":"arxiv","version":5},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.LG","submitted_at":"2024-08-14T16:58:48Z","cross_cats_sorted":["cs.AI","cs.CL","cs.CV"],"title_canon_sha256":"7ad0ce19dce4c2a8a359d088a94f8ef4dd2c41719dfc0b8983cc770c74ad4003","abstract_canon_sha256":"2bf84cfa927f14e59dc85889c7a67959759abcc20b49a854d0576b261863614f"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:12.880279Z","signature_b64":"DPf+JKFG+AN43lVRzUrIe3WRdZAy6qEUjIiL1iAb0LOJY5XnE2llwWjb/o1bR7IA9fGaO4ibcFu39ymcyf+WDw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"970fea343b89b4d2fdd7bbdbf853434d56243a9020826c011b83d010fd033e7a","last_reissued_at":"2026-05-17T23:38:12.879582Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:12.879582Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Model merging combines trained models without new data or heavy retraining, and this survey organizes the methods into a fresh taxonomy while mapping their uses in language models and many other settings.","cross_cats":["cs.AI","cs.CL","cs.CV"],"primary_cat":"cs.LG","authors_text":"Dacheng Tao, Enneng Yang, Guibing Guo, Jie Zhang, Li Shen, Xiaochun Cao, Xingwei Wang","submitted_at":"2024-08-14T16:58:48Z","abstract_excerpt":"Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. This survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, an"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"This survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The proposed taxonomy is exhaustive and the reviewed literature accurately represents the current state of model merging techniques without significant omissions or mischaracterizations.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"The paper introduces a new taxonomy for model merging methods and reviews their applications in LLMs, MLLMs, continual learning, multi-task learning, and other subfields while outlining open challenges.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Model merging combines trained models without new data or heavy retraining, and this survey organizes the methods into a fresh taxonomy while mapping their uses in language models and many other settings.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"088b4ef5c9ea86f198bdfb6402d0946c54dd87326f25c0f6ecf4c7be6148ba95"},"source":{"id":"2408.07666","kind":"arxiv","version":5},"verdict":{"id":"7b44a381-59f4-4a83-94ad-6ea549388ee1","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T22:12:26.890443Z","strongest_claim":"This survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods.","one_line_summary":"The paper introduces a new taxonomy for model merging methods and reviews their applications in LLMs, MLLMs, continual learning, multi-task learning, and other subfields while outlining open challenges.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The proposed taxonomy is exhaustive and the reviewed literature accurately represents the current state of model merging techniques without significant omissions or mischaracterizations.","pith_extraction_headline":"Model merging combines trained models without new data or heavy retraining, and this survey organizes the methods into a fresh taxonomy while mapping their uses in language models and many other settings."},"references":{"count":299,"sample":[{"doi":"","year":2024,"title":"Javier Abad, Konstantin Donhauser, Francesco Pinto, and Fanny Yang. 2024. Strong Copyright Protection for Language Models via Adaptive Model Fusion.ICML(2024)","work_id":"d00c4ea3-b437-4d10-8dfd-88a45f6cea8c","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","ref_index":2,"cited_arxiv_id":"2303.08774","is_internal_anchor":true},{"doi":"","year":2024,"title":"Linara Adilova, Asja Fischer, and Martin Jaggi. 2024. Layerwise linear mode connectivity.ICLR(2024)","work_id":"3947a4ec-02c0-4194-ac25-f7cc9bff6a31","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Emanuele Aiello, Lili Yu, Yixin Nie, Armen Aghajanyan, and Barlas Oguz. 2024. Jointly training large autoregressive multimodal models.ICLR(2024)","work_id":"3d348625-a55d-447e-88bf-bd8ea8b4a62f","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Samuel Ainsworth, Jonathan Hayase, and Siddhartha Srinivasa. 2023. Git Re-Basin: Merging Models modulo Permu- tation Symmetries. InICLR","work_id":"38468975-2143-48ca-a4a2-1330cf5fb48e","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":299,"snapshot_sha256":"042f008d814a837bf12525746385594d9663a41e592c24de992100da56cb6e41","internal_anchors":15},"formal_canon":{"evidence_count":1,"snapshot_sha256":"f1f41b27ee5724e31d26ef0136d2edfb8dd69ea066dceb821a09f5f0a7828a6b"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2408.07666","created_at":"2026-05-17T23:38:12.879685+00:00"},{"alias_kind":"arxiv_version","alias_value":"2408.07666v5","created_at":"2026-05-17T23:38:12.879685+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2408.07666","created_at":"2026-05-17T23:38:12.879685+00:00"},{"alias_kind":"pith_short_12","alias_value":"S4H6UNB3RG2N","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"S4H6UNB3RG2NF7OX","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"S4H6UNB3","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":25,"internal_anchor_count":25,"sample":[{"citing_arxiv_id":"2502.18036","citing_title":"Harnessing Multiple Large Language Models: A Survey on LLM Ensemble","ref_index":58,"is_internal_anchor":true},{"citing_arxiv_id":"2503.16549","citing_title":"MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems","ref_index":70,"is_internal_anchor":true},{"citing_arxiv_id":"2601.05106","citing_title":"Token-Level LLM Collaboration via FusionRoute","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2602.01167","citing_title":"Do All Individual Layers Help? An Empirical Study of Task-Interfering Layers in Vision-Language Models","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2604.01674","citing_title":"Can Heterogeneous Language Models Be Fused?","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20408","citing_title":"Spectral Souping: A Unified Framework for Online Preference Alignment","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18904","citing_title":"Dynamic Model Merging Made Slim","ref_index":94,"is_internal_anchor":true},{"citing_arxiv_id":"2506.23104","citing_title":"DC-TTA: Divide-and-Conquer Framework for Test-Time Adaptation of Interactive Segmentation","ref_index":58,"is_internal_anchor":true},{"citing_arxiv_id":"2511.11439","citing_title":"Retrofit: Continual Learning with Controlled Forgetting for Binary Security Detection and Analysis","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2512.04695","citing_title":"TRINITY: An Evolved LLM Coordinator","ref_index":20,"is_internal_anchor":true},{"citing_arxiv_id":"2512.09972","citing_title":"AP-BMM: Approximating Capability-Cost Pareto Sets of LLMs via Asynchronous Prior-Guided Bayesian Model Merging","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2602.17546","citing_title":"Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2603.05957","citing_title":"Domain-Adaptive Model Merging Across Disconnected Modes","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14350","citing_title":"Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling","ref_index":244,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14289","citing_title":"MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12419","citing_title":"ORBIT: Preserving Foundational Language Capabilities in GenRetrieval via Origin-Regulated Merging","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12326","citing_title":"Black-Box Optimization of Mixed Binary-Continuous Variables: Challenges and Opportunities in Evolutionary Model Merging","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2511.00062","citing_title":"World Simulation with Video Foundation Models for Physical AI","ref_index":89,"is_internal_anchor":true},{"citing_arxiv_id":"2502.16982","citing_title":"Muon is Scalable for LLM Training","ref_index":66,"is_internal_anchor":true},{"citing_arxiv_id":"2604.20985","citing_title":"Differentially Private Model Merging","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2604.19394","citing_title":"Can Continual Pre-training Bridge the Performance Gap between General-purpose and Specialized Language Models in the Medical Domain?","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2604.08888","citing_title":"From OSS to Open Source AI: an Exploratory Study of Collaborative Development Paradigm Divergence","ref_index":135,"is_internal_anchor":true},{"citing_arxiv_id":"2604.09713","citing_title":"Zero-Shot Synthetic-to-Real Handwritten Text Recognition via Task Analogies","ref_index":76,"is_internal_anchor":true},{"citing_arxiv_id":"2604.14808","citing_title":"Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2604.17078","citing_title":"Understanding and Enforcing Weight Disentanglement in Task Arithmetic","ref_index":47,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":1,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/S4H6UNB3RG2NF7OXXPN7QU2DJV","json":"https://pith.science/pith/S4H6UNB3RG2NF7OXXPN7QU2DJV.json","graph_json":"https://pith.science/api/pith-number/S4H6UNB3RG2NF7OXXPN7QU2DJV/graph.json","events_json":"https://pith.science/api/pith-number/S4H6UNB3RG2NF7OXXPN7QU2DJV/events.json","paper":"https://pith.science/paper/S4H6UNB3"},"agent_actions":{"view_html":"https://pith.science/pith/S4H6UNB3RG2NF7OXXPN7QU2DJV","download_json":"https://pith.science/pith/S4H6UNB3RG2NF7OXXPN7QU2DJV.json","view_paper":"https://pith.science/paper/S4H6UNB3","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2408.07666&json=true","fetch_graph":"https://pith.science/api/pith-number/S4H6UNB3RG2NF7OXXPN7QU2DJV/graph.json","fetch_events":"https://pith.science/api/pith-number/S4H6UNB3RG2NF7OXXPN7QU2DJV/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/S4H6UNB3RG2NF7OXXPN7QU2DJV/action/timestamp_anchor","attest_storage":"https://pith.science/pith/S4H6UNB3RG2NF7OXXPN7QU2DJV/action/storage_attestation","attest_author":"https://pith.science/pith/S4H6UNB3RG2NF7OXXPN7QU2DJV/action/author_attestation","sign_citation":"https://pith.science/pith/S4H6UNB3RG2NF7OXXPN7QU2DJV/action/citation_signature","submit_replication":"https://pith.science/pith/S4H6UNB3RG2NF7OXXPN7QU2DJV/action/replication_record"}},"created_at":"2026-05-17T23:38:12.879685+00:00","updated_at":"2026-05-17T23:38:12.879685+00:00"}