Demystifying Mergeability: Interpretable Properties to Predict Model Merging Success

2026 · cs.LG · arXiv 2601.22285

2 Pith papers cite this work. Polarity classification is still indexing.



abstract

Model merging combines knowledge from separately fine-tuned models, yet the factors driving its success remain poorly understood. While recent work treats mergeability as an intrinsic property of the models, we show with an architecture-agnostic framework that it fundamentally depends on both the merging method and the partner tasks. Using L1-regularized linear optimization over a set of interpretable pairwise metrics (e.g., gradient L2 distance), we uncover properties correlating with post-merge normalized accuracy across five merging methods. We find architecture- and method-specific variation in success drivers (64.0% average top-5 metric overlap; 79.3% sign agreement), with certain methods, notably TIES, exhibiting distinct "fingerprints" that diverge from the broader consensus. Crucially, however, gradient alignment metrics consistently emerge as the most fundamental signals of compatibility. These findings provide a diagnostic foundation for understanding mergeability and motivate future merge-aware fine-tuning strategies.
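
To make the abstract's terms concrete, here is a minimal sketch of what an L1-regularized linear fit over pairwise metrics, a top-k metric overlap, and a sign agreement score could look like. It is an illustration under stated assumptions, not the paper's implementation: the coordinate-descent solver, the function names, the metric names (`grad_l2`, `cos_sim`, `weight_norm`), the toy numbers, and the exact definitions of overlap and sign agreement are all hypothetical, and the paper may define them differently.

```python
from typing import Dict, List


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold (the proximal step for an L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X: List[List[float]], y: List[float], alpha: float,
              n_iters: int = 200) -> List[float]:
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1, no intercept.

    Rows of X are model pairs, columns are pairwise metrics (assumed layout);
    y holds the post-merge normalized accuracy for each pair.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            # Correlate metric j with the residual that ignores metric j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            norm_sq = sum(X[i][j] ** 2 for i in range(n))
            w[j] = soft_threshold(rho / n, alpha) / (norm_sq / n) if norm_sq else 0.0
    return w


def top_k_overlap(coefs_a: Dict[str, float], coefs_b: Dict[str, float], k: int) -> float:
    """Fraction of the k largest-magnitude metrics shared by two coefficient sets."""
    top_a = set(sorted(coefs_a, key=lambda m: abs(coefs_a[m]), reverse=True)[:k])
    top_b = set(sorted(coefs_b, key=lambda m: abs(coefs_b[m]), reverse=True)[:k])
    return len(top_a & top_b) / k


def sign_agreement(coefs_a: Dict[str, float], coefs_b: Dict[str, float]) -> float:
    """Fraction of metrics, nonzero in both sets, whose coefficients share a sign."""
    shared = [m for m in coefs_a
              if m in coefs_b and coefs_a[m] != 0 and coefs_b[m] != 0]
    if not shared:
        return 0.0
    same = sum(1 for m in shared if (coefs_a[m] > 0) == (coefs_b[m] > 0))
    return same / len(shared)


# Toy data (made up): two metrics, four model pairs.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
weights = lasso_fit(X, y, alpha=0.1)  # the weakly related second metric is zeroed out

# Hypothetical coefficient sets for two merging methods.
method_a = {"grad_l2": -0.8, "cos_sim": 0.5, "weight_norm": 0.1}
method_b = {"grad_l2": -0.6, "cos_sim": -0.2, "weight_norm": 0.4}
overlap = top_k_overlap(method_a, method_b, k=2)
agreement = sign_agreement(method_a, method_b)
```

The L1 penalty is what makes such a fit interpretable: metrics with little predictive value receive a coefficient of exactly zero, so the surviving metrics can be ranked and compared across merging methods.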

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training

cs.LG · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

Low-rank pre-training methods converge to geometrically and spectrally distinct basins and show diverging activations compared to full-rank training at 60M-350M scales.

Model Merging: Foundations and Algorithms

cs.LG · 2026-05-02 · unverdicted · novelty 6.0

New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.

