pith. sign in

Demystifying Mergeability: Interpretable Properties to Predict Model Merging Success

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it
abstract

Model merging combines knowledge from separately fine-tuned models, yet the factors driving its success remain poorly understood. While recent work treats mergeability as an intrinsic property of the models, we show with an architecture-agnostic framework that it fundamentally depends on both the merging method and the partner tasks. Using L1-regularized linear optimization over a set of interpretable pairwise metrics (e.g., gradient L_2 distance), we uncover properties correlating with post-merge normalized accuracy across five merging methods. We find architecture- and method-specific variation in success drivers (64.0% average top-5 metric overlap; 79.3% sign agreement), with certain methods, notably TIES, exhibiting distinct ``fingerprints'' that diverge from the broader consensus. Crucially, however, gradient alignment metrics consistently emerge as the most fundamental signals of compatibility. These findings provide a diagnostic foundation for understanding mergeability and motivate future merge-aware fine-tuning strategies.

citation-role summary

background 1

citation-polarity summary

fields

cs.LG 2

years

2026 2

verdicts

UNVERDICTED 2

roles

background 1

polarities

background 1

representative citing papers

Model Merging: Foundations and Algorithms

cs.LG · 2026-05-02 · unverdicted · novelty 6.0

New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.

citing papers explorer

Showing 2 of 2 citing papers.

  • Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training cs.LG · 2026-05-13 · unverdicted · none · ref 30 · 2 links · internal anchor

    Low-rank pre-training methods converge to geometrically and spectrally distinct basins and show diverging activations compared to full-rank training at 60M-350M scales.

  • Model Merging: Foundations and Algorithms cs.LG · 2026-05-02 · unverdicted · none · ref 207 · internal anchor

    New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.