Demystifying Mergeability: Interpretable Properties to Predict Model Merging Success
Pith reviewed 2026-05-16 09:47 UTC · model grok-4.3
The pith
Gradient alignment between fine-tuned models is the strongest predictor of successful merging across methods and tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mergeability depends on both the merging method and the partner tasks rather than being intrinsic to the models. L1-regularized linear optimization over interpretable pairwise metrics, such as gradient L2 distance, reveals that gradient alignment metrics are the most reliable signals of compatibility and consistently correlate with post-merge accuracy. Architecture- and method-specific variation exists, with an average 64.0 percent top-5 metric overlap and 79.3 percent sign agreement, yet TIES and similar methods show distinct fingerprints while gradient alignment remains the shared core driver.
What carries the argument
L1-regularized linear optimization applied to a set of interpretable pairwise metrics (gradient L2 distance and similar quantities) to predict normalized post-merge accuracy.
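The pipeline can be sketched end to end: compute a small set of pairwise metrics for each model pair, then fit an L1-regularized linear model against normalized post-merge accuracy and read the drivers off the nonzero coefficients. The metric set and data below are illustrative stand-ins (the paper's full metric list is not reproduced here), with scikit-learn's Lasso playing the role of the L1-regularized optimizer.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

def pairwise_metrics(grad_a, grad_b):
    """Two illustrative pairwise metrics between task gradients:
    gradient L2 distance and gradient cosine alignment."""
    l2_dist = float(np.linalg.norm(grad_a - grad_b))
    cos = float(grad_a @ grad_b
                / (np.linalg.norm(grad_a) * np.linalg.norm(grad_b)))
    return np.array([l2_dist, cos])

# Synthetic stand-in: 200 model pairs with 64-dim "gradients", where
# normalized post-merge accuracy is driven by alignment by construction.
X, y = [], []
for _ in range(200):
    ga, gb = rng.normal(size=64), rng.normal(size=64)
    m = pairwise_metrics(ga, gb)
    X.append(m)
    y.append(0.5 + 0.4 * m[1] + 0.01 * rng.normal())
X, y = np.array(X), np.array(y)

# L1-regularized linear fit; nonzero coefficients name the drivers.
model = Lasso(alpha=1e-3).fit(X, y)
print("coefficients [l2_dist, cos_align]:", model.coef_)
```

On this constructed data the alignment coefficient dominates, mirroring the paper's reported pattern; on real models the metric values would come from actual task gradients.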
If this is right
- Gradient alignment can serve as a pre-merge diagnostic to select compatible model pairs.
- Different merging methods such as TIES possess distinct sets of predictive metrics.
- Task pairs can be evaluated for merge potential using gradient-based metrics before any merging is attempted.
- The framework supports development of fine-tuning procedures that deliberately improve gradient alignment.
- Architecture-specific differences in metric importance can guide method selection for particular model families.
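The first implication, using gradient alignment as a pre-merge diagnostic, could look like the following sketch: rank candidate partners by cosine similarity of their flattened task gradients against a reference model, without running any merge. The helper names and toy gradients are hypothetical, not from the paper.

```python
import numpy as np

def gradient_alignment(g_ref, g_cand):
    """Cosine similarity between flattened task gradients; higher
    suggests the candidate should merge more cleanly with the reference."""
    return float(g_ref @ g_cand
                 / (np.linalg.norm(g_ref) * np.linalg.norm(g_cand)))

def rank_merge_partners(g_ref, candidates):
    """Rank candidates (name -> gradient vector) by alignment with a
    reference model, best first. Hypothetical helper for illustration."""
    scores = {n: gradient_alignment(g_ref, g) for n, g in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage: one aligned, one unrelated, one opposed candidate.
rng = np.random.default_rng(1)
g_ref = rng.normal(size=128)
candidates = {
    "aligned":  g_ref + 0.3 * rng.normal(size=128),
    "neutral":  rng.normal(size=128),
    "opposed": -g_ref + 0.3 * rng.normal(size=128),
}
ranking = rank_merge_partners(g_ref, candidates)
print([name for name, _ in ranking])
```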
Where Pith is reading between the lines
- Practitioners could compute gradient metrics on candidate models to rank merge partners without running full merge experiments.
- Fine-tuning objectives might be modified to encourage better gradient alignment between related tasks.
- The same metrics could help decide which merging method to apply to a given pair of models.
Load-bearing premise
The chosen collection of pairwise metrics plus L1-regularized linear optimization is sufficient to reveal the true drivers of merge success without overlooking important unmeasured factors.
What would settle it
Finding model pairs that exhibit strong gradient alignment yet produce low post-merge accuracy, or weak alignment yet high accuracy, would falsify the claim that gradient alignment is the fundamental signal.
Original abstract
Model merging combines knowledge from separately fine-tuned models, yet the factors driving its success remain poorly understood. While recent work treats mergeability as an intrinsic property of the models, we show with an architecture-agnostic framework that it fundamentally depends on both the merging method and the partner tasks. Using L1-regularized linear optimization over a set of interpretable pairwise metrics (e.g., gradient L2 distance), we uncover properties correlating with post-merge normalized accuracy across five merging methods. We find architecture- and method-specific variation in success drivers (64.0% average top-5 metric overlap; 79.3% sign agreement), with certain methods, notably TIES, exhibiting distinct "fingerprints" that diverge from the broader consensus. Crucially, however, gradient alignment metrics consistently emerge as the most fundamental signals of compatibility. These findings provide a diagnostic foundation for understanding mergeability and motivate future merge-aware fine-tuning strategies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that model mergeability depends on both the merging method and partner tasks rather than being an intrinsic model property. Using an architecture-agnostic framework, it applies L1-regularized linear regression to a set of interpretable pairwise metrics (e.g., gradient L2 distance) to predict post-merge normalized accuracy across five merging methods. The analysis reveals method-specific variation in top predictors (64% average top-5 overlap, 79.3% sign agreement) but identifies gradient alignment metrics as the most consistent signals of compatibility.
Significance. If the empirical correlations hold under fuller validation, the work supplies a diagnostic toolkit of interpretable metrics that could guide merge-aware fine-tuning and method selection. The emphasis on architecture-agnostic pairwise metrics and the observation of method-specific fingerprints are constructive contributions that move beyond black-box merge success prediction.
major comments (3)
- [§4] §4 (Experimental Setup): No details are supplied on the datasets, number of fine-tuned models, validation splits, number of random seeds, or statistical significance testing for the reported correlations. Without these, it is impossible to determine whether the claimed superiority of gradient alignment metrics is robust or sensitive to unstated post-hoc choices.
- [§5.1] §5.1 (Metric Selection and Regression): The 64% average top-5 metric overlap across methods already signals instability in the selected drivers. This variability, combined with the exclusive use of L1-linear regression, raises the possibility that non-linear interactions or omitted variables (e.g., curvature or task-specific loss landscape terms) are the true drivers; the manuscript provides no ablation or comparison against non-linear models to rule this out.
- [§5.3] §5.3 (Gradient Alignment Claim): The central assertion that gradient alignment metrics are the most fundamental signals rests on their consistent ranking under L1 regularization. Because the paper does not test whether this ranking survives removal of the linearity assumption or addition of interaction terms, the claim remains conditional on an unverified modeling choice.
minor comments (2)
- [§3] Notation for normalized accuracy and the precise definition of each pairwise metric should be collected in a single table for easy reference.
- [Figures] Figure captions should explicitly state what the error bars represent and whether results are averaged over multiple random seeds.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions we will make to improve clarity and robustness.
Point-by-point responses
Referee: [§4] §4 (Experimental Setup): No details are supplied on the datasets, number of fine-tuned models, validation splits, number of random seeds, or statistical significance testing for the reported correlations. Without these, it is impossible to determine whether the claimed superiority of gradient alignment metrics is robust or sensitive to unstated post-hoc choices.
Authors: We agree that §4 requires expanded details for full reproducibility and to substantiate the robustness of the gradient alignment findings. In the revised manuscript we will add: the exact datasets and tasks used, the total number of fine-tuned models, validation split ratios, the number of random seeds (five seeds were employed), and statistical significance tests (including p-values and confidence intervals for the reported correlations). These additions will allow readers to evaluate sensitivity to experimental choices. revision: yes
Referee: [§5.1] §5.1 (Metric Selection and Regression): The 64% average top-5 metric overlap across methods already signals instability in the selected drivers. This variability, combined with the exclusive use of L1-linear regression, raises the possibility that non-linear interactions or omitted variables (e.g., curvature or task-specific loss landscape terms) are the true drivers; the manuscript provides no ablation or comparison against non-linear models to rule this out.
Authors: We disagree that the reported 64% top-5 overlap indicates instability; it instead quantifies the method-specific variation that constitutes one of the paper’s central contributions, as further supported by the 79.3% sign agreement and the distinct fingerprints observed for methods such as TIES. The deliberate choice of L1-regularized linear regression prioritizes interpretability of the pairwise metrics, which is essential to the diagnostic toolkit we aim to provide. While non-linear models might capture additional interactions, they would sacrifice the transparency needed to identify consistent signals such as gradient alignment. In the revision we will add an explicit discussion of this design choice and the associated trade-off. revision: partial
Referee: [§5.3] §5.3 (Gradient Alignment Claim): The central assertion that gradient alignment metrics are the most fundamental signals rests on their consistent ranking under L1 regularization. Because the paper does not test whether this ranking survives removal of the linearity assumption or addition of interaction terms, the claim remains conditional on an unverified modeling choice.
Authors: The claim is explicitly conditioned on the linear regression framework we adopted for interpretability. Within that framework, gradient alignment metrics rank consistently across all five merging methods, providing the most reliable signal we observe. We acknowledge that relaxing the linearity assumption could be informative; however, doing so would move away from the interpretable properties that are the paper’s focus. In the revision we will clarify the scope of the claim and list non-linear extensions as a direction for future work. revision: no
Circularity Check
No significant circularity detected in derivation chain
Full rationale
The paper defines interpretable pairwise metrics (e.g., gradient L2 distance) independently of merge outcomes and applies L1-regularized linear regression to identify correlations with post-merge normalized accuracy. This is a standard empirical correlation analysis with no reduction of the claimed result to a fitted quantity by construction, no self-definitional loops, and no load-bearing self-citations or ansatzes invoked in the provided text. The finding that gradient alignment metrics emerge as key signals is data-driven rather than tautological.
Axiom & Free-Parameter Ledger
free parameters (1)
- L1 regularization strength
axioms (1)
- domain assumption: Linear relationship between the selected pairwise metrics and normalized post-merge accuracy
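The single free parameter, the L1 regularization strength, need not be fixed by hand: it can be chosen by cross-validation. A minimal sketch using scikit-learn's LassoCV on a synthetic stand-in for the metric table (the metrics and coefficients here are invented, not the paper's):

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic metric table: 120 model pairs, 6 pairwise metrics, where only
# the first metric (a stand-in for gradient alignment) drives accuracy.
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 6))
y = 0.6 * X[:, 0] + 0.05 * rng.normal(size=120)

# Cross-validate the regularization strength instead of fixing it.
model = LassoCV(cv=5).fit(X, y)
print("chosen alpha:", model.alpha_)
print("surviving metric indices:", np.flatnonzero(np.abs(model.coef_) > 1e-6))
```

Checking that the surviving metrics are stable across the cross-validated alpha path would be one way to probe the instability the referee raises.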
Forward citations
Cited by 2 Pith papers
- Beyond Perplexity: A Geometric and Spectral Study of Low-Rank Pre-Training
  Low-rank pre-training methods converge to geometrically and spectrally distinct basins from full-rank training and from each other, even at similar validation perplexity.
- Model Merging: Foundations and Algorithms
  New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.