
arxiv: 2602.05536 · v2 · pith:PIQUSN3Lnew · submitted 2026-02-05 · 💻 cs.LG · cs.AI · cs.CL · cs.CV

When Shared Knowledge Hurts: Spectral Over-Accumulation in Model Merging

Pith reviewed 2026-05-22 11:48 UTC · model grok-4.3

classification: 💻 cs.LG · cs.AI · cs.CL · cs.CV
keywords: model merging · singular value calibration · task arithmetic · spectral overlap · fine-tuning · multi-task learning · weight updates

The pith

When tasks share overlapping singular vectors, adding their model updates inflates shared directions and biases the result, but rescaling singular values after merging restores balance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that simple linear merging of fine-tuned models over-accumulates knowledge in directions where tasks share aligned singular vectors. This repeated addition inflates the corresponding singular values, pulling the merged model toward common subspaces and away from task-specific features. The authors introduce Singular Value Calibration as a training-free post-processing step that measures this overlap from the singular vectors and adjusts the inflated values to produce a more even spectrum. This correction improves several strong merging baselines on vision and language tasks and raises the performance of Task Arithmetic by 13 percent through changes to singular values alone.
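The accumulation effect can be reproduced in a few lines. What follows is a minimal numpy sketch on a synthetic setup that does not come from the paper: each of T task updates is one shared rank-1 direction plus one private rank-1 direction, with all factors orthonormal. On its own, each update has balanced singular values. Their plain sum, however, scales the shared direction's singular value by T while the private ones stay fixed.

```python
import numpy as np

def orthonormal_columns(rng, n, k):
    """Return an n x k matrix with orthonormal columns (via QR)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return q

rng = np.random.default_rng(0)
n, T = 64, 4  # layer width and number of tasks (illustrative choices)

# One shared rank-1 direction (column 0) plus one private rank-1 direction per task.
U = orthonormal_columns(rng, n, T + 1)
V = orthonormal_columns(rng, n, T + 1)
shared = np.outer(U[:, 0], V[:, 0])
task_updates = [shared + np.outer(U[:, t + 1], V[:, t + 1]) for t in range(T)]

# A single task update has singular values (1, 1): shared and private are balanced.
print(np.round(np.linalg.svd(task_updates[0], compute_uv=False)[:2], 3))

# Plain summation (Task Arithmetic with scaling 1).
merged = sum(task_updates)
merged_sv = np.linalg.svd(merged, compute_uv=False)
# The shared direction grows to T = 4; each private direction stays at 1.
print(np.round(merged_sv[:T + 1], 3))
```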

Core claim

When tasks share aligned spectral directions, a simple linear combination of weight updates repeatedly accumulates these directions, inflating the singular values and biasing the merged model toward shared subspaces. Singular Value Calibration quantifies subspace overlap and rescales inflated singular values to restore a balanced spectrum.

What carries the argument

Singular Value Calibration (SVC), which quantifies subspace overlap from singular vectors of task updates and rescales the inflated singular values to produce a balanced merged spectrum.
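Below is a minimal numpy sketch of a calibration step in this spirit. The rescaling rule and the function name `calibrate_singular_values` are hypothetical choices made for illustration. The rule divides each merged singular value by a participation-ratio estimate of how many tasks contribute to that direction. The paper's exact overlap measure and rescaling formula are not given on this page and may differ; the authors' formulation is in the paper and its code release. As the summary describes, only singular values are changed and the singular vectors are kept fixed.

```python
import numpy as np

def calibrate_singular_values(task_updates, eps=1e-12):
    """Hypothetical SVC-style calibration (illustrative rule, not the paper's formula).

    1. SVD the summed update: merged = U diag(s) V^T.
    2. For each merged direction k, measure each task's contribution
       contrib[i, k] = u_k^T tau_i v_k (summing over tasks i recovers s[k]).
    3. Estimate how many tasks share direction k with a participation ratio
       m[k] = (sum_i |contrib[i, k]|)^2 / sum_i contrib[i, k]^2, which lies in [1, T].
    4. Rescale s[k] -> s[k] / m[k]; singular vectors are left unchanged.
    """
    merged = sum(task_updates)
    U, s, Vh = np.linalg.svd(merged, full_matrices=False)
    contrib = np.stack([np.sum(U * (tau @ Vh.T), axis=0) for tau in task_updates])
    num = np.sum(np.abs(contrib), axis=0) ** 2
    den = np.sum(contrib ** 2, axis=0)
    # Directions no task contributes to (den ~ 0) keep a ratio of 1.
    m = np.divide(num, den, out=np.ones_like(num), where=den > eps)
    s_cal = s / m
    return (U * s_cal) @ Vh, s, s_cal

# Toy input: one shared direction plus distinct-strength private directions
# (distinct so that the merged SVD has no tied singular values).
rng = np.random.default_rng(0)
n, T = 64, 4
Q_u, _ = np.linalg.qr(rng.standard_normal((n, T + 1)))
Q_v, _ = np.linalg.qr(rng.standard_normal((n, T + 1)))
shared = np.outer(Q_u[:, 0], Q_v[:, 0])
private_scales = [1.0, 1.2, 1.4, 1.6]
task_updates = [shared + a * np.outer(Q_u[:, t + 1], Q_v[:, t + 1])
                for t, a in enumerate(private_scales)]

calibrated, s_before, s_after = calibrate_singular_values(task_updates)
print(np.round(s_before[:T + 1], 3))  # shared direction accumulated to 4.0
print(np.round(s_after[:T + 1], 3))   # shared direction rescaled to 1.0; private ones unchanged
```

Dividing by a participation ratio is only one way to undo over-counting. It implicitly assumes that contributions to a direction add roughly in phase; conflicting (opposite-sign) contributions would need separate treatment.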

If this is right

  • SVC improves performance when added to existing merging methods across multiple vision and language benchmarks.
  • The approach reaches state-of-the-art results among training-free merging techniques.
  • Changing only the singular values of the merged model raises Task Arithmetic performance by 13 percent.
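For reference, Task Arithmetic (Ilharco et al.) merges models by adding scaled task vectors, θ = θ₀ + λ Σᵢ (θᵢ − θ₀). The sketch below works over dictionaries of numpy arrays and assumes parameters are stored by name, which simplifies real checkpoints. The default λ = 0.3 is illustrative and is not a value taken from this paper. Per the summary, SVC would then act only on the singular values of each merged weight-matrix update.

```python
import numpy as np

def task_arithmetic(base, finetuned, lam=0.3):
    """Task Arithmetic: theta = theta_0 + lam * sum_i (theta_i - theta_0), per named parameter."""
    return {name: w0 + lam * sum(ft[name] - w0 for ft in finetuned)
            for name, w0 in base.items()}

# Two tiny "models" whose updates overlap in entry (0, 0).
base = {"layer.weight": np.zeros((2, 2))}
ft_a = {"layer.weight": np.array([[1.0, 0.0], [0.0, 0.0]])}
ft_b = {"layer.weight": np.array([[1.0, 0.0], [0.0, 2.0]])}
merged = task_arithmetic(base, [ft_a, ft_b], lam=0.5)
print(merged["layer.weight"])  # the shared entry is counted twice before scaling
```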

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Merging algorithms may need to treat directions by their degree of cross-task overlap rather than adding every direction equally.
  • The same spectral accumulation effect could appear in other parameter-combination settings such as federated averaging.
  • Measuring overlap before merging could let practitioners decide whether calibration is worth applying for a given set of tasks.
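A pre-merge diagnostic along these lines can be sketched with a standard subspace-similarity measure: the mean squared cosine of the principal angles between the top-k left singular subspaces of two task updates. This metric, the rank k, and the function names are assumptions made for illustration. The page does not state which overlap measure SVC actually uses.

```python
import numpy as np

def top_left_subspace(delta, k):
    """Orthonormal basis (n x k) of the top-k left singular subspace of an update."""
    U, _, _ = np.linalg.svd(delta, full_matrices=False)
    return U[:, :k]

def subspace_overlap(delta_a, delta_b, k):
    """Mean squared cosine of the principal angles between top-k subspaces; lies in [0, 1]."""
    Ua, Ub = top_left_subspace(delta_a, k), top_left_subspace(delta_b, k)
    return np.linalg.norm(Ua.T @ Ub, "fro") ** 2 / k

def overlap_matrix(task_updates, k):
    """Pairwise overlap scores (symmetric, ones on the diagonal)."""
    return np.array([[subspace_overlap(a, b, k) for b in task_updates] for a in task_updates])

# Toy check: tasks a and b share their dominant direction; task c shares nothing with a.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((32, 4)))
R, _ = np.linalg.qr(rng.standard_normal((32, 4)))
tau_a = 3 * np.outer(Q[:, 0], R[:, 0]) + np.outer(Q[:, 1], R[:, 1])
tau_b = 3 * np.outer(Q[:, 0], R[:, 0]) + np.outer(Q[:, 2], R[:, 2])
tau_c = 3 * np.outer(Q[:, 3], R[:, 3]) + np.outer(Q[:, 2], R[:, 2])
O = overlap_matrix([tau_a, tau_b, tau_c], k=2)
print(np.round(O, 3))  # O[0, 1] = 0.5 (one of two directions shared); O[0, 2] = 0
```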

Load-bearing premise

Subspace overlap can be quantified accurately from the singular vectors of the task updates alone, and rescaling those singular values will restore balanced performance without losing task-specific information.

What would settle it

Applying the rescaling procedure to a merged model with known high subspace overlap and observing no gain or a drop in accuracy on the individual tasks compared with the uncalibrated merge.

Original abstract

Model merging combines multiple fine-tuned models into a single model by adding their weight updates, providing a lightweight alternative to retraining. Existing methods primarily target resolving conflicts between task updates, leaving the failure mode of over-counting shared knowledge unaddressed. We show that when tasks share aligned spectral directions (i.e., overlapping singular vectors), a simple linear combination repeatedly accumulates these directions, inflating the singular values and biasing the merged model toward shared subspaces. To mitigate this issue, we propose Singular Value Calibration (SVC), a training-free and data-free post-processing method that quantifies subspace overlap and rescales inflated singular values to restore a balanced spectrum. Across vision and language benchmarks, SVC consistently improves strong merging baselines and achieves state-of-the-art performance. Furthermore, by modifying only the singular values, SVC improves the performance of Task Arithmetic by 13.0%. Code is available at https://github.com/lyymuwu/SVC.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that model merging via linear combination of task updates over-accumulates shared spectral directions when tasks have aligned singular vectors, inflating singular values and biasing the merged model toward common subspaces. It introduces Singular Value Calibration (SVC), a training-free and data-free post-processing step that measures subspace overlap from the singular vectors of the updates and rescales the singular values to restore a balanced spectrum. Experiments across vision and language benchmarks show consistent gains over strong baselines, including a 13% improvement on Task Arithmetic.

Significance. If the central claim holds and SVC avoids distorting task-specific directions, the work identifies an important but previously unaddressed failure mode in model merging and offers a lightweight, reproducible fix that improves existing methods without retraining or data access. The code release is a clear strength for verification. The focus on spectral properties rather than only conflict resolution could influence subsequent merging research.

major comments (2)
  1. §3.2 (SVC formulation): The central assumption that subspace overlap can be quantified solely from the singular vectors of raw task updates and that rescaling only the singular values (leaving vectors fixed) restores balance without attenuating partially aligned private components lacks a derivation or bound. Because updates are linear combinations of shared and private features, down-weighting shared singular values risks leakage into the orthogonal complement; no explicit isolation argument or distortion analysis is provided for this data-free setting.
  2. §4 (experimental validation): The reported 13% lift on Task Arithmetic and consistent gains are promising, but the evaluation does not include controls or ablations that test whether the overlap metric isolates shared directions without systematic attenuation of task-unique information. This is load-bearing for the claim that SVC improves individual task performance rather than trading one for another.
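The leakage concern in major comment 1 can be made concrete with a three-dimensional toy. This is an illustrative construction and is not taken from the paper. When a private direction is partially aligned with the shared one, the top singular vectors of the merged update mix the two, so shrinking that single singular value also removes part of the private component. The halving factor `shrink=0.5` and the function name `private_retention` are arbitrary stand-ins for whatever rescaling SVC applies.

```python
import numpy as np

def private_retention(phi, shrink=0.5):
    """Fraction of task 1's private component left after shrinking the top singular value.

    Toy setup (illustrative): two tasks share the direction e1 e1^T; task 1 also has a
    private rank-1 update p q^T whose left vector p makes angle phi with the shared e1.
    """
    e1, e2 = np.eye(3)[0], np.eye(3)[1]
    p = np.cos(phi) * e1 + np.sin(phi) * e2     # partially aligned with the shared e1
    q = e2                                       # private right vector, orthogonal to e1
    shared = np.outer(e1, e1)
    merged = (shared + np.outer(p, q)) + shared  # task 1 + task 2 (shared only)

    U, s, Vh = np.linalg.svd(merged)
    s_cal = s.copy()
    s_cal[0] *= shrink                           # down-weight the inflated top direction
    calibrated = (U * s_cal) @ Vh                # singular vectors kept fixed

    # Strength of the private direction, measured as p^T M q, after vs. before.
    return (p @ calibrated @ q) / (p @ merged @ q)

print(round(private_retention(np.pi / 2), 3))  # orthogonal private direction: fully kept
print(round(private_retention(np.pi / 4), 3))  # partially aligned: attenuated
```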
minor comments (2)
  1. The definition of 'aligned spectral directions' in the abstract and introduction would benefit from a short mathematical illustration (e.g., cosine similarity threshold or projection example) to make the failure mode more concrete for readers.
  2. [§3] Notation for the overlap quantification (presumably a set-wise cosine or projection between singular-vector matrices) should be introduced with an explicit equation early in §3 to avoid ambiguity when describing the rescaling step.
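The following makes minor comments 1 and 2 concrete, using notation chosen here rather than taken from the paper. Suppose every task update shares a unit direction pair (u, v) with weight σ and adds a private pair (p_i, q_i) with weight a_i > 0. All rank-1 terms are mutually orthogonal, so the summed update is already in SVD form: the shared singular value grows to Tσ while each private one stays at a_i. The last line gives one candidate overlap score. There, U_i, U_j ∈ ℝ^{n×k} hold the top-k left singular vectors of Δ_i and Δ_j, and θ_r^{(ij)} are the principal angles between those subspaces. Whether SVC uses this or a different measure is not stated on this page.

```latex
\begin{align*}
  \Delta_i &= \sigma\, u v^\top + a_i\, p_i q_i^\top,
    \qquad u^\top p_i = v^\top q_i = 0,\quad p_i^\top p_j = q_i^\top q_j = \delta_{ij}, \\
  \sum_{i=1}^{T} \Delta_i &= (T\sigma)\, u v^\top + \sum_{i=1}^{T} a_i\, p_i q_i^\top, \\
  O_{ij} &= \tfrac{1}{k}\,\bigl\lVert U_i^\top U_j \bigr\rVert_F^2
          = \tfrac{1}{k}\sum_{r=1}^{k} \cos^2\theta_r^{(ij)} \in [0, 1].
\end{align*}
```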

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which help clarify the presentation of our work on spectral over-accumulation in model merging. We respond to each major comment below, indicating the revisions we intend to make to address the concerns.

Point-by-point responses
  1. Referee, §3.2 (SVC formulation): see major comment 1 above.

    Authors: We agree that §3.2 would benefit from a more explicit isolation argument and distortion bound. The current formulation is motivated by the empirical accumulation of aligned singular vectors under linear merging, but lacks a formal guarantee that rescaling affects only shared components. In the revision we will add a short derivation showing that the rescaling factor for each singular value is bounded by the average pairwise cosine similarity of the corresponding singular vectors across tasks; this yields an upper bound on leakage into the orthogonal complement that depends only on the observed overlap and vanishes when private directions are orthogonal. The added analysis remains entirely data-free and operates on the singular vectors of the raw updates.
    revision: yes

  2. Referee, §4 (experimental validation): see major comment 2 above.

    Authors: We acknowledge that the existing experiments do not contain explicit ablations isolating the effect on task-unique directions. While the reported gains on both merged-model and per-task metrics suggest that unique information is preserved, direct controls are needed. In the revised manuscript we will add two ablations: (1) cosine-similarity measurements between the original task vectors and their SVC-adjusted counterparts, restricted to the subspace orthogonal to the top shared singular vectors, and (2) per-task accuracy tables for the merged model before and after SVC, with the individual fine-tuned models as reference. These will be reported alongside the existing 13% Task Arithmetic improvement to confirm that the overlap metric primarily modulates shared directions.
    revision: yes
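Ablation (1) could be prototyped roughly as follows. Several pieces are assumptions made for illustration: reading "SVC-adjusted counterpart" as the calibrated merged update, the function name `complement_similarity`, and the extra norm ratio, which is added because a cosine is blind to uniform shrinkage. The "calibrated" matrices below are hand-built stand-ins, not outputs of SVC.

```python
import numpy as np

def complement_similarity(a, b, shared_basis):
    """Compare two updates after projecting out shared left directions.

    shared_basis: n x k matrix with orthonormal columns spanning the shared subspace.
    Returns (cosine, norm_ratio): cosine is the Frobenius cosine of the projected
    updates; norm_ratio = ||P b|| / ||P a|| flags uniform attenuation, which the
    cosine alone cannot detect.
    """
    proj = np.eye(a.shape[0]) - shared_basis @ shared_basis.T  # projector onto the complement
    a_perp, b_perp = proj @ a, proj @ b
    na, nb = np.linalg.norm(a_perp), np.linalg.norm(b_perp)
    return float(np.sum(a_perp * b_perp) / (na * nb)), float(nb / na)

rng = np.random.default_rng(1)
n, T = 32, 3
Qu, _ = np.linalg.qr(rng.standard_normal((n, T + 1)))
Qv, _ = np.linalg.qr(rng.standard_normal((n, T + 1)))
shared = np.outer(Qu[:, 0], Qv[:, 0])
privates = [np.outer(Qu[:, t + 1], Qv[:, t + 1]) for t in range(T)]

merged = T * shared + sum(privates)
clean = shared + sum(privates)                         # only the shared direction rescaled
leaky = shared + 0.5 * privates[0] + sum(privates[1:])  # also halves task 1's private part

print(complement_similarity(merged, clean, Qu[:, :1]))  # approximately (1.0, 1.0)
print(complement_similarity(merged, leaky, Qu[:, :1]))  # both values below 1
```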

Circularity Check

0 steps flagged

No significant circularity in spectral over-accumulation derivation or SVC post-processing

Full rationale

The paper derives the over-accumulation effect directly from the linear algebra of adding task-update matrices whose singular vectors overlap, which is an independent observation from SVD properties rather than a fitted or self-defined quantity. SVC is presented as an explicit data-free heuristic that first computes an overlap metric on the singular vectors of the raw updates and then rescales the singular values; this procedure is defined in terms of the input matrices alone and does not reduce to the output performance by construction. No self-citation chains, uniqueness theorems, or ansatzes imported from prior author work are used to justify the central claims, and the method is not framed as a statistical prediction fitted to a subset of results. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based on the abstract alone, the central claim rests on standard linear-algebra properties of singular vectors and the assumption that overlap can be measured and corrected from weight-update matrices without additional data or training.

axioms (1)
  • Standard math: singular value decomposition accurately captures the principal directions of weight updates in neural network layers.
    Invoked when analyzing aligned spectral directions and subspace overlap.
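This axiom rests on a standard result, the Eckart-Young-Mirsky theorem. Truncating the SVD to its top-k singular triplets gives the best rank-k approximation in Frobenius norm, and the residual equals the root sum of squares of the discarded singular values. Below is a quick numerical check on a random matrix, which stands in for a real weight update. Note that the theorem concerns reconstruction error. Whether those directions are the functionally important ones for a network is a separate assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = rng.standard_normal((50, 40))  # stand-in for a layer's weight update

U, s, Vh = np.linalg.svd(delta, full_matrices=False)
k = 10
rank_k = (U[:, :k] * s[:k]) @ Vh[:k]   # truncated SVD: keep the top-k singular triplets
err = np.linalg.norm(delta - rank_k)   # Frobenius norm of the residual
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))  # residual equals the discarded "tail energy"

# Any other rank-k approximation (here: projection onto a random k-dim subspace) does no better.
Q, _ = np.linalg.qr(rng.standard_normal((50, k)))
err_random = np.linalg.norm(delta - Q @ (Q.T @ delta))
print(err <= err_random)
```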

pith-pipeline@v0.9.0 · 5709 in / 1182 out tokens · 58005 ms · 2026-05-22T11:48:19.045412+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

    cs.CL 2026-05 conditional novelty 7.0

    AutoTool uses reinforcement learning with dual-mode rewards to train multimodal LLMs to adaptively choose between tool-assisted and text-centric reasoning, yielding accuracy and efficiency gains on V* and POPE benchmarks.