ATM: Improving Model Merging by Alternating Tuning and Merging

Daniele Solombrino; Donato Crisostomi; Emanuele Rodolà; Fabrizio Silvestri; Luca Zhou; Maria Sofia Bucarelli

arxiv: 2411.03055 · v4 · pith:LL5OJNIZnew · submitted 2024-11-05 · 💻 cs.LG · cs.AI · cs.CV

Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Fabrizio Silvestri, Emanuele Rodolà

classification 💻 cs.LG · cs.AI · cs.CV

keywords merging, model, multitask, effectiveness, learning, step, task, tuning

Model merging has emerged as a cost-efficient approximation to multitask learning. Among merging strategies, task arithmetic is notable for its simplicity and effectiveness. In this work, we provide a theoretical motivation for task vectors by highlighting that, under single-epoch full-batch gradient descent, they are equivalent to multitask gradients. This insight leads us to reinterpret model merging as a single step in an iterative procedure that Alternates between Tuning and Merging (ATM). We propose two applications of ATM: (1) as an alternative to multitask learning in scenarios where data sharing is restricted (e.g., federated settings), and (2) as a lightweight refinement step to improve existing model merging methods using a small validation set. Experiments across diverse vision tasks demonstrate the effectiveness of ATM.
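The alternation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes toy quadratic task losses 0.5 * ||theta - c_t||^2, a merging coefficient alpha = 1/T, and hypothetical helper names (`grad`, `tune`, `atm`). Under these assumptions, one round with a single full-batch gradient step per task yields task vectors equal to the negative scaled task gradients, so merging them by task arithmetic coincides with one multitask gradient step.

```python
import numpy as np

def grad(theta, center):
    # Gradient of the toy task loss 0.5 * ||theta - center||^2 (an assumption).
    return theta - center

def tune(theta, center, lr, steps=1):
    # Fine-tune a copy of the shared weights on one task with full-batch GD.
    w = theta.copy()
    for _ in range(steps):
        w = w - lr * grad(w, center)
    return w

def atm(theta0, centers, lr=0.5, alpha=None, rounds=20, steps=1):
    # Alternate between tuning each task separately and merging the results.
    alpha = 1.0 / len(centers) if alpha is None else alpha
    theta = theta0.copy()
    for _ in range(rounds):
        # Task vector: fine-tuned weights minus the current shared weights.
        task_vectors = [tune(theta, c, lr, steps) - theta for c in centers]
        # Task arithmetic: add the scaled sum of task vectors to the base.
        theta = theta + alpha * np.sum(task_vectors, axis=0)
    return theta

# Three toy tasks, each with its own optimum (illustrative values).
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
theta0 = np.zeros(2)

# One ATM round with one GD step per task ...
one_round = atm(theta0, centers, lr=0.5, rounds=1)
# ... compared with one GD step on the average (multitask) loss.
multitask_step = theta0 - 0.5 * np.mean([grad(theta0, c) for c in centers], axis=0)

# Repeating the round moves the shared weights toward the multitask optimum.
merged = atm(theta0, centers)
```

In this toy setting the multitask optimum is the mean of the task optima, so `merged` approaches it as the number of rounds grows. Real models, stochastic batches, or several local steps per round break the exact equivalence.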

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Model Merging: Foundations and Algorithms
cs.LG · 2026-05 · unverdicted · novelty 6.0

New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.