arxiv: 2411.03055 · v4 · pith:LL5OJNIZnew · submitted 2024-11-05 · 💻 cs.LG · cs.AI · cs.CV

ATM: Improving Model Merging by Alternating Tuning and Merging

classification 💻 cs.LG · cs.AI · cs.CV
keywords merging · model · multitask · effectiveness · learning · step · task · tuning

Model merging has emerged as a cost-efficient approximation to multitask learning. Among merging strategies, task arithmetic is notable for its simplicity and effectiveness. In this work, we provide a theoretical motivation for task vectors by highlighting that, under single-epoch full-batch gradient descent, they are equivalent to multitask gradients. This insight leads us to reinterpret model merging as a single step in an iterative procedure that Alternates between Tuning and Merging (ATM). We propose two applications of ATM: (1) as an alternative to multitask learning in scenarios where data sharing is restricted (e.g., federated settings), and (2) as a lightweight refinement step to improve existing model merging methods using a small validation set. Experiments across diverse vision tasks demonstrate the effectiveness of ATM.
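The link between task vectors and multitask gradients described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: each task uses a toy quadratic loss `0.5 * ||theta - center||^2`, and the helper names (`grad`, `tune`, `merge`, `atm`), the learning rate, and the scaling factor `alpha` are hypothetical. In this toy setting, one full-batch gradient step per task followed by task arithmetic gives the same parameters as one gradient step on the summed loss, and repeating the tune-then-merge round moves toward the minimizer of the summed loss.

```python
import numpy as np

def grad(theta, center):
    # Gradient of the toy per-task loss 0.5 * ||theta - center||^2.
    return theta - center

def tune(theta, center, lr, steps=1):
    # Fine-tune a copy of theta on one task with full-batch gradient descent.
    theta = theta.copy()
    for _ in range(steps):
        theta = theta - lr * grad(theta, center)
    return theta

def merge(base, tuned, alpha):
    # Task arithmetic: add the scaled sum of task vectors to the base.
    task_vectors = [t - base for t in tuned]
    return base + alpha * np.sum(task_vectors, axis=0)

def atm(base, centers, lr, alpha, rounds):
    # Alternate between tuning on each task and merging the results.
    theta = base.copy()
    for _ in range(rounds):
        tuned = [tune(theta, c, lr) for c in centers]
        theta = merge(theta, tuned, alpha)
    return theta

# Hypothetical toy tasks: each task's loss is minimized at its own center.
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, 2.0])]
base = np.zeros(2)
lr, alpha = 0.1, 1.0 / len(centers)

# One tune-then-merge round versus one gradient step on the summed loss.
merged_once = atm(base, centers, lr, alpha, rounds=1)
multitask_grad = np.sum([grad(base, c) for c in centers], axis=0)
multitask_step = base - alpha * lr * multitask_grad

# Many rounds approach the minimizer of the summed toy loss (the mean center).
merged_many = atm(base, centers, lr, alpha, rounds=200)
optimum = np.mean(centers, axis=0)

print(np.allclose(merged_once, multitask_step))
print(np.allclose(merged_many, optimum, atol=1e-6))
```

The equality in the first check relies on each task taking exactly one full-batch step from the shared starting point; with `steps > 1` in `tune`, the task vectors in this sketch no longer match a single multitask gradient exactly.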

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Model Merging: Foundations and Algorithms

    cs.LG · 2026-05 · unverdicted · novelty 6.0

    New cycle-consistent optimization, task vector theory, singular vector decompositions, adaptive routing, and efficient evolutionary search provide foundations for merging neural network weights across tasks.