ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs

Guanhua Chen; Hongru Wang; Jianqiao Yu; Xuetao Wei; Yan Yang; Yixia Li; Yun Chen

arxiv: 2504.13237 · v1 · pith:5MSJ6EEU · submitted 2025-04-17 · 💻 cs.CL

Yan Yang, Yixia Li, Hongru Wang, Xuetao Wei, Jianqiao Yu, Yun Chen, Guanhua Chen

Abstract

With the proliferation of task-specific large language models, delta compression has emerged as a way to mitigate the resource challenges of deploying numerous such models by compressing their delta parameters, i.e., the differences between each fine-tuned model's weights and those of its base model. Previous delta-sparsification methods either remove parameters randomly or truncate singular vectors directly after singular value decomposition (SVD). However, these methods either disregard parameter importance entirely or evaluate it at too coarse a granularity. In this work, we introduce ImPart, a novel importance-aware delta-sparsification approach. Leveraging SVD, it dynamically adjusts the sparsity ratios of different singular vectors based on their importance, effectively retaining crucial task-specific knowledge even at high sparsity ratios. Experiments show that ImPart achieves state-of-the-art delta-sparsification performance, demonstrating a $2\times$ higher compression ratio than baselines at the same performance level. When integrated with existing methods, ImPart sets a new state of the art on delta quantization and model merging.
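The abstract's core idea (SVD the delta, then give each singular vector its own sparsity ratio according to its importance) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: importance is assumed to be the singular value, the keep ratio of each singular-vector pair is set by a hypothetical rule p_i = min(1, c·σ_i) with c chosen so the average matches the target, and dropped entries are removed at random with 1/p_i rescaling, borrowed from DARE-style random drop-and-rescale baselines. The function names `allocate_keep_ratios` and `importance_aware_sparsify` and the toy low-rank delta are hypothetical.

```python
import numpy as np


def allocate_keep_ratios(singular_values, keep_ratio, iters=60):
    """Hypothetical allocation rule (not the paper's formula): give singular
    vector pair i the keep ratio p_i = min(1, c * sigma_i), with the scale c
    found by bisection so that mean(p_i) matches the requested keep_ratio.
    Assumes the delta is not identically zero."""
    w = singular_values / singular_values.max()
    lo, hi = 0.0, 1.0 / w[w > 0].min()  # at c = hi, every nonzero sigma gets p = 1
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        if np.minimum(1.0, c * w).mean() < keep_ratio:
            lo = c
        else:
            hi = c
    return np.minimum(1.0, hi * w)  # larger singular value -> larger keep ratio


def importance_aware_sparsify(delta, sparsity, seed=0):
    """Sparsify a delta matrix in its SVD basis with per-vector sparsity ratios.

    `sparsity` is the fraction of singular-vector entries (of U and V) to drop.
    Returns the reconstructed delta, the keep ratios, and the number of
    nonzero factor entries that would need to be stored.
    """
    rng = np.random.default_rng(seed)
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)  # S is sorted descending
    p = allocate_keep_ratios(S, 1.0 - sparsity)
    U_hat, Vt_hat = np.zeros_like(U), np.zeros_like(Vt)
    for i, p_i in enumerate(p):
        if p_i <= 0.0:
            continue  # zero singular value: drop the whole pair
        # Random drop with 1/p_i rescaling (DARE-style), applied independently
        # to u_i and v_i so the rank-1 term sigma_i * u_i v_i^T is unbiased.
        keep_u = rng.random(U.shape[0]) < p_i
        keep_v = rng.random(Vt.shape[1]) < p_i
        U_hat[:, i] = np.where(keep_u, U[:, i] / p_i, 0.0)
        Vt_hat[i, :] = np.where(keep_v, Vt[i, :] / p_i, 0.0)
    delta_hat = (U_hat * S) @ Vt_hat  # scales column i of U_hat by S[i]
    nnz = np.count_nonzero(U_hat) + np.count_nonzero(Vt_hat)
    return delta_hat, p, nnz


# Toy delta (hypothetical): a low-rank update plus small noise, standing in
# for W_finetuned - W_base of a single 64x64 layer.
rng = np.random.default_rng(42)
low_rank = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 64))
delta = 0.01 * low_rank + 0.001 * rng.standard_normal((64, 64))

delta_hat, p, nnz = importance_aware_sparsify(delta, sparsity=0.9)
print("keep ratios (largest sigma first):", np.round(p[:4], 3), "...")
print("stored factor entries:", nnz, "of", 64 * (64 + 64))
```

In this sketch sparsity is counted over the entries of U and V. A full-rank SVD of an m×n matrix has r(m+n) such entries with r = min(m, n), which is 2mn for a square matrix, so dropping 90% of them still leaves about 20% of the dense parameter count before any index overhead. The rescaling keeps each rank-1 term unbiased in expectation but inflates its variance by a factor that grows as p_i shrinks, which is why the sketch gives higher keep ratios to singular vectors with larger singular values.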

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dynamic Model Merging Made Slim
cs.LG · 2026-05 · unverdicted · novelty 6.0

DiDi-Merging achieves dynamic model merging performance matching or exceeding prior methods while using only 1.24x to 1.4x the parameters of a single fine-tuned model.