Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM

Jiayi Tian; Jing Liu; Nathan Susanj; Parsa Madinei; Rupak Swaminathan; Ryan Solgi; Zheng Zhang

arxiv: 2510.05544 · v2 · pith:WJZYQB3Gnew · submitted 2025-10-07 · 💻 cs.CL · cs.LG


Ryan Solgi, Parsa Madinei, Jiayi Tian, Rupak Swaminathan, Jing Liu, Nathan Susanj, Zheng Zhang

classification: cs.CL, cs.LG

keywords: compression, low-rank, pareto-guided, models, pgsvd, theoretical, accuracy, achieved


Large language models (LLMs) and vision-language models (VLMs) have achieved state-of-the-art performance, but they impose significant memory and computing challenges in deployment. We present a novel low-rank compression framework to address this challenge. First, we upper bound the change in network loss via layer-wise activation-based compression errors, filling a theoretical gap in the literature. We then formulate low-rank model compression as a bi-objective optimization problem and prove that a single uniform tolerance yields surrogate Pareto-optimal heterogeneous ranks. Based on these theoretical insights, we propose Pareto-Guided Singular Value Decomposition (PGSVD), a zero-shot pipeline that improves activation-aware compression via Pareto-guided rank selection and an alternating least-squares implementation. We apply PGSVD to both LLMs and VLMs, showing better accuracy at the same compression levels, together with inference speedup.
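The abstract's central idea, that one uniform error tolerance applied to every layer produces different ranks per layer, can be illustrated with a minimal NumPy sketch. This is not the authors' PGSVD implementation: it assumes the layer-wise error is the relative activation-weighted error ||(W - AB)X||_F / ||WX||_F, and it replaces the alternating least-squares step with a closed-form whitened SVD (Cholesky factor of the activation Gram matrix). The names `min_rank_for_tolerance`, `compress_layer`, `eps`, and `ridge` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def min_rank_for_tolerance(singular_values, eps):
    """Smallest rank r with sqrt(sum_{i>r} s_i^2 / sum_i s_i^2) <= eps."""
    energy = singular_values ** 2
    # tail[r] = energy discarded when only the first r singular values are kept
    tail = np.concatenate((np.cumsum(energy[::-1])[::-1], [0.0]))
    rel_err = np.sqrt(tail / tail[0])
    # rel_err is non-increasing and ends at 0, so a valid index always exists
    return int(np.argmax(rel_err <= eps))

def compress_layer(W, X, eps, ridge=1e-6):
    """Factor W (m x n) as A @ B using calibration activations X (n x N).

    Assumption for this sketch: the rank is the smallest one whose
    activation-weighted relative error is (up to the small ridge term)
    at most eps.
    """
    n = W.shape[1]
    G = X @ X.T                                  # activation Gram matrix
    G += ridge * np.trace(G) / n * np.eye(n)     # keep Cholesky well-posed
    S = np.linalg.cholesky(G)                    # G = S @ S.T
    U, s, Vt = np.linalg.svd(W @ S, full_matrices=False)
    r = max(1, min_rank_for_tolerance(s, eps))
    A = U[:, :r] * s[:r]                         # m x r
    B = np.linalg.solve(S.T, Vt[:r].T).T         # r x n, equals Vt[:r] @ inv(S)
    return A, B

# Two synthetic "layers" with different spectra, compressed at the same eps.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 256))
W_low = rng.standard_normal((32, 4)) @ rng.standard_normal((4, 64))
W_low += 0.01 * rng.standard_normal((32, 64))    # nearly rank 4
W_flat = rng.standard_normal((32, 64))           # slowly decaying spectrum

for name, W in [("near-low-rank", W_low), ("flat-spectrum", W_flat)]:
    A, B = compress_layer(W, X, eps=0.1)
    err = np.linalg.norm((W - A @ B) @ X) / np.linalg.norm(W @ X)
    print(f"{name}: rank={A.shape[1]}, relative activation error={err:.4f}")
```

In this toy setting the nearly low-rank layer ends up with a much smaller rank than the flat-spectrum layer, even though both use the same `eps`. That is the heterogeneous-rank behavior the abstract describes, though only under the assumptions stated above.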
