MaskPro learns categorical distributions over groups of M weights to generate exact (N:M) sparsity via N-way sampling without replacement and stabilizes training with a moving average tracker of loss residuals.
https:// arxiv.org/abs/2402.05406
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
method 1polarities
use method 1representative citing papers
Widthwise pruning of LVLM language backbones combined with supervised finetuning and hidden-state distillation recovers over 95% performance using just 5% of data across 3B-7B models.
NVILA improves on VILA with a scale-then-compress visual token strategy and full-lifecycle efficiency optimizations, matching or exceeding leading VLMs on image and video benchmarks while reducing training cost 1.9-5.1x and latencies 1.2-2.8x.
citing papers explorer
-
MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs
MaskPro learns categorical distributions over groups of M weights to generate exact (N:M) sparsity via N-way sampling without replacement and stabilizes training with a moving average tracker of loss residuals.
-
Structural Pruning of Large Vision Language Models: A Comprehensive Study on Pruning Dynamics, Recovery, and Data Efficiency
Widthwise pruning of LVLM language backbones combined with supervised finetuning and hidden-state distillation recovers over 95% performance using just 5% of data across 3B-7B models.
-
NVILA: Efficient Frontier Visual Language Models
NVILA improves on VILA with a scale-then-compress visual token strategy and full-lifecycle efficiency optimizations, matching or exceeding leading VLMs on image and video benchmarks while reducing training cost 1.9-5.1x and latencies 1.2-2.8x.