Pruning meets low-rank parameter-efficient fine-tuning
4 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (2)
citing papers explorer
- MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs
  MaskPro learns categorical distributions over groups of M weights to generate exact (N:M) sparsity via N-way sampling without replacement and stabilizes training with a moving average tracker of loss residuals.
- RAP: Runtime Adaptive Pruning for LLM Inference
  RAP is a reinforcement learning framework for runtime-adaptive pruning of LLMs that jointly optimizes model weights and KV-cache usage under varying memory budgets.
- Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
  A comprehensive survey of PEFT algorithms for large models, covering their performance, overhead, applications, and real-world system implementations.
- Will LLMs Scaling Hit the Wall? Breaking Barriers via Distributed Resources on Massive Edge Devices
  Position paper claiming that distributed training across massive edge devices can overcome data depletion and centralized compute monopolies in LLM scaling.