To prune, or not to prune: exploring the efficacy of pruning for model compression

Michael Zhu, Suyog Gupta · 2017 · stat.ML · arXiv 1710.01878

6 Pith papers cite this work. Polarity classification is still indexing.


abstract

Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks at the cost of only a marginal loss in accuracy and achieve a sizable reduction in model size. This hints at the possibility that the baseline models in these experiments are perhaps severely over-parameterized at the outset and a viable alternative for model compression might be to simply reduce the number of hidden units while maintaining the model's dense connection structure, exposing a similar trade-off in model size and accuracy. We investigate these two distinct paths for model compression within the context of energy-efficient inference in resource-constrained environments and propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning and can be seamlessly incorporated within the training process. We compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint. Across a broad range of neural network architectures (deep CNNs, stacked LSTM, and seq2seq LSTM models), we find large-sparse models to consistently outperform small-dense models and achieve up to 10x reduction in number of non-zero parameters with minimal loss in accuracy.
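The gradual pruning idea in the abstract can be illustrated with a minimal sketch: raise a target sparsity over the course of training and, at each pruning step, zero out the smallest-magnitude weights until that target is met. The abstract does not give the schedule or the selection rule, so the polynomial ramp, the magnitude criterion, and every name here (`sparsity_at`, `magnitude_mask`, `apply_mask`, `power`) are assumptions made for illustration, not the authors' implementation.

```python
def sparsity_at(step, begin_step, end_step,
                initial_sparsity=0.0, final_sparsity=0.9, power=3):
    """Target sparsity at a training step (assumed polynomial ramp).

    Sparsity rises quickly at first and flattens as it nears final_sparsity.
    """
    if step <= begin_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - begin_step) / (end_step - begin_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the smallest-magnitude fraction of weights."""
    num_to_prune = int(round(sparsity * len(weights)))
    # Indices in ascending order of |w|; the first num_to_prune are masked out.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_to_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Zero out masked weights; surviving weights are left unchanged."""
    return [w * m for w, m in zip(weights, mask)]


# Toy weight vector standing in for one layer's connection matrix.
weights = [0.05, -0.8, 0.3, -0.01, 0.6, -0.2, 0.9, 0.1, -0.4, 0.02]

for step in (0, 50, 100):
    s = sparsity_at(step, begin_step=0, end_step=100, final_sparsity=0.8)
    sparse = apply_mask(weights, magnitude_mask(weights, s))
    nonzero = sum(1 for w in sparse if w != 0)
    print(f"step={step} target_sparsity={s:.2f} nonzero={nonzero}")
```

In a real training loop the mask would be recomputed every few hundred steps and the surviving weights would keep training between mask updates; the sketch omits training and only shows how the nonzero count shrinks as the target rises.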

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Prune, Update and Trim: Robust Structured Pruning for Large Language Models

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

Putri is a structured pruning technique for LLMs that compensates for pruning errors via weight updates and sequential processing while pruning at the attention-head level to reach state-of-the-art results at extreme sparsity.

Beyond Sunk Costs: Boosting LLM Pre-training Efficiency via Orthogonal Growth of Mixture-of-Experts

cs.LG · 2025-10-09 · unverdicted · novelty 5.0

Orthogonal growth recycles pre-trained MoE checkpoints via layer copying and noisy expert duplication, delivering 10.6% higher accuracy than training from scratch with equivalent extra compute.

Engineering Resource-constrained Software Systems with DNN Components: a Concept-based Pruning Approach

cs.SE · 2026-04-11 · unverdicted · novelty 5.0

A concept-based pruning method for DNNs guided by interpretable concepts and system requirements produces smaller, computationally efficient models that maintain effectiveness on image classification tasks.

Meta-Learning Guided Pruning for Few-Shot Plant Pathology on Edge Devices

cs.CV · 2026-01-05 · unverdicted · novelty 4.0

DACIS-guided PMP pipeline prunes plant pathology models by 78% while retaining 92.3% accuracy and achieving 7 FPS on Raspberry Pi 4 using few-shot meta-learning.

Exploring Vision Neural Network Pruning via Screening Methodology

cs.LG · 2025-02-11 · unverdicted · novelty 4.0

A unified F-statistic screening and weighted evaluation method prunes both unstructured and structured parameters in FNNs and CNNs, claiming order-of-magnitude size reduction with competitive accuracy on vision datasets.

A Survey on Foundation Models for Personalized Federated Intelligence

cs.AI · 2025-05-11 · unverdicted · novelty 3.0

The survey introduces personalized federated intelligence (PFI) as a framework integrating federated learning and foundation models to support privacy-aware personalization of AI models.

citing papers explorer

Showing 6 of 6 citing papers.

Prune, Update and Trim: Robust Structured Pruning for Large Language Models

cs.LG · 2026-05-18 · unverdicted · none · ref 13 · internal anchor

Beyond Sunk Costs: Boosting LLM Pre-training Efficiency via Orthogonal Growth of Mixture-of-Experts

cs.LG · 2025-10-09 · unverdicted · none · ref 28 · internal anchor

Engineering Resource-constrained Software Systems with DNN Components: a Concept-based Pruning Approach

cs.SE · 2026-04-11 · unverdicted · none · ref 97

Meta-Learning Guided Pruning for Few-Shot Plant Pathology on Edge Devices

cs.CV · 2026-01-05 · unverdicted · none · ref 7 · internal anchor

Exploring Vision Neural Network Pruning via Screening Methodology

cs.LG · 2025-02-11 · unverdicted · none · ref 59 · internal anchor

A Survey on Foundation Models for Personalized Federated Intelligence

cs.AI · 2025-05-11 · unverdicted · none · ref 191 · internal anchor
