Pruning Filters for Efficient ConvNets

Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, Hans Peter Graf

classification: cs.CV, cs.LG

keywords: pruning, costs, accuracy, CNNs, computation, filters, layers, weights

The success of CNNs in various applications is accompanied by a significant increase in the computation and parameter storage costs. Recent efforts toward reducing these overheads involve pruning and compressing the weights of various layers without hurting original accuracy. However, magnitude-based pruning of weights reduces a significant number of parameters from the fully connected layers and may not adequately reduce the computation costs in the convolutional layers due to irregular sparsity in the pruned networks. We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. We show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.
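
The cost argument in the abstract can be made concrete with a small sketch. Removing m of a layer's filters shrinks that layer's output and also removes the matching input channels of the next layer, so both layers get cheaper while staying dense. The abstract does not say how filters are ranked; the sketch assumes ranking by the sum of absolute weights (L1 norm) as one simple magnitude criterion. The function names, the flat-list filter layout, and the layer sizes are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def conv_macs(n_in: int, n_out: int, k: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates of one dense k x k convolution layer."""
    return n_in * n_out * k * k * h_out * w_out


def smallest_l1_filters(filters: Sequence[Sequence[float]], m: int) -> List[int]:
    """Indices of the m filters with the smallest sum of absolute weights.

    Assumption: each filter is passed as a flat list of its weights, and
    L1 magnitude stands in for "small effect on the output accuracy".
    """
    norms = [sum(abs(w) for w in f) for f in filters]
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    return sorted(order[:m])


def two_layer_macs(n_in: int, n_mid: int, n_next: int, k: int, h: int, w: int) -> int:
    """Cost of two consecutive conv layers that share the same output size."""
    return conv_macs(n_in, n_mid, k, h, w) + conv_macs(n_mid, n_next, k, h, w)


# Toy ranking: four filters, keep the two largest by L1 norm.
toy_filters = [[0.5, -0.5], [0.1, 0.0], [-2.0, 1.0], [0.2, -0.1]]
to_prune = smallest_l1_filters(toy_filters, 2)

# Illustrative sizes (not taken from the paper): 3 -> 64 -> 128 channels,
# 3 x 3 kernels, 32 x 32 feature maps. Prune 16 of the 64 middle filters.
before = two_layer_macs(3, 64, 128, 3, 32, 32)
after = two_layer_macs(3, 64 - 16, 128, 3, 32, 32)
saving = 1 - after / before

print(to_prune)         # indices of the filters selected for removal
print(before, after)    # dense cost before and after pruning
print(f"{saving:.0%}")  # fraction of multiply-accumulates removed
```

In this toy setup both layers scale with the number of remaining middle channels, so pruning a quarter of the filters removes a quarter of the combined cost. No sparse kernels are needed, because the pruned layers are simply smaller dense convolutions.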

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SparseForge: Efficient Semi-Structured LLM Sparsification via Annealing of Hessian-Guided Soft-Mask
cs.LG 2026-05 unverdicted novelty 6.0
SparseForge achieves 57.27% zero-shot accuracy on LLaMA-2-7B at 2:4 sparsity using only 5B retraining tokens, beating the dense baseline and nearly matching a 40B-token SOTA method.

Neural Network Pruning via QUBO Optimization
cs.CV 2026-04 unverdicted novelty 6.0
A hybrid QUBO pruning framework using Taylor/Fisher metrics and activation similarity outperforms greedy Taylor and L1-QUBO baselines on the SIDD denoising dataset, with further gains from Tensor-Train refinement.

Rethinking Layer Relevance in Large Language Models Beyond Cosine Similarity
cs.LG 2026-05 unverdicted novelty 5.0
Cosine similarity poorly predicts performance degradation from layer removal in LLMs, making direct accuracy-drop ablation a more reliable relevance metric.

Engineering Resource-constrained Software Systems with DNN Components: a Concept-based Pruning Approach
cs.SE 2026-04 unverdicted novelty 5.0
A concept-based pruning method for DNNs guided by interpretable concepts and system requirements produces smaller, computationally efficient models that maintain effectiveness on image classification tasks.

Modality-Aware Zero-Shot Pruning and Sparse Attention for Efficient Multimodal Edge Inference
cs.LG 2026-04 unverdicted novelty 5.0
SentryFuse delivers modality-aware zero-shot pruning and sparse attention that improves accuracy by 12.7% on average and up to 18% under sensor dropout while cutting memory 28.2% and latency up to 1.63x across multimo...