Pruning Convolutional Neural Networks for Resource Efficient Inference

Pavlo Molchanov , Stephen Tyree , Tero Karras , Timo Aila , Jan Kautz

classification cs.LG stat.ML

keywords pruning, efficient, networks, adapted, convolutional, criterion, inference, large

We propose a new formulation for pruning convolutional kernels in neural networks to enable efficient inference. We interleave greedy criteria-based pruning with fine-tuning by backpropagation, a computationally efficient procedure that maintains good generalization in the pruned network. We propose a new criterion based on Taylor expansion that approximates the change in the cost function induced by pruning network parameters. We focus on transfer learning, where large pretrained networks are adapted to specialized tasks. The proposed criterion demonstrates superior performance compared to other criteria, e.g. the norm of kernel weights or feature map activation, for pruning large CNNs after adaptation to fine-grained classification tasks (Birds-200 and Flowers-102), relying only on first-order gradient information. We also show that pruning can lead to more than 10x theoretical (5x practical) reduction in adapted 3D-convolutional filters with a small drop in accuracy in a recurrent gesture classifier. Finally, we show results for the large-scale ImageNet dataset to emphasize the flexibility of our approach.
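
The abstract does not spell out the criterion itself. As a rough illustration of what a first-order Taylor saliency can look like, the sketch below scores each feature map by the absolute value of its activation multiplied by the gradient of the cost with respect to that activation, averaged over spatial positions and then over the batch. The function names, array shapes, and random data are assumptions made for this example, not the authors' code, and the fine-tuning step that the abstract interleaves with pruning is omitted.

```python
import numpy as np

def taylor_scores(activations, gradients):
    """First-order Taylor saliency per feature map (illustrative sketch).

    activations, gradients: arrays of shape (batch, channels, height, width);
    gradients holds d(cost)/d(activation) for the same layer.
    """
    # Average activation * gradient over spatial positions, then take |.|,
    # which estimates how much the cost changes if the map is zeroed out.
    per_example = np.abs((activations * gradients).mean(axis=(2, 3)))
    # Average over the batch to get one score per channel.
    return per_example.mean(axis=0)

def lowest_scoring_channel(scores):
    """Greedy step: pick the feature map whose removal is estimated to matter least."""
    return int(np.argmin(scores))

# Hypothetical data: 4 examples, 3 feature maps of size 5x5.
rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 3, 5, 5))
grads = rng.normal(size=(4, 3, 5, 5))
grads[:, 1] = 0.0  # channel 1 has no first-order effect on the cost

scores = taylor_scores(acts, grads)
candidate = lowest_scoring_channel(scores)
```

Here the channel with zero gradient receives a zero score and is selected first. In a real setting the activations and gradients would come from a forward and backward pass of the network being pruned.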

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MNAFT: modality neuron-aware fine-tuning of multimodal large language models for image translation
cs.CL 2026-04 unverdicted novelty 7.0

MNAFT identifies language-agnostic and language-specific neurons via activation analysis and selectively fine-tunes only relevant ones in MLLMs to close the modality gap and outperform full fine-tuning and other metho...

ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing
cs.CL 2026-05 conditional novelty 6.0

ReST-KV formulates KV eviction as layer-wise output reconstruction optimization with spatial-temporal smoothing, outperforming baselines by 2.58% on LongBench and 15.2% on RULER while cutting decoding latency by 10.61...

Modality-Aware Zero-Shot Pruning and Sparse Attention for Efficient Multimodal Edge Inference
cs.LG 2026-04 unverdicted novelty 5.0

SentryFuse delivers modality-aware zero-shot pruning and sparse attention that improves accuracy by 12.7% on average and up to 18% under sensor dropout while cutting memory 28.2% and latency up to 1.63x across multimo...