Is Oracle Pruning the True Oracle?
Oracle pruning, which selects unimportant weights by minimizing the pruned train loss, has served as the foundation for most neural network pruning methods for over thirty-five years, yet few (if any) have examined how well that foundation holds. This paper, for the first time, attempts to systematically examine its validity on deep neural networks through empirical correlation analyses and provides meta-framework reflections on the field of neural network pruning. Specifically, this paper focuses on pruning algorithms with three stages: training, pruning, and retraining. We analyze the correlation between model performance before and after the retraining stage. Extensive experiments (37K models trained) are conducted across a wide spectrum of models (LeNet5, VGG, ResNets, ViT, MLLM) and datasets (MNIST, CIFAR10/CIFAR100, ImageNet-1K, MLLM data). For large-scale experiments, we adopt approximate oracle pruning due to the prohibitive cost of exact oracle pruning. The results point to a counterintuitive conclusion: for deep learning models of nontrivial size (already at the scale of ResNet56 on CIFAR-10), pre-retraining performance is negligibly correlated with post-retraining performance. In other words, the weights identified by oracle pruning can scarcely guarantee strong performance following retraining. This further suggests that existing works that derive pruning criteria from oracle pruning may rest on a questionable foundational premise. Further studies suggest that rising task complexity is a primary factor behind the invalidity of oracle pruning today. Finally, given the evidence, we argue that the retraining stage in a pruning algorithm should be accounted for when developing pruning criteria.
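To make the three-stage setup concrete, here is a minimal sketch of exact oracle pruning and the pre- versus post-retraining correlation on a toy problem. Everything in it is an illustrative assumption rather than the paper's setup: the model is a linear regressor instead of a deep network, "retraining" is a least-squares refit of the kept weights, the correlation is Pearson's, and the helper names (`mse`, `enumerate_masks`, `retrain`) are hypothetical.

```python
import itertools
import numpy as np

def mse(X, y, w):
    """Mean squared train loss of a linear model with weights w."""
    return float(np.mean((X @ w - y) ** 2))

def enumerate_masks(n_weights, n_prune):
    """Yield every binary mask that zeroes exactly n_prune weights."""
    for pruned in itertools.combinations(range(n_weights), n_prune):
        mask = np.ones(n_weights)
        mask[list(pruned)] = 0.0
        yield mask

def retrain(X, y, mask):
    """Toy 'retraining': least-squares refit of the kept weights only."""
    keep = mask.astype(bool)
    w = np.zeros(X.shape[1])
    w[keep] = np.linalg.lstsq(X[:, keep], y, rcond=None)[0]
    return w

# Stage 1 (training): fit a dense model on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=200)
w_dense = np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 2 (pruning) and stage 3 (retraining) for every candidate mask.
masks, pre, post = [], [], []
for mask in enumerate_masks(n_weights=8, n_prune=4):
    masks.append(mask)
    pre.append(mse(X, y, w_dense * mask))         # loss right after pruning
    post.append(mse(X, y, retrain(X, y, mask)))   # loss after retraining
pre, post = np.array(pre), np.array(post)

# Exact oracle pruning: the mask with the lowest pruned train loss.
oracle_idx = int(np.argmin(pre))
oracle_mask = masks[oracle_idx]

# Correlation between pre- and post-retraining loss across all masks.
corr = float(np.corrcoef(pre, post)[0, 1])
print("oracle mask:", oracle_mask.astype(int))
print("post-retraining rank of the oracle mask:", int(np.sum(post < post[oracle_idx])) + 1)
print("Pearson correlation (pre vs. post):", round(corr, 3))
```

Exhaustive enumeration is only feasible here because there are 70 candidate masks, which is why the abstract falls back on approximate oracle pruning at scale. A toy linear problem like this one may well show a strong correlation; the abstract's claim concerns models of nontrivial size, so the sketch illustrates the measurement, not the finding.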
Forward citations
Cited by 1 Pith paper
- Cross-Resolution Diffusion Models via Network Pruning
CR-Diff applies block-wise pruning followed by output amplification to diffusion models, improving consistency and fidelity at unseen resolutions while retaining default-resolution performance.