Exploring Vision Neural Network Pruning via Screening Methodology

Mingyuan Wang; Sida Liu; Yangzi Guo; Yuhang Liu

arxiv: 2502.07189 · v2 · submitted 2025-02-11 · 💻 cs.LG · stat.ML

Pith reviewed 2026-05-23 03:30 UTC · model grok-4.3

keywords: neural network pruning, model compression, F-statistic screening, vision models, unstructured pruning, structured pruning, deep learning efficiency, edge deployment

The pith

A statistical screening method prunes neural networks by an order of magnitude while keeping accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a pruning framework for deep neural networks that removes non-essential parameters through statistical analysis of their significance across classification categories. It combines an F-statistic-based screening technique with a weighted evaluation scheme to handle both unstructured and structured pruning in a single approach. Experiments on fully connected and convolutional networks for vision tasks show that the resulting models require far less storage and computation yet match the accuracy of larger networks. A sympathetic reader would care because large models currently face high costs that limit their use on devices with tight memory and power budgets. The method aims to make high-performing vision models practical without needing separate techniques for different pruning styles.

Core claim

The proposed framework eliminates non-essential parameters through a statistical analysis of component significance across classification categories. Specifically, it employs an F-statistic-based screening technique combined with a weighted evaluation scheme to quantify the contributions of connections and channels, enabling both unstructured and structured pruning within a unified framework. Extensive experiments on real-world vision datasets demonstrate that the framework produces compact and efficient models that reduce storage and computation requirements by an order of magnitude while preserving model accuracy and remaining competitive with state-of-the-art approaches.

What carries the argument

F-statistic-based screening technique combined with a weighted evaluation scheme that quantifies contributions of connections and channels across categories.
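
As a rough illustration of what an F-statistic screen measures, the sketch below computes a textbook one-way ANOVA F-statistic for each component, grouping its values by class label, and keeps the highest-scoring components. The function names `f_statistic` and `screen`, the column-per-component data layout, and the fixed `keep` count are assumptions made for illustration; the paper's own statistic, weighting scheme, and threshold rule are not given in the text reviewed here.

```python
from collections import defaultdict


def f_statistic(values, labels):
    """One-way ANOVA F-statistic of one component's values grouped by class label."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    # Between-class variation: how far each class mean sits from the grand mean.
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    # Within-class variation: spread of the values around their own class mean.
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0.0:
        return float("inf") if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def screen(columns, labels, keep):
    """Return (indices of the `keep` highest-scoring components, all scores).

    `columns[j]` holds the values of component j over all samples
    (hypothetical layout chosen for this sketch).
    """
    scores = [f_statistic(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep]), scores
```

A component whose values differ sharply between classes gets a large score, while one whose class means coincide scores near zero and becomes a pruning candidate.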

If this is right

Both connection-level and channel-level pruning become available inside one procedure rather than requiring separate tools.
Storage and compute needs drop by roughly a factor of ten on the tested vision models with accuracy held steady.
The same statistical screening step applies to both fully connected networks and convolutional networks.
Compact models produced this way remain competitive with existing pruning methods on standard vision benchmarks.
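
To make the "one procedure, two pruning styles" point concrete, here is a minimal sketch in which a single table of significance scores yields either a connection-level mask or a channel-level mask. Everything named here is hypothetical: `scores[c][j]` is assumed to hold the score of connection `j` in channel `c`, the per-channel aggregate is a plain mean standing in for the paper's unspecified weighted evaluation scheme, and the keep budgets are arbitrary.

```python
def unstructured_mask(scores, keep_fraction):
    """Keep roughly the top `keep_fraction` of individual connections.

    Ties at the threshold are all kept, so slightly more may survive.
    """
    flat = sorted((s for row in scores for s in row), reverse=True)
    n_keep = max(1, int(keep_fraction * len(flat)))
    threshold = flat[n_keep - 1]
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def structured_mask(scores, keep_channels):
    """Keep whole channels, ranked by mean connection score (assumed aggregate)."""
    channel_score = [sum(row) / len(row) for row in scores]
    ranked = sorted(range(len(scores)), key=lambda c: channel_score[c], reverse=True)
    kept = set(ranked[:keep_channels])
    return [[1 if c in kept else 0] * len(row) for c, row in enumerate(scores)]


def compression_ratio(mask):
    """Total parameters divided by surviving parameters."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    if kept == 0:
        raise ValueError("mask removes every parameter")
    return total / kept
```

Under this reading, an order-of-magnitude reduction corresponds to a `compression_ratio` of about 10, that is, a mask that keeps roughly one parameter in ten.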

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the screening step generalizes, the same procedure could be tried on non-vision tasks such as language models without redesigning the significance test.
The approach might allow training an oversized network first and then pruning it down, rather than training the compact version from scratch.
Energy use during inference on edge hardware could fall in proportion to the reported compute reduction, though this remains unmeasured in the work.

Load-bearing premise

The F-statistic screening plus weighted evaluation reliably flags non-essential parameters across different network architectures and datasets without extra tuning that would change the reported accuracy results.

What would settle it

Run the pruning procedure on a held-out vision dataset or architecture not used in the paper and measure whether accuracy drops more than a few percent relative to the original model or to other pruning baselines.
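
The check described above could be scripted along these lines. This is only a sketch: the helper names are hypothetical, and the default `max_drop` of 0.02 is an assumed stand-in for "a few percent", not a figure from the paper.

```python
def accuracy(predictions, targets):
    """Fraction of predictions that match the targets."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal length")
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


def within_tolerance(dense_accuracy, pruned_accuracy, max_drop=0.02):
    """True if the pruned model loses at most `max_drop` absolute accuracy."""
    return (dense_accuracy - pruned_accuracy) <= max_drop
```

The same comparison can be repeated with a competing pruning baseline in place of the dense model.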

Figures

Figures reproduced from arXiv: 2502.07189 by Mingyuan Wang, Sida Liu, Yangzi Guo, Yuhang Liu.

**Figure 2.** The mask of the first fully connected layer of pruned Lenet-300-100.

**Figure 3.** A plot depicting the number of original and remaining channels after …

**Figure 4.** A histogram showing the distribution of non-zero weight values after …

Original abstract

The remarkable performance of modern deep neural networks (DNNs) is largely driven by their massive scale, often comprising tens to hundreds of millions, or even billions, of parameters. However, such a scale incurs substantial storage and computational costs, hindering deployment on platforms such as edge devices that require energy-efficient and real-time processing. In this paper, we propose a network pruning framework that reduces both storage and computation requirements by an order of magnitude while preserving model accuracy. Our approach eliminates non-essential parameters through a statistical analysis of component significance across classification categories. Specifically, we employ a F-statistic-based screening technique combined with a weighted evaluation scheme to quantify the contributions of connections and channels, enabling both unstructured and structured pruning within a unified framework. Extensive experiments on real-world vision datasets, covering both fully connected neural networks (FNNs) and convolutional neural networks (CNNs), demonstrate that the proposed framework produces compact and efficient models that are highly competitive with the state of art apporoaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Applies F-statistic screening to unify unstructured and structured pruning but the abstract gives no numbers or controls to back the accuracy-at-10x claim.

The new piece is the use of an F-statistic to rank parameter significance across classes, then a weighted scheme that lets the same procedure handle both weight-level and channel-level pruning. That unification is the main technical step beyond standard magnitude or gradient pruning. The paper also states it tested the approach on both fully connected and convolutional nets on real vision data, which is a reasonable scope for a compression method. Those are the concrete moves it makes. The rest of the abstract is standard motivation about edge deployment.

The soft spot is that none of the performance claims are accompanied by numbers, baselines, variance estimates, or even a list of the datasets. Without those, it is impossible to tell whether the reported order-of-magnitude savings come from the screening itself or from later threshold choices that were tuned on validation accuracy. The multiple-testing problem that arises when screening thousands of parameters is also unaddressed in the text we have. The central accuracy-preservation claim therefore rests on an unverified assumption that the statistic plus weighting works out of the box across architectures.

This paper is aimed at the model-compression subgroup that already knows the usual pruning literature and is looking for statistical alternatives. A reader in that niche could extract the screening idea and try it, but the work does not yet supply enough detail to stand on its own. It is worth sending to referees so the experiments can be checked; the idea is simple enough that a solid experimental section would make it a useful incremental paper.

Referee Report

2 major / 2 minor

Summary. The paper proposes a network pruning framework for vision DNNs (FNNs and CNNs) that applies an F-statistic-based screening of component significance across classification categories, combined with a weighted evaluation scheme, to remove non-essential connections and channels. This unified approach for unstructured and structured pruning is claimed to reduce storage and computation by an order of magnitude while preserving accuracy and remaining competitive with SOTA methods on real-world datasets.

Significance. If the screening procedure can be shown to operate without architecture- or dataset-specific post-selection adjustments that are tuned to validation accuracy, the work would provide a statistically grounded, unified pruning method that could simplify compression for edge deployment; the absence of such verification currently limits the assessed impact.

major comments (2)

[Abstract] Abstract: the central claim that the framework 'reduces both storage and computation requirements by an order of magnitude while preserving model accuracy' and produces 'highly competitive' models is presented without any quantitative results, error bars, baseline comparisons, or ablation details, so the accuracy-preservation guarantee cannot be evaluated from the manuscript.
[Method] Method description (screening procedure): the F-statistic screening plus weighted evaluation is asserted to identify non-essential parameters in an architecture- and dataset-agnostic manner, yet no explicit equations, threshold-selection rules, or pseudocode demonstrate that the statistic and weights are computed without reference to validation accuracy or post-hoc scaling; this directly bears on whether the reported compression is a consequence of the method itself or of experiment-specific choices.

minor comments (2)

[Abstract] Abstract contains the typo 'state of art apporoaches' (should be 'state-of-the-art approaches').
[Method] The manuscript does not mention multiple-testing correction for the F-statistic screening across many components and classes, which is a standard statistical concern for the screening step.
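
For readers unfamiliar with the multiple-testing concern, one common remedy is the Benjamini-Hochberg step-up procedure, sketched below. It is shown purely as an example of a standard correction; nothing in the reviewed text indicates that the paper uses it, and the function name and `alpha` default are assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Controls the false discovery rate at level `alpha` across all tests.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank whose p-value falls under its step-up threshold.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    # Everything ranked at or below the cutoff is rejected, even if an
    # intermediate p-value missed its own threshold.
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected
```

In a screening context, components whose test is rejected would count as significant and be kept, while the rest become pruning candidates.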

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below and will revise the manuscript to strengthen the presentation of claims and the method description.

Referee: [Abstract] Abstract: the central claim that the framework 'reduces both storage and computation requirements by an order of magnitude while preserving model accuracy' and produces 'highly competitive' models is presented without any quantitative results, error bars, baseline comparisons, or ablation details, so the accuracy-preservation guarantee cannot be evaluated from the manuscript.

Authors: We agree that the abstract presents the claims at a high level without supporting numbers. In the revision we will incorporate specific quantitative results from the experiments (e.g., compression ratios achieved, accuracy retention on MNIST/CIFAR/ImageNet subsets, and direct comparisons to pruning baselines) along with brief mention of error bars where applicable. This will make the abstract self-contained for evaluating the central claims.

revision: yes

Referee: [Method] Method description (screening procedure): the F-statistic screening plus weighted evaluation is asserted to identify non-essential parameters in an architecture- and dataset-agnostic manner, yet no explicit equations, threshold-selection rules, or pseudocode demonstrate that the statistic and weights are computed without reference to validation accuracy or post-hoc scaling; this directly bears on whether the reported compression is a consequence of the method itself or of experiment-specific choices.

Authors: The F-statistic is computed directly from the training activations and labels across classes, with thresholds derived from standard statistical significance levels (e.g., p-value cutoffs) rather than validation accuracy. We acknowledge that the current manuscript does not include the explicit equations, threshold rules, or pseudocode that would make this independence fully transparent. We will add the full mathematical formulation of the F-statistic screening, the weighted evaluation formula, the exact threshold selection procedure, and pseudocode in the revised Methods section to demonstrate that no validation-based tuning or post-hoc scaling is used.

revision: yes
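
For orientation, the textbook one-way ANOVA F-statistic for a single component $j$ is written out below. It is offered as the standard form such a formulation would presumably build on, not as the authors' own equations; the symbols are introduced here for illustration ($x_{ij}$ is the value of component $j$ on sample $i$, $y_i$ its class label, $C$ the number of classes, $N$ the number of samples, $n_c$ the size of class $c$), and the weighted evaluation scheme is not reproduced.

```latex
F_j \;=\;
\frac{\dfrac{1}{C-1}\displaystyle\sum_{c=1}^{C} n_c\,\bigl(\bar{x}_{j,c}-\bar{x}_{j}\bigr)^{2}}
     {\dfrac{1}{N-C}\displaystyle\sum_{c=1}^{C}\;\sum_{i\,:\,y_i=c}\bigl(x_{ij}-\bar{x}_{j,c}\bigr)^{2}},
\qquad
\bar{x}_{j,c}=\frac{1}{n_c}\sum_{i\,:\,y_i=c}x_{ij},
\qquad
\bar{x}_{j}=\frac{1}{N}\sum_{i=1}^{N}x_{ij}.
```

Under the usual normality and independence assumptions, $F_j$ follows an $F$ distribution with $(C-1,\,N-C)$ degrees of freedom when the class means are equal, which is what would turn a p-value cutoff into a threshold on $F_j$.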

Circularity Check

0 steps flagged

No circularity: empirical screening procedure is self-contained

The paper describes an F-statistic-based screening technique combined with a weighted evaluation scheme to identify non-essential parameters for pruning. No equations, derivations, or self-citations are presented that reduce the claimed order-of-magnitude compression while preserving accuracy to a quantity defined by the method itself or fitted inputs renamed as predictions. The central claim rests on empirical application across FNNs and CNNs on vision datasets, with competitiveness shown via experiments rather than by construction. This matches the absence of any load-bearing self-referential steps in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The method relies on standard statistical F-testing whose assumptions (normality, independence) are not discussed.
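
Where the normality assumption is in doubt, a permutation test is one way to obtain a p-value for the same statistic without it (it still assumes samples are exchangeable under the null). The sketch below is an illustration only and not something the paper describes; `permutation_p_value`, the permutation count, and the fixed seed are all assumptions.

```python
import random
from collections import defaultdict


def f_statistic(values, labels):
    """One-way ANOVA F-statistic of one component's values grouped by class label."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0.0:
        return float("inf") if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(values, labels, n_permutations=999, seed=0):
    """Estimate P(F >= observed) by shuffling labels, with no normality assumption."""
    rng = random.Random(seed)
    observed = f_statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between values and classes
        if f_statistic(values, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never zero.
    return (hits + 1) / (n_permutations + 1)
```

The cost is one pass of F-statistic computations per permutation, which may matter when screening a very large number of components.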

[1] [1]

Arora, P., Jalali, S. M. J., Ahmadian, S., Panigrahi, B. K., Suganthan, P. N., and Khosravi, A. (2022). Probabilistic wind power forecasting using optimized deep auto-regressive recurrent neural networks. IEEE Transactions on Industrial Informatics , 19(3):2814--2825

work page 2022

[2] [2]

Barbu, A., Sun, L., Wang, M., and Guo, Y. (2021). A novel framework for online supervised learning with feature selection. In 2021 Joint Mathematics Meetings ( JMM ) . AMS

work page 2021

[3] [3]

Baykal, C., Liebenwein, L., Gilitschenski, I., Feldman, D., and Rus, D. (2019). Sipping neural networks: Sensitivity-informed provable pruning of neural networks. ArXiv , abs/1910.05422

work page arXiv 2019

[4] [4]

Dawer, G., Guo, Y., and Barbu, A. (2017). Generating compact tree ensembles via annealing. 2020 International Joint Conference on Neural Networks (IJCNN) , pages 1--8

work page 2017

[5] [5]

Dawer, G., Guo, Y., Liu, S., and Barbu, A. (2020). Neural rule ensembles: Encoding sparse feature interactions into neural networks. In 2020 International Joint Conference on Neural Networks (IJCNN) , pages 1--8. IEEE

work page 2020

[6] [6]

Devlin, J. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

work page internal anchor Pith review Pith/arXiv arXiv 2018

[7] [7]

Ding, X., Ding, G., Guo, Y., and Han, J. (2019). Centripetal sgd for pruning very deep convolutional networks with complicated structure. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages 4938--4948

work page 2019

[8] [8]

Dosovitskiy, A. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929

work page internal anchor Pith review Pith/arXiv arXiv 2020

[9] [9]

Guo, H., Tang, R., Ye, Y., Li, Z., and He, X. (2017). Deepfm: a factorization-machine based neural network for ctr prediction. arXiv preprint arXiv:1703.04247

work page internal anchor Pith review Pith/arXiv arXiv 2017

[10] [10]

Guo, Y., She, Y., and Barbu, A. (2021a). Network pruning via annealing and direct sparsity control. In 2021 International Joint Conference on Neural Networks (IJCNN) , pages 1--8

work page 2021

[11] [11]

N., and Barbu, A

Guo, Y., Wu, Y. N., and Barbu, A. (2021b). A study of local optima for learning feature interactions using neural networks. In 2021 International Joint Conference on Neural Networks (IJCNN) , pages 1--8

work page 2021

[12] [12]

Guo, Y., Yao, A., and Chen, Y. (2016). Dynamic network surgery for efficient dnns. In Advances In Neural Information Processing Systems , pages 1379--1387

work page 2016

[13] [13]

Haeffele, B. D. and Vidal, R. (2015). Global optimality in tensor factorization, deep learning, and beyond. arXiv preprint arXiv:1506.07540

work page internal anchor Pith review Pith/arXiv arXiv 2015

[14] [14]

Han, S., Pool, J., Tran, J., and Dally, W. (2015). Learning both weights and connections for efficient neural network. Advances in neural information processing systems , 28

work page 2015

[15] [15]

He, K., Gkioxari, G., Doll \'a r, P., and Girshick, R. (2017a). Mask r-cnn. In Proceedings of the IEEE international conference on computer vision , pages 2961--2969

work page

[16] [16]

He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 770--778

work page 2016

[17] [17]

and Xiao, L

He, Y. and Xiao, L. (2023). Structured pruning for deep convolutional neural networks: A survey. IEEE transactions on pattern analysis and machine intelligence

work page 2023

[18] [18]

He, Y., Zhang, X., and Sun, J. (2017b). Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE international conference on computer vision , pages 1389--1397

work page

[19] [19]

Hinton, G. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531

work page internal anchor Pith review Pith/arXiv arXiv 2015

[20] [20]

Howard, A. G. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861

work page internal anchor Pith review Pith/arXiv arXiv 2017

[21] [21]

M., Zhang, Z., and Suh, G

Hua, W., Zhou, Y., De Sa, C. M., Zhang, Z., and Suh, G. E. (2019). Channel gating neural networks. Advances in Neural Information Processing Systems , 32

work page 2019

[22] [22]

Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 4700--4708

work page 2017

[23] [23]

Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., and Kalenichenko, D. (2018). Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE conference on computer vision and pattern recognition , pages 2704--2713

work page 2018

[24] [24]

Krizhevsky, A., Hinton, G., et al. (2009). Learning multiple layers of features from tiny images

work page 2009

[25] [25]

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems , 25

[26]

Kusupati, A., Ramanujan, V., Somani, R., Wortsman, M., Jain, P., Kakade, S. M., and Farhadi, A. (2020). Soft threshold weight reparameterization for learnable sparsity. In International Conference on Machine Learning

[27]

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278--2324

[28]

LeCun, Y. and Cortes, C. (2010). MNIST handwritten digit database

[29]

Lemaire, C., Achkar, A., and Jodoin, P.-M. (2019). Structured pruning of neural networks with budget-aware regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9108--9116

[30]

Lim, B., Arık, S. Ö., Loeff, N., and Pfister, T. (2021). Temporal fusion transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4):1748--1764

[31]

Lin, J., Rao, Y., Lu, J., and Zhou, J. (2017). Runtime neural pruning. Advances in neural information processing systems, 30

[32]

Lin, T., Stich, S. U., Barba, L., Dmitriev, D., and Jaggi, M. (2020). Dynamic model pruning with feedback. ArXiv, abs/2006.07253

[33]

Liu, H., Simonyan, K., and Yang, Y. (2018). Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055

[34]

Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., and Zhang, C. (2017). Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE international conference on computer vision, pages 2736--2744

[35]

Luo, J.-H. and Wu, J. (2018). Autopruner: An end-to-end trainable filter pruning method for efficient deep model inference. ArXiv, abs/1805.08941

[36]

Luo, J.-H. and Wu, J. (2019). Neural network pruning with residual-connections and limited-data. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1455--1464

[37]

Luo, J.-H., Wu, J., and Lin, W. (2017). Thinet: A filter level pruning method for deep neural network compression. In Proceedings of the IEEE international conference on computer vision, pages 5058--5066

[38]

Mikolov, T. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

[39]

Nayman, N., Noy, A., Ridnik, T., Friedman, I., Jin, R., and Zelnik-Manor, L. (2019). Xnas: Neural architecture search with expert advice. ArXiv, abs/1906.08031

[40]

Redmon, J. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition

[41]

Savarese, P. H. P., Silva, H., and Maire, M. (2019). Winning the lottery with continuous sparsification. ArXiv, abs/1912.04427

[42]

Shakeela, S., Shankar, N. S., Reddy, P. M., Tulasi, T. K., and Koneru, M. M. (2021). Optimal ensemble learning based on distinctive feature selection by univariate anova-f statistics for ids. International Journal of Electronics and Telecommunications, pages 267--275

[43]

Simonyan, K. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

[44]

Sun, Y., Zheng, L., Deng, W., and Wang, S. (2017). Svdnet for pedestrian retrieval. In Proceedings of the IEEE international conference on computer vision, pages 3800--3808

[45]

Sutskever, I. (2014). Sequence to sequence learning with neural networks. arXiv preprint arXiv:1409.3215

[46]

Sutskever, I., Martens, J., Dahl, G. E., and Hinton, G. E. (2013). On the importance of initialization and momentum in deep learning. In International Conference on Machine Learning

[47]

Tan, C. M. J. and Motani, M. (2020). DropNet: Reducing neural network complexity via iterative pruning. In Daumé III, H. and Singh, A., editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 9356--9366. PMLR

[48]

Tian, Y., Krishnan, D., and Isola, P. (2019). Contrastive representation distillation. arXiv preprint arXiv:1910.10699

[49]

Vadera, S. and Ameen, S. (2022). Methods for pruning deep neural networks. IEEE Access, 10:63280--63300

[50]

van Amersfoort, J. R., Alizadeh, M., Farquhar, S., Lane, N. D., and Gal, Y. (2020). Single shot structured pruning before training. ArXiv, abs/2007.00389

[51]

Wang, M. and Barbu, A. (2019). Are screening methods useful in feature selection? an empirical study. PloS one, 14(9):e0220842

[52]

Wang, M. and Barbu, A. (2022). Online feature screening for data streams with concept drift. IEEE Transactions on Knowledge and Data Engineering, 35(11):11693--11707

[53]

Wang, Y. and Zhou, C. (2021). Feature selection method based on chi-square test and minimum redundancy. In Emerging Trends in Intelligent and Interactive Systems and Applications: Proceedings of the 5th International Conference on Intelligent, Interactive Systems and Applications (IISA2020), pages 171--178. Springer

[54]

Wu, J., Wang, Y., Wu, Z., Wang, Z., Veeraraghavan, A., and Lin, Y. (2018a). Deep k-means: Re-training and parameter sharing with harder cluster assignments for compressing deep convolutions. In International Conference on Machine Learning, pages 5363--5372. PMLR

[55]

Wu, S., Li, G., Chen, F., and Shi, L. (2018b). Training and inference with integers in deep neural networks. arXiv preprint arXiv:1802.04680

[56]

Zhang, Y., Lin, M., Lin, C.-W., Chen, J., Huang, F., Wu, Y., Tian, Y., and Ji, R. (2021). Carrying out cnn channel pruning in a white box. IEEE Transactions on Neural Networks and Learning Systems, 34:7946--7955

[57]

Yang, Y., Liu, T., Wang, Y., Zhou, J., Gan, Q., Wei, Z., Zhang, Z., Huang, Z., and Wipf, D. (2021). Graph neural networks inspired by classical iterative algorithms. In International Conference on Machine Learning, pages 11773--11783. PMLR

[58]

Zhang, X., Zhou, X., Lin, M., and Sun, J. (2018). Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6848--6856

[59]

Zhu, M. and Gupta, S. (2017). To prune, or not to prune: exploring the efficacy of pruning for model compression. ArXiv, abs/1710.01878

[60]

Zoph, B. (2016). Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578
