AutoSculpt: A Pattern-based Model Auto-pruning Framework Using Reinforcement Learning and Graph Learning

Jianpeng Qi; Junyu Dong; Lixian Jing; Yanwei Yu

arxiv: 2412.18091 · v3 · submitted 2024-12-24 · 💻 cs.AI

Pith reviewed 2026-05-23 06:40 UTC · model grok-4.3

keywords: model pruning, reinforcement learning, graph learning, neural network compression, edge deployment, auto-pruning, pattern-based sparsity, deep neural networks

The pith

AutoSculpt automatically finds regular pruning patterns in neural networks via graph modeling and reinforcement learning so that standard inference engines can accelerate the resulting models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that represents deep neural networks as graphs to capture their structure and operator dependencies. It then embeds candidate pruning patterns and trains a reinforcement learning agent to choose which patterns to apply in order to maximize compression while preserving accuracy. The central goal is to produce pruned models whose sparsity takes the form of regular patterns that existing inference engines already know how to execute efficiently. If the approach works as described, it would let practitioners compress a wide range of architectures for edge deployment without needing custom accelerators or hand-crafted pruning rules.

Core claim

AutoSculpt constructs each DNN as a graph, embeds computationally efficient pruning patterns into that graph, and uses deep reinforcement learning to iteratively refine the pruning policy until the best trade-off between model size and accuracy is reached. On ResNet, MobileNet, VGG, and Vision Transformer the method produces pruning rates up to 90 percent and nearly 18 percent greater FLOPs reduction than prior auto-pruning baselines while remaining compatible with standard inference engines.
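To make the graph step concrete, here is a minimal sketch of how a small residual block might be encoded as a graph of operators. The `LayerNode` and `DnnGraph` classes, their fields, and the `coupled_producers` helper are all hypothetical names chosen for illustration; the paper's actual node features and edge types are not given in the text reviewed here.

```python
from dataclasses import dataclass, field


@dataclass
class LayerNode:
    """One operator in the network (hypothetical schema, not the paper's)."""
    name: str
    op: str            # e.g. "conv", "relu", "add"
    out_channels: int  # 0 for parameter-free operators


@dataclass
class DnnGraph:
    nodes: dict = field(default_factory=dict)  # name -> LayerNode
    edges: list = field(default_factory=list)  # (src, dst) data-flow pairs

    def add_node(self, name, op, out_channels=0):
        self.nodes[name] = LayerNode(name, op, out_channels)

    def add_edge(self, src, dst):
        self.edges.append((src, dst))

    def predecessors(self, name):
        return [s for s, d in self.edges if d == name]


def build_residual_block():
    """Toy residual block: conv1 -> relu -> conv2, plus a skip edge into add."""
    g = DnnGraph()
    g.add_node("input", "input", 16)
    g.add_node("conv1", "conv", 16)
    g.add_node("relu", "relu")
    g.add_node("conv2", "conv", 16)
    g.add_node("add", "add")
    for src, dst in [("input", "conv1"), ("conv1", "relu"),
                     ("relu", "conv2"), ("conv2", "add"), ("input", "add")]:
        g.add_edge(src, dst)
    return g


def coupled_producers(g, join):
    """Nearest channel-producing ancestors of an element-wise join node."""
    found, stack = set(), list(g.predecessors(join))
    while stack:
        name = stack.pop()
        if g.nodes[name].out_channels > 0:
            found.add(name)
        else:
            # Parameter-free operator: keep walking back towards its inputs.
            stack.extend(g.predecessors(name))
    return sorted(found)


graph = build_residual_block()
print(coupled_producers(graph, "add"))
```

The element-wise `add` is where a dependency shows up: both of its inputs must keep the same channel count, so any channel-level pruning of `conv2` constrains the skip path as well. Making that kind of coupling explicit is one plausible reason to work on a graph rather than on a flat list of layers.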

What carries the argument

Graph representation of network topology together with a deep reinforcement learning agent that selects and embeds regular, engine-recognizable pruning patterns.

If this is right

Pruning rates of up to 90 percent become achievable on convolutional and transformer architectures without manual pattern design.
FLOPs reduction improves by nearly 18 percent relative to existing auto-pruning baselines while accuracy remains comparable.
The resulting sparse models run faster on unmodified inference engines because the retained patterns match engine-supported structures.
The same pipeline applies across ResNet, MobileNet, VGG, and Vision Transformer without architecture-specific tuning.
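A short worked example may help separate the two headline metrics, pruning rate and FLOPs reduction. The sketch below counts multiply-accumulate operations for one convolution layer, assuming a kernel-level pattern that keeps 4 of the 9 weights in every 3x3 kernel. The layer sizes and the 4-of-9 pattern are illustrative assumptions, not figures from the paper.

```python
def conv_macs(h_out, w_out, c_in, c_out, kept_per_kernel):
    """Multiply-accumulates for one conv layer with `kept_per_kernel` weights per kernel."""
    return h_out * w_out * c_in * c_out * kept_per_kernel


def conv_params(c_in, c_out, kept_per_kernel):
    """Weight count for the same layer (bias ignored)."""
    return c_in * c_out * kept_per_kernel


# Hypothetical layer: 32x32 output map, 64 -> 64 channels, 3x3 kernels.
dense_macs = conv_macs(32, 32, 64, 64, kept_per_kernel=9)
sparse_macs = conv_macs(32, 32, 64, 64, kept_per_kernel=4)

flops_reduction = 1 - sparse_macs / dense_macs
pruning_rate = 1 - conv_params(64, 64, 4) / conv_params(64, 64, 9)

print(f"FLOPs reduction: {flops_reduction:.3f}")
print(f"Pruning rate:    {pruning_rate:.3f}")
```

For a single layer pruned uniformly the two numbers coincide (both are 1 - 4/9, about 0.556). Across a whole network they generally differ, because layers with large feature maps dominate FLOPs while layers with many channels dominate parameter count.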

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the discovered patterns transfer across hardware, the method could reduce reliance on device-specific pruning schedules.
Combining the graph-plus-RL loop with quantization might produce models that are both sparse and low-precision for further edge gains.
The same graph construction could be reused for other structured optimization tasks such as operator fusion or memory-layout search.

Load-bearing premise

The pruning patterns the method discovers are regular enough that existing inference engines will recognize and accelerate them at runtime without extra accuracy loss beyond what the reward already penalizes.
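What "regular enough" means can be illustrated with one well-known pattern family, N:M sparsity, in which every group of M consecutive weights holds at most N non-zeros. The checker below is a sketch under the assumption that a 2:4 layout is among the patterns of interest; the text reviewed here does not say which patterns AutoSculpt actually embeds, and `satisfies_n_m` is a hypothetical helper name.

```python
def satisfies_n_m(row, n=2, m=4):
    """True if every consecutive group of m weights has at most n non-zeros."""
    if len(row) % m != 0:
        return False  # the row cannot be tiled into whole groups of m
    return all(
        sum(1 for w in row[i:i + m] if w != 0) <= n
        for i in range(0, len(row), m)
    )


regular = [0.5, 0.0, 0.0, -1.2, 0.0, 0.3, 0.7, 0.0]
irregular = [0.5, 0.1, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]

print(satisfies_n_m(regular))
print(satisfies_n_m(irregular))
```

Both rows have low density overall, but only the first fits the 2:4 layout; the second packs three non-zeros into one group. This is the gap the premise depends on: a high pruning rate alone does not guarantee that an engine can exploit the sparsity.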

What would settle it

Measure wall-clock latency and accuracy of the pruned models on a standard inference engine; if the expected speedup fails to appear or accuracy drops exceed the reported levels on any of the tested architectures, the central claim is falsified.
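A test of this kind could be organised around a small timing harness such as the one below. It is a generic sketch: `measure_latency` and `speedup` are hypothetical helpers, and the callables passed in stand for "run one inference with the dense model" and "run one inference with the pruned model" on whichever engine is under test.

```python
import statistics
import time


def measure_latency(fn, warmup=5, runs=30):
    """Return (median, standard deviation) of wall-clock seconds per call."""
    for _ in range(warmup):
        fn()  # discard warm-up calls so caches and lazy initialisation settle
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), statistics.stdev(samples)


def speedup(dense_fn, pruned_fn, warmup=5, runs=30):
    """Ratio of dense to pruned median latency; above 1.0 means the pruned model is faster."""
    dense_median, _ = measure_latency(dense_fn, warmup, runs)
    pruned_median, _ = measure_latency(pruned_fn, warmup, runs)
    return dense_median / pruned_median


# Stand-in workloads; replace with real inference calls on the target engine.
median_s, stdev_s = measure_latency(lambda: sum(range(10000)))
print(f"median {median_s * 1e6:.1f} us, stdev {stdev_s * 1e6:.1f} us")
```

On accelerators that execute asynchronously, the callable would also need to wait for the device to finish before returning, otherwise the timer measures only kernel launch. Reporting the spread alongside the median makes it easier to tell a real speedup from run-to-run noise.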

Figures

Figures reproduced from arXiv: 2412.18091 by Jianpeng Qi, Junyu Dong, Lixian Jing, Yanwei Yu.

**Figure 2.** An overview of our AutoSculpt framework.

**Figure 3.** An Example of Graph Construction for CNN.

**Figure 6.** The effect of different DNN graph encoder.

**Figure 7.** The efficiency of inference accuracy recovery.

Original abstract

As deep neural networks (DNNs) are increasingly deployed on edge devices, optimizing models for constrained computational resources is critical. Existing auto-pruning methods face challenges due to the diversity of DNN models, various operators (e.g., filters), and the difficulty in balancing pruning granularity with model accuracy. To address these limitations, we introduce AutoSculpt, a pattern-based automated pruning framework designed to enhance efficiency and accuracy by leveraging graph learning and deep reinforcement learning (DRL). AutoSculpt automatically identifies and prunes regular patterns within DNN architectures that can be recognized by existing inference engines, enabling runtime acceleration. Three key steps in AutoSculpt include: (1) Constructing DNNs as graphs to encode their topology and parameter dependencies, (2) embedding computationally efficient pruning patterns, and (3) utilizing DRL to iteratively refine auto-pruning strategies until the optimal balance between compression and accuracy is achieved. Experimental results demonstrate the effectiveness of AutoSculpt across various architectures, including ResNet, MobileNet, VGG, and Vision Transformer, achieving pruning rates of up to 90% and nearly 18% improvement in FLOPs reduction, outperforming all baselines. The codes can be available at https://github.com/jlx15588/AutoSculpt

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

AutoSculpt's graph-plus-DRL pipeline for finding engine-recognizable pruning patterns is a practical extension of prior work, but the abstract's runtime-acceleration claim rests only on FLOPs numbers.

The paper's core move is to turn pruning into three explicit steps: represent the network as a graph to capture topology, embed candidate regular patterns, and let DRL search for the combination that keeps accuracy while hitting high compression. That pipeline is the actual novelty; it is not a first-principles invention but a reasonable assembly of graph learning and reinforcement learning aimed at patterns that existing inference engines already know how to accelerate.

Referee Report

3 major / 1 minor

Summary. The paper proposes AutoSculpt, a framework that represents DNNs as graphs to capture topology and dependencies, embeds regular pruning patterns, and applies deep reinforcement learning to iteratively optimize pruning strategies. It evaluates the method on ResNet, MobileNet, VGG, and Vision Transformer architectures, reporting up to 90% pruning rates and nearly 18% better FLOPs reduction than baselines while claiming the resulting patterns enable runtime acceleration on existing inference engines. Code is released at a GitHub link.

Significance. If the runtime-acceleration claim holds, the work could meaningfully advance automated, pattern-aware pruning for edge deployment by combining graph learning with RL; the open code release is a clear strength that supports reproducibility and follow-on work.

major comments (3)

[Abstract] The claim of 'nearly 18% improvement in FLOPs reduction' and 'outperforming all baselines' supplies no error bars, no explicit baseline names or hyper-parameter settings, and no ablation on the RL reward components; these omissions leave the quantitative superiority, which is load-bearing for the effectiveness claim, unverifiable.
[Abstract and §4 (Experimental Results)] The assertion that identified patterns 'can be recognized by existing inference engines, enabling runtime acceleration' rests solely on pruning-rate and FLOPs metrics; no latency, throughput, or energy measurements on target engines (TensorRT, ONNX Runtime, etc.) are reported, which directly undermines the edge-deployment motivation.
[§3 (Methodology)] The state representation, action space, and reward function used by the DRL agent are described at a high level without equations or pseudocode; this prevents assessment of whether the 18% FLOPs gain is robust or an artifact of reward shaping, a load-bearing detail for the central auto-pruning claim.

minor comments (1)

[Abstract] 'The codes can be available at' should read 'The code is available at'.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important areas for strengthening the presentation and validation of our claims. We address each major comment below and will incorporate revisions to improve verifiability and clarity.

Referee: [Abstract] The claim of 'nearly 18% improvement in FLOPs reduction' and 'outperforming all baselines' supplies no error bars, no explicit baseline names or hyper-parameter settings, and no ablation on the RL reward components; these omissions leave the quantitative superiority, which is load-bearing for the effectiveness claim, unverifiable.

Authors: We agree that the abstract would benefit from greater specificity. The full manuscript identifies the baselines (standard methods including AMC, NetAdapt, and others) and reports hyper-parameters in Section 4, but error bars from repeated runs and a dedicated ablation on reward components are indeed absent. In revision we will add error bars, explicitly name the baselines and settings in the abstract, and include an ablation study on the RL reward terms in Section 4 to substantiate the reported gains. revision: yes

Referee: [Abstract and §4 (Experimental Results)] The assertion that identified patterns 'can be recognized by existing inference engines, enabling runtime acceleration' rests solely on pruning-rate and FLOPs metrics; no latency, throughput, or energy measurements on target engines (TensorRT, ONNX Runtime, etc.) are reported, which directly undermines the edge-deployment motivation.

Authors: The referee correctly notes that the runtime-acceleration claim is supported only by the regularity of the discovered patterns and the resulting FLOPs reductions, without direct latency or throughput measurements on engines such as TensorRT or ONNX Runtime. This is a genuine gap relative to the edge-deployment motivation. In the revision we will either (a) add targeted latency measurements on representative hardware or (b) revise the wording in the abstract and Section 4 to describe the patterns as “compatible with existing engines, offering the potential for runtime acceleration” while explicitly acknowledging the absence of end-to-end timing results. revision: partial

Referee: [§3 (Methodology)] The state representation, action space, and reward function used by the DRL agent are described at a high level without equations or pseudocode; this prevents assessment of whether the 18% FLOPs gain is robust or an artifact of reward shaping, a load-bearing detail for the central auto-pruning claim.

Authors: We will expand Section 3 to include the precise mathematical formulations for the graph-based state representation, the discrete action space over pruning patterns, and the multi-term reward function (accuracy, FLOPs, and pattern regularity). Pseudocode for the overall DRL loop will also be added. These additions will allow readers to evaluate whether the reported improvements are robust to the chosen reward design. revision: yes
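For readers trying to picture what such a formulation might look like, the sketch below shows one possible shape for a discrete action space over (layer, pattern) pairs and a three-term reward. Everything here is an assumption made for illustration: the pattern names, the `decode_action` and `pruning_reward` helpers, the linear combination, and the weights `w_acc`, `w_flops`, and `w_reg` are not taken from the paper.

```python
# Hypothetical pattern vocabulary; the paper's actual set is not given here.
PATTERNS = ["keep", "channel", "kernel_4of9", "n_m_2_4"]


def decode_action(index, layer_names):
    """Map a flat action index to a (layer, pattern) choice."""
    layer_idx, pattern_idx = divmod(index, len(PATTERNS))
    return layer_names[layer_idx], PATTERNS[pattern_idx]


def pruning_reward(acc, base_acc, flops, base_flops, regular_fraction,
                   w_acc=1.0, w_flops=0.5, w_reg=0.1):
    """Weighted sum of an accuracy penalty, a FLOPs saving, and a regularity bonus."""
    acc_term = -max(0.0, base_acc - acc)   # penalise accuracy lost, never reward gains
    flops_term = 1.0 - flops / base_flops  # fraction of compute removed
    reg_term = regular_fraction            # share of pruned layers using a regular pattern
    return w_acc * acc_term + w_flops * flops_term + w_reg * reg_term


layers = ["conv1", "conv2"]
print(decode_action(5, layers))
print(pruning_reward(acc=0.90, base_acc=0.90, flops=50, base_flops=100,
                     regular_fraction=1.0))
```

With a reward of this form, the referee's concern becomes easy to state: changing the three weights shifts where the agent settles on the accuracy versus compression curve, so a reported FLOPs gain is only interpretable alongside the weights and an ablation over them.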

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces AutoSculpt as a framework that constructs DNN graphs, embeds pruning patterns, and applies DRL for strategy refinement, with results validated experimentally on standard architectures against baselines. No equations, fitted parameters, or self-citations appear in the provided text that would reduce any claimed prediction or result to the method's own inputs by construction. The approach relies on external RL and graph-learning techniques without self-definitional loops, uniqueness theorems from the same authors, or renaming of known results. The reported pruning rates and FLOPs improvements are presented as empirical outcomes rather than derivations that collapse to definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the method description does not introduce new physical or mathematical objects.

pith-pipeline@v0.9.0 · 5768 in / 1060 out tokens · 39552 ms · 2026-05-23T06:40:59.007760+00:00 · methodology

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 2 internal anchors

[1]

Tailor, Luisa M Zintgraf, Joost van Amersfoort, Sebastian Farquhar, Nicholas Donald Lane, and Yarin Gal

Milad Alizadeh, Shyam A. Tailor, Luisa M Zintgraf, Joost van Amersfoort, Sebastian Farquhar, Nicholas Donald Lane, and Yarin Gal. Prospect pruning: Finding trainable weights at initialization using meta-gradients. InInternational Con- ference on Learning Representations, 2022. 6

work page 2022
[2]

How attentive are graph attention networks? InInternational Conference on Learning Representations, 2022

Shaked Brody, Uri Alon, and Eran Yahav. How attentive are graph attention networks? InInternational Conference on Learning Representations, 2022. 5

work page 2022
[3]

TVM: An automated end-to- end optimizing compiler for deep learning

Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. TVM: An automated end-to- end optimizing compiler for deep learning. In13th USENIX Symposium on Operating Systems Design and Implementa- tion (OSDI), pages 578–594, 2018. 2

work page 2018
[4]

A survey on deep neural network pruning: Taxonomy, com- parison, analysis, and recommendations.IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–20,

Hongrong Cheng, Miao Zhang, and Javen Qinfeng Shi. A survey on deep neural network pruning: Taxonomy, com- parison, analysis, and recommendations.IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–20,

work page
[5]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255. IEEE, 2009. 6

work page 2009
[6]

Bayesian opti- mization with clustering and rollback for cnn auto pruning

Hanwei Fan, Jiandong Mu, and Wei Zhang. Bayesian opti- mization with clustering and rollback for cnn auto pruning. InEuropean Conference on Computer Vision, pages 494–

work page
[7]

DepGraph: Towards any structural prun- ing

Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, and Xinchao Wang. DepGraph: Towards any structural prun- ing. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition (CVPR), pages 16091– 16101, 2023. 1, 2, 6

work page 2023
[8]

Network pruning via performance maximization

Shangqian Gao, Feihu Huang, Weidong Cai, and Heng Huang. Network pruning via performance maximization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9270–9280,

work page
[9]

Multi- dimensional pruning: A unified framework for model com- pression

Jinyang Guo, Wanli Ouyang, and Dong Xu. Multi- dimensional pruning: A unified framework for model com- pression. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1508–1517, 2020. 6

work page 2020
[10]

DTMM: De- ploying TinyML models on extremely weak IoT devices with pruning

Lixiang Han, Zhen Xiao, and Zhenjiang Li. DTMM: De- ploying TinyML models on extremely weak IoT devices with pruning. InIEEE International Conference on Computer Communications (INFOCOM). IEEE, 2024. 1, 2, 3

work page 2024
[11]

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Song Han, Huizi Mao, and William J Dally. Deep com- pression: Compressing deep neural networks with pruning, trained quantization and huffman coding.arXiv preprint arXiv:1510.00149, 2015. 1, 2

work page internal anchor Pith review Pith/arXiv arXiv 2015
[12]

Structured pruning for deep con- volutional neural networks: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

Yang He and Lingao Xiao. Structured pruning for deep con- volutional neural networks: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023. 2

work page 2023
[13]

Soft filter pruning for accelerating deep convolutional neural networks

Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, and Yi Yang. Soft filter pruning for accelerating deep convolutional neural networks. InProceedings of the 27th International Joint Conference on Artificial Intelligence, page 2234–2240. AAAI Press, 2018. 2

work page 2018
[14]

AMC: Automl for model compression and ac- celeration on mobile devices

Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. AMC: Automl for model compression and ac- celeration on mobile devices. InProceedings of the Euro- pean Conference on Computer Vision (ECCV), pages 784– 800, 2018. 2, 3, 6

work page 2018
[15]

Filter pruning by switching to neighboring cnns with good attributes.IEEE Transactions on Neural Networks and Learning Systems, 34 (10):8044–8056, 2022

Yang He, Ping Liu, Linchao Zhu, and Yi Yang. Filter pruning by switching to neighboring cnns with good attributes.IEEE Transactions on Neural Networks and Learning Systems, 34 (10):8044–8056, 2022. 6

work page 2022
[16]

Filter pruning via feature discrimi- nation in deep neural networks

Zhiqiang He, Yaguan Qian, Yuqi Wang, Bin Wang, Xiaohui Guan, Zhaoquan Gu, Xiang Ling, Shaoning Zeng, Haijiang Wang, and Wujie Zhou. Filter pruning via feature discrimi- nation in deep neural networks. InEuropean Conference on Computer Vision, pages 245–261. Springer, 2022. 2, 6

work page 2022
[17]

Dis- trEdge: Speeding up convolutional neural network inference on distributed edge devices

Xueyu Hou, Yongjie Guan, Tao Han, and Ning Zhang. Dis- trEdge: Speeding up convolutional neural network inference on distributed edge devices. In2022 IEEE International Par- allel and Distributed Processing Symposium (IPDPS), pages 1097–1107. 1

work page
[18]

Soft masking for cost-constrained channel pruning

Ryan Humble, Maying Shen, Jorge Albericio Latorre, Eric Darve, and Jose Alvarez. Soft masking for cost-constrained channel pruning. InEuropean Conference on Computer Vi- sion, pages 641–657. Springer, 2022. 2, 6

work page 2022
[19]

Operation-aware soft chan- nel pruning using differentiable masks

Minsoo Kang and Bohyung Han. Operation-aware soft chan- nel pruning using differentiable masks. InProceedings of the 37th International Conference on Machine Learning, pages 5122–5131. PMLR, 2020. 6

work page 2020
[20]

Neuron merging: Compensating for pruned neu- rons.Advances in Neural Information Processing Systems, 33:585–595, 2020

Woojeong Kim, Suhyun Kim, Mincheol Park, and Geun- seok Jeon. Neuron merging: Compensating for pruned neu- rons.Advances in Neural Information Processing Systems, 33:585–595, 2020. 2, 6

work page 2020
[21]

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009. 6

work page 2009
[22]

Inducing and exploit- ing activation sparsity for fast inference on deep neural net- works

Mark Kurtz, Justin Kopinsky, Rati Gelashvili, Alexander Matveev, John Carr, Michael Goin, William Leiserson, Sage Moore, Nir Shavit, and Dan Alistarh. Inducing and exploit- ing activation sparsity for fast inference on deep neural net- works. InProceedings of the 37th International Conference on Machine Learning, pages 5533–5543. PMLR, 2020. 2

work page 2020
[23]

Dy- namic dual gating neural networks

Fanrong Li, Gang Li, Xiangyu He, and Jian Cheng. Dy- namic dual gating neural networks. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 5330–5339, 2021. 2, 6

work page 2021
[24]

Compressing convolutional neural net- works via factorized convolutional filters

Tuanhui Li, Baoyuan Wu, Yujiu Yang, Yanbo Fan, Yong Zhang, and Wei Liu. Compressing convolutional neural net- works via factorized convolutional filters. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3977–3986, 2019. 2

work page 2019
[25]

Towards compact cnns via collaborative compression

Yuchao Li, Shaohui Lin, Jianzhuang Liu, Qixiang Ye, Mengdi Wang, Fei Chao, Fan Yang, Jincheng Ma, Qi Tian, and Rongrong Ji. Towards compact cnns via collaborative compression. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6438–6447, 2021. 6

work page 2021
[26]

Revisiting random channel 10 pruning for neural network compression

Yawei Li, Kamil Adamczewski, Wen Li, Shuhang Gu, Radu Timofte, and Luc Van Gool. Revisiting random channel 10 pruning for neural network compression. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 191–201, 2022. 1

work page 2022
[27]

NPAS: A compiler-aware framework of unified network pruning and architecture search for beyond real- time mobile acceleration

Zhengang Li, Geng Yuan, Wei Niu, Pu Zhao, Yanyu Li, Yux- uan Cai, Xuan Shen, Zheng Zhan, Zhenglun Kong, Qing Jin, et al. NPAS: A compiler-aware framework of unified network pruning and architecture search for beyond real- time mobile acceleration. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14255–14266...

work page 2021
[28]

Hrank: Filter pruning using high-rank feature map

Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang, Baochang Zhang, Yonghong Tian, and Ling Shao. Hrank: Filter pruning using high-rank feature map. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR), pages 1529–1538, 2020. 2, 6

work page 2020
[29]

Soks: Automatic searching of the optimal kernel shapes for stripe-wise net- work pruning.IEEE Transactions on Neural Networks and Learning Systems, 34(12):9912–9924, 2022

Guangzhe Liu, Ke Zhang, and Meibo Lv. Soks: Automatic searching of the optimal kernel shapes for stripe-wise net- work pruning.IEEE Transactions on Neural Networks and Learning Systems, 34(12):9912–9924, 2022. 2, 6

work page 2022
[30]

Group fisher pruning for practical network compression

Liyang Liu, Shilong Zhang, Zhanghui Kuang, Aojun Zhou, Jing-Hao Xue, Xinjiang Wang, Yimin Chen, Wenming Yang, Qingmin Liao, and Wayne Zhang. Group fisher pruning for practical network compression. InInternational Conference on Machine Learning, pages 7021–7032. PMLR, 2021. 6

work page 2021
[31]

Group fisher pruning for practical network compression

Liyang Liu, Shilong Zhang, Zhanghui Kuang, Aojun Zhou, Jing-Hao Xue, Xinjiang Wang, Yimin Chen, Wenming Yang, Qingmin Liao, and Wayne Zhang. Group fisher pruning for practical network compression. InInternational Conference on Machine Learning, pages 7021–7032. PMLR, 2021. 1

work page 2021
[32]

MetaPruning: Meta learning for automatic neural network channel pruning

Zechun Liu, Haoyuan Mu, Xiangyu Zhang, Zichao Guo, Xin Yang, Kwang-Ting Cheng, and Jian Sun. MetaPruning: Meta learning for automatic neural network channel pruning. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 3296–3305, 2019. 2

work page 2019
[33]

Joint multi-dimension pruning via numerical gradient update.IEEE Transactions on Image Processing, 30:8034–8045, 2021

Zechun Liu, Xiangyu Zhang, Zhiqiang Shen, Yichen Wei, Kwang-Ting Cheng, and Jian Sun. Joint multi-dimension pruning via numerical gradient update.IEEE Transactions on Image Processing, 30:8034–8045, 2021. 6

work page 2021
[34]

Christos Louizos, Max Welling, and Diederik P. Kingma. Learning sparse neural networks throughl 0 regularization. InInternational Conference on Learning Representations,

work page
[35]

Pconv: The missing but desirable sparsity in dnn weight pruning for real-time execution on mobile devices

Xiaolong Ma, Fu-Ming Guo, Wei Niu, Xue Lin, Jian Tang, Kaisheng Ma, Bin Ren, and Yanzhi Wang. Pconv: The missing but desirable sparsity in dnn weight pruning for real-time execution on mobile devices. InProceedings of the AAAI Conference on Artificial Intelligence, pages 5117– 5124, 2020. 2

work page 2020
[36]

Non-structured DNN weight pruning—Is it beneficial in any platform?IEEE Transactions on Neural Networks and Learning Systems, 33(9):4930–4944, 2021

Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, Geng Yuan, Sia Huat Tan, Zhengang Li, Deliang Fan, Xuehai Qian, et al. Non-structured DNN weight pruning—Is it beneficial in any platform?IEEE Transactions on Neural Networks and Learning Systems, 33(9):4930–4944, 2021. 1

work page 2021
[37]

Patdnn: Achiev- ing real-time dnn execution on mobile devices with pattern- based weight pruning

Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, and Bin Ren. Patdnn: Achiev- ing real-time dnn execution on mobile devices with pattern- based weight pruning. InProceedings of the Twenty-Fifth International Conference on Architectural Support for Pro- gramming Languages and Operating Systems, pages 907– 922, 2020. 2

work page 2020
[38]

SOSP: Efficiently capturing global correlations by second-order structured pruning

Manuel Nonnenmacher, Thomas Pfeil, Ingo Steinwart, and David Reeb. SOSP: Efficiently capturing global correlations by second-order structured pruning. InInternational Confer- ence on Learning Representations, 2022. 6

work page 2022
[39]

Lookahead: A far-sighted alternative of magnitude-based pruning

Sejun Park*, Jaeho Lee*, Sangwoo Mo, and Jinwoo Shin. Lookahead: A far-sighted alternative of magnitude-based pruning. InInternational Conference on Learning Repre- sentations, 2020. 1, 2

work page 2020
[40]

Graph structure learning on user mobility data for social relationship infer- ence

Guangming Qin, Lexue Song, Yanwei Yu, Chao Huang, Wenzhe Jia, Yuan Cao, and Junyu Dong. Graph structure learning on user mobility data for social relationship infer- ence. InProceedings of the AAAI Conference on Artificial Intelligence, pages 4578–4586, 2023. 4

work page 2023
[41]

Movement pruning: Adaptive sparsity by fine-tuning.Advances in Neu- ral Information Processing Systems, 33:20378–20389, 2020

Victor Sanh, Thomas Wolf, and Alexander Rush. Movement pruning: Adaptive sparsity by fine-tuning.Advances in Neu- ral Information Processing Systems, 33:20378–20389, 2020. 1, 2

work page 2020
[42]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Rad- ford, and Oleg Klimov. Proximal policy optimization algo- rithms.arXiv preprint arXiv:1707.06347, 2017. 5

work page internal anchor Pith review Pith/arXiv arXiv 2017
[43]

Demystifying TensorRT: Characterizing neural network inference engine on nvidia edge devices

Omais Shafi, Chinmay Rai, Rijurekha Sen, and Gayathri Ananthanarayanan. Demystifying TensorRT: Characterizing neural network inference engine on nvidia edge devices. In 2021 IEEE International Symposium on Workload Charac- terization (IISWC), pages 226–237, 2021. 2

work page 2021
[44]

CP-ViT: Cascade vision trans- former pruning via progressive sparsity prediction.arXiv preprint arXiv:2203.04570, 2022

Zhuoran Song, Yihong Xu, Zhezhi He, Li Jiang, Naifeng Jing, and Xiaoyao Liang. CP-ViT: Cascade vision trans- former pruning via progressive sparsity prediction.arXiv preprint arXiv:2203.04570, 2022. 6

work page arXiv 2022
[45]

Chip: Channel independence- based pruning for compact neural networks.Advances in Neural Information Processing Systems, 34:24604–24616,

Yang Sui, Miao Yin, Yi Xie, Huy Phan, Saman Aliari Zonouz, and Bo Yuan. Chip: Channel independence- based pruning for compact neural networks.Advances in Neural Information Processing Systems, 34:24604–24616,

work page
[46]

Adding before pruning: Sparse filter fusion for deep convolutional neural networks via auxiliary attention.IEEE Transactions on Neural Networks and Learning Systems,

Guanzhong Tian, Yiran Sun, Yuang Liu, Xianfang Zeng, Mengmeng Wang, Yong Liu, Jiangning Zhang, and Jun Chen. Adding before pruning: Sparse filter fusion for deep convolutional neural networks via auxiliary attention.IEEE Transactions on Neural Networks and Learning Systems,

work page
[47]

Attention is all you need.Advances in Neural Information Processing Systems, 2017

A Vaswani. Attention is all you need.Advances in Neural Information Processing Systems, 2017. 2

work page 2017
[48]

EigenDamage: Structured pruning in the kronecker- factored eigenbasis

Chaoqi Wang, Roger Grosse, Sanja Fidler, and Guodong Zhang. EigenDamage: Structured pruning in the kronecker- factored eigenbasis. InInternational Conference on Machine Learning, pages 6566–6575. PMLR, 2019. 2

work page 2019
[49]

Neural pruning via growing regularization

Huan Wang, Can Qin, Yulun Zhang, and Yun Fu. Neural pruning via growing regularization. InInternational Confer- ence on Learning Representations (ICLR), 2021. 2, 6

work page 2021
[50]

Learning structured sparsity in deep neural networks

Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. Learning structured sparsity in deep neural networks. InAdvances in Neural Information Processing Systems. Cur- ran Associates, Inc., 2016. 2

work page 2016
[51]

Auto graph encoder-decoder for neural network pruning

Sixing Yu, Arya Mazaheri, and Ali Jannesari. Auto graph encoder-decoder for neural network pruning. InProceedings 11 of the IEEE/CVF International Conference on Computer Vi- sion, pages 6362–6372, 2021. 3, 6

work page 2021
[52]

Topology- aware network pruning using multi-stage graph embedding and reinforcement learning

Sixing Yu, Arya Mazaheri, and Ali Jannesari. Topology- aware network pruning using multi-stage graph embedding and reinforcement learning. InInternational Conference on Machine Learning, pages 25656–25667. PMLR, 2022. 1, 2, 3, 6

work page 2022
[53]

LAPP: Layer adaptive progressive prun- ing for compressing CNNs from scratch.arXiv preprint arXiv:2309.14157, 2023

Pucheng Zhai, Kailing Guo, Fang Liu, Xiaofen Xing, and Xiangmin Xu. LAPP: Layer adaptive progressive prun- ing for compressing CNNs from scratch.arXiv preprint arXiv:2309.14157, 2023. 1

work page arXiv 2023
[54]

Model compression based on differentiable network channel pruning.IEEE Transactions on Neural Networks and Learn- ing Systems, 34(12):10203–10212, 2022

Yu-Jie Zheng, Si-Bao Chen, Chris HQ Ding, and Bin Luo. Model compression based on differentiable network channel pruning.IEEE Transactions on Neural Networks and Learn- ing Systems, 34(12):10203–10212, 2022. 2, 6

work page 2022
[55]

Model compression based on differentiable network channel pruning.IEEE Transactions on Neural Networks and Learn- ing Systems, 34(12):10203–10212, 2022

Yu-Jie Zheng, Si-Bao Chen, Chris HQ Ding, and Bin Luo. Model compression based on differentiable network channel pruning.IEEE Transactions on Neural Networks and Learn- ing Systems, 34(12):10203–10212, 2022. 1

[56]

Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, and Hongsheng Li. Learning N:M fine-grained structured sparse neural networks from scratch. In International Conference on Learning Representations, 2021.

[57]

Yuefu Zhou, Ya Zhang, Yanfeng Wang, and Qi Tian. Accelerate cnn via recursive bayesian pruning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3306–3315, 2019.

[58]

Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, and Junshan Zhang. Edge intelligence: Paving the last mile of artificial intelligence with edge computing. Proceedings of the IEEE, 107:1738–1762, 2019.

