arxiv: 2412.18091 · v3 · submitted 2024-12-24 · 💻 cs.AI

AutoSculpt: A Pattern-based Model Auto-pruning Framework Using Reinforcement Learning and Graph Learning

Pith reviewed 2026-05-23 06:40 UTC · model grok-4.3

classification 💻 cs.AI
keywords model pruning · reinforcement learning · graph learning · neural network compression · edge deployment · auto-pruning · pattern-based sparsity · deep neural networks

The pith

AutoSculpt automatically finds regular pruning patterns in neural networks via graph modeling and reinforcement learning so that standard inference engines can accelerate the resulting models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that represents deep neural networks as graphs to capture their structure and operator dependencies. It then embeds candidate pruning patterns and trains a reinforcement learning agent to choose which patterns to apply in order to maximize compression while preserving accuracy. The central goal is to produce pruned models whose sparsity takes the form of regular patterns that existing inference engines already know how to execute efficiently. If the approach works as described, it would let practitioners compress a wide range of architectures for edge deployment without needing custom accelerators or hand-crafted pruning rules.
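
As a rough illustration of the graph view described above, the sketch below encodes a tiny residual block as a directed graph of operators. The `Node` fields, the `build_graph` and `predecessors` helpers, and the layer sizes are all hypothetical choices made for this example; the paper's actual node features and construction rules are not given in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    op: str      # operator type, e.g. "conv3x3" or "add"
    params: int  # number of weights held by the operator
    succ: list = field(default_factory=list)  # names of downstream nodes


def build_graph(layers, edges):
    """Build a name -> Node map from (name, op, params) tuples and (src, dst) edges."""
    graph = {name: Node(name, op, params) for name, op, params in layers}
    for src, dst in edges:
        graph[src].succ.append(dst)
    return graph


def predecessors(graph, name):
    """Names of operators whose output feeds `name`, in sorted order."""
    return sorted(n.name for n in graph.values() if name in n.succ)


# Hypothetical residual block: input -> conv1 -> conv2 -> add, plus a skip edge input -> add.
layers = [
    ("input", "input", 0),
    ("conv1", "conv3x3", 64 * 64 * 9),
    ("conv2", "conv3x3", 64 * 64 * 9),
    ("add", "add", 0),
]
edges = [("input", "conv1"), ("conv1", "conv2"), ("conv2", "add"), ("input", "add")]
graph = build_graph(layers, edges)
print(predecessors(graph, "add"))  # ['conv2', 'input']
```

The two predecessors of `add` must produce tensors of the same shape, which is the kind of operator dependency a graph encoding can expose to a pruning policy.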

Core claim

AutoSculpt represents each DNN as a graph, embeds computationally efficient pruning patterns into that graph, and uses deep reinforcement learning to iteratively refine the pruning policy toward the best trade-off between model size and accuracy. On ResNet, MobileNet, VGG, and Vision Transformer, the method reports pruning rates of up to 90 percent and nearly 18 percent greater FLOPs reduction than prior auto-pruning baselines, while remaining compatible with standard inference engines.

What carries the argument

Graph representation of network topology together with a deep reinforcement learning agent that selects and embeds regular, engine-recognizable pruning patterns.
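
To make "regular pruning pattern" concrete, here is a minimal sketch that assumes 3x3 convolution kernels and a small hand-written pattern library. The four patterns and the magnitude-based selection rule are assumptions for illustration only; in the paper the choice of pattern is made by the reinforcement learning agent, not by this heuristic.

```python
# Hypothetical pattern library: each pattern keeps 4 of the 9 weights in a 3x3 kernel.
PATTERNS = [
    ((1, 1, 0), (1, 1, 0), (0, 0, 0)),
    ((0, 1, 1), (0, 1, 1), (0, 0, 0)),
    ((0, 0, 0), (1, 1, 0), (1, 1, 0)),
    ((0, 0, 0), (0, 1, 1), (0, 1, 1)),
]


def retained_magnitude(kernel, pattern):
    """Sum of absolute weights that the pattern would keep."""
    return sum(
        abs(w)
        for k_row, p_row in zip(kernel, pattern)
        for w, keep in zip(k_row, p_row)
        if keep
    )


def best_pattern(kernel):
    """Index of the pattern preserving the most L1 mass (a stand-in for a learned policy)."""
    return max(range(len(PATTERNS)), key=lambda i: retained_magnitude(kernel, PATTERNS[i]))


def apply_pattern(kernel, pattern):
    """Zero every weight the pattern does not keep."""
    return [
        [w if keep else 0.0 for w, keep in zip(k_row, p_row)]
        for k_row, p_row in zip(kernel, pattern)
    ]


kernel = [
    [0.9, -0.8, 0.1],
    [0.7, 0.6, -0.2],
    [0.0, 0.1, 0.05],
]
idx = best_pattern(kernel)
pruned = apply_pattern(kernel, PATTERNS[idx])
print(idx, pruned)
```

Because every kernel ends up with one of a few fixed shapes, an inference engine can in principle dispatch a specialized routine per shape instead of handling arbitrary sparsity.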

If this is right

  • Pruning rates of up to 90 percent become achievable on convolutional and transformer architectures without manual pattern design.
  • FLOPs reduction improves by nearly 18 percent relative to existing auto-pruning baselines while accuracy remains comparable.
  • The resulting sparse models run faster on unmodified inference engines because the retained patterns match engine-supported structures.
  • The same pipeline applies across ResNet, MobileNet, VGG, and Vision Transformer without architecture-specific tuning.
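
For a sense of how per-kernel sparsity maps to a FLOPs figure, the following back-of-the-envelope sketch counts multiply-accumulates (MACs) for one convolution layer. The layer shape and the 4-of-9 pattern are hypothetical, and the calculation assumes the engine skips every pruned weight, which the paper's reported numbers may or may not reflect.

```python
def conv_macs(h_out, w_out, c_in, c_out, k, kept_per_kernel=None):
    """MACs for a k x k convolution; kept_per_kernel overrides k*k for pattern-pruned kernels."""
    weights_per_kernel = k * k if kept_per_kernel is None else kept_per_kernel
    return h_out * w_out * c_in * c_out * weights_per_kernel


# Hypothetical layer: 56x56 output, 64 -> 64 channels, 3x3 kernels.
dense = conv_macs(56, 56, 64, 64, 3)
sparse = conv_macs(56, 56, 64, 64, 3, kept_per_kernel=4)
reduction = 1 - sparse / dense
print(f"dense={dense} sparse={sparse} reduction={reduction:.1%}")
```

Under these assumptions the layer's MAC count falls by 5/9, or roughly 56 percent; whole-model figures depend on how pruning is distributed across layers.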

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the discovered patterns transfer across hardware, the method could reduce reliance on device-specific pruning schedules.
  • Combining the graph-plus-RL loop with quantization might produce models that are both sparse and low-precision for further edge gains.
  • The same graph construction could be reused for other structured optimization tasks such as operator fusion or memory-layout search.

Load-bearing premise

The pruning patterns the method discovers are regular enough that existing inference engines will recognize and accelerate them at runtime without extra accuracy loss beyond what the reward already penalizes.

What would settle it

Measure wall-clock latency and accuracy of the pruned models on a standard inference engine; if the expected speedup fails to appear or accuracy drops exceed the reported levels on any of the tested architectures, the central claim is falsified.
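
A measurement of this kind could be structured roughly as follows. This is a generic timing harness, not anything from the paper: `dense_model` and `pruned_model` are stand-in workloads, and in practice each would be replaced by a call into the inference engine under test.

```python
import statistics
import time


def measure_latency(run_once, warmup=5, repeats=30):
    """Return (median, stdev) of wall-clock seconds for run_once() after warm-up runs."""
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), statistics.stdev(samples)


def speedup(dense_latency, pruned_latency):
    """Ratio > 1 means the pruned model is faster."""
    return dense_latency / pruned_latency


# Stand-in workloads (assumptions); swap in real inference calls.
def dense_model():
    return sum(i * i for i in range(20000))


def pruned_model():
    return sum(i * i for i in range(2000))


dense_med, dense_sd = measure_latency(dense_model)
pruned_med, pruned_sd = measure_latency(pruned_model)
print(f"speedup: {speedup(dense_med, pruned_med):.2f}x")
```

Reporting the spread alongside the median makes it easier to tell a real speedup from run-to-run noise.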

Figures

Figures reproduced from arXiv: 2412.18091 by Jianpeng Qi, Junyu Dong, Lixian Jing, Yanwei Yu.

Figure 1: Examples of three different pruning granularity methods
Figure 2: An overview of our AutoSculpt framework.
Figure 3: An Example of Graph Construction for CNN.
Figure 6: The effect of different DNN graph encoder.
Figure 7: The efficiency of inference accuracy recovery.
Original abstract

As deep neural networks (DNNs) are increasingly deployed on edge devices, optimizing models for constrained computational resources is critical. Existing auto-pruning methods face challenges due to the diversity of DNN models, various operators (e.g., filters), and the difficulty in balancing pruning granularity with model accuracy. To address these limitations, we introduce AutoSculpt, a pattern-based automated pruning framework designed to enhance efficiency and accuracy by leveraging graph learning and deep reinforcement learning (DRL). AutoSculpt automatically identifies and prunes regular patterns within DNN architectures that can be recognized by existing inference engines, enabling runtime acceleration. Three key steps in AutoSculpt include: (1) Constructing DNNs as graphs to encode their topology and parameter dependencies, (2) embedding computationally efficient pruning patterns, and (3) utilizing DRL to iteratively refine auto-pruning strategies until the optimal balance between compression and accuracy is achieved. Experimental results demonstrate the effectiveness of AutoSculpt across various architectures, including ResNet, MobileNet, VGG, and Vision Transformer, achieving pruning rates of up to 90% and nearly 18% improvement in FLOPs reduction, outperforming all baselines. The codes can be available at https://github.com/jlx15588/AutoSculpt

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes AutoSculpt, a framework that represents DNNs as graphs to capture topology and dependencies, embeds regular pruning patterns, and applies deep reinforcement learning to iteratively optimize pruning strategies. It evaluates the method on ResNet, MobileNet, VGG, and Vision Transformer architectures, reporting up to 90% pruning rates and nearly 18% better FLOPs reduction than baselines while claiming the resulting patterns enable runtime acceleration on existing inference engines. Code is released at a GitHub link.

Significance. If the runtime-acceleration claim holds, the work could meaningfully advance automated, pattern-aware pruning for edge deployment by combining graph learning with RL; the open code release is a clear strength that supports reproducibility and follow-on work.

major comments (3)
  1. [Abstract] The claim of 'nearly 18% improvement in FLOPs reduction' and 'outperforming all baselines' supplies no error bars, no explicit baseline names or hyper-parameter settings, and no ablation on the RL reward components; these omissions leave the quantitative superiority, which is load-bearing for the effectiveness claim, unverifiable.
  2. [Abstract and §4 (Experimental Results)] The assertion that identified patterns 'can be recognized by existing inference engines, enabling runtime acceleration' rests solely on pruning-rate and FLOPs metrics; no latency, throughput, or energy measurements on target engines (TensorRT, ONNX Runtime, etc.) are reported, which directly undermines the edge-deployment motivation.
  3. [§3 (Methodology)] The state representation, action space, and reward function used by the DRL agent are described at a high level without equations or pseudocode; this prevents assessment of whether the 18% FLOPs gain is robust or an artifact of reward shaping, a load-bearing detail for the central auto-pruning claim.
minor comments (1)
  1. [Abstract] 'The codes can be available at' should read 'The code is available at'.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important areas for strengthening the presentation and validation of our claims. We address each major comment below and will incorporate revisions to improve verifiability and clarity.

Point-by-point responses
  1. Referee: [Abstract] The claim of 'nearly 18% improvement in FLOPs reduction' and 'outperforming all baselines' supplies no error bars, no explicit baseline names or hyper-parameter settings, and no ablation on the RL reward components; these omissions leave the quantitative superiority, which is load-bearing for the effectiveness claim, unverifiable.

    Authors: We agree that the abstract would benefit from greater specificity. The full manuscript identifies the baselines (standard methods including AMC, NetAdapt, and others) and reports hyper-parameters in Section 4, but error bars from repeated runs and a dedicated ablation on reward components are indeed absent. In revision we will add error bars, explicitly name the baselines and settings in the abstract, and include an ablation study on the RL reward terms in Section 4 to substantiate the reported gains.

    revision: yes

  2. Referee: [Abstract and §4 (Experimental Results)] The assertion that identified patterns 'can be recognized by existing inference engines, enabling runtime acceleration' rests solely on pruning-rate and FLOPs metrics; no latency, throughput, or energy measurements on target engines (TensorRT, ONNX Runtime, etc.) are reported, which directly undermines the edge-deployment motivation.

    Authors: The referee correctly notes that the runtime-acceleration claim is supported only by the regularity of the discovered patterns and the resulting FLOPs reductions, without direct latency or throughput measurements on engines such as TensorRT or ONNX Runtime. This is a genuine gap relative to the edge-deployment motivation. In the revision we will either (a) add targeted latency measurements on representative hardware or (b) revise the wording in the abstract and Section 4 to describe the patterns as “compatible with existing engines, offering the potential for runtime acceleration” while explicitly acknowledging the absence of end-to-end timing results.

    revision: partial

  3. Referee: [§3 (Methodology)] The state representation, action space, and reward function used by the DRL agent are described at a high level without equations or pseudocode; this prevents assessment of whether the 18% FLOPs gain is robust or an artifact of reward shaping, a load-bearing detail for the central auto-pruning claim.

    Authors: We will expand Section 3 to include the precise mathematical formulations for the graph-based state representation, the discrete action space over pruning patterns, and the multi-term reward function (accuracy, FLOPs, and pattern regularity). Pseudocode for the overall DRL loop will also be added. These additions will allow readers to evaluate whether the reported improvements are robust to the chosen reward design.

    revision: yes
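
The three reward terms named in the response above (accuracy, FLOPs, and pattern regularity) might combine along the following lines. This is a sketch only: the linear form, the weights `w_acc`, `w_flops`, and `w_reg`, and the `regular_fraction` input are assumptions made for illustration, not the paper's formulation, which neither the abstract nor this review supplies.

```python
def reward(acc, base_acc, flops, base_flops, regular_fraction,
           w_acc=1.0, w_flops=0.5, w_reg=0.1):
    """Hypothetical scalar reward for one pruning step.

    acc, base_acc:     accuracy of the pruned and unpruned model, in [0, 1]
    flops, base_flops: FLOPs of the pruned and unpruned model
    regular_fraction:  share of pruned kernels matching an engine-supported pattern, in [0, 1]
    """
    acc_drop = max(0.0, base_acc - acc)      # accuracy loss is penalized, gains are not rewarded
    flops_saved = 1.0 - flops / base_flops   # fraction of compute removed
    return -w_acc * acc_drop + w_flops * flops_saved + w_reg * regular_fraction


# Example: 2-point accuracy drop, 75% of FLOPs removed, all kernels on supported patterns.
r = reward(acc=0.74, base_acc=0.76, flops=1.0e9, base_flops=4.0e9, regular_fraction=1.0)
print(round(r, 3))  # 0.455
```

Varying the weights in a sketch like this is one way to probe the referee's concern about whether reported gains are sensitive to reward shaping.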

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces AutoSculpt as a framework that constructs DNN graphs, embeds pruning patterns, and applies DRL for strategy refinement, with results validated experimentally on standard architectures against baselines. No equations, fitted parameters, or self-citations appear in the provided text that would reduce any claimed prediction or result to the method's own inputs by construction. The approach relies on external RL and graph-learning techniques without self-definitional loops, uniqueness theorems from the same authors, or renaming of known results. The reported pruning rates and FLOPs improvements are presented as empirical outcomes rather than derivations that collapse to definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the method description does not introduce new physical or mathematical objects.

