Learning Instance-wise Sparsity for Accelerating Deep Models

Chang Xu; Chuanjian Liu; Chunjing Xu; Kai Han; Yunhe Wang

arxiv: 1907.11840 · v1 · pith:JVDKL4O7new · submitted 2019-07-27 · 💻 cs.CV · cs.LG

Chuanjian Liu, Yunhe Wang, Kai Han, Chunjing Xu, Chang Xu

Pith reviewed 2026-05-24 15:13 UTC · model grok-4.3

keywords instance-wise sparsity · feature decay regularization · instance-wise feature pruning · deep model acceleration · coefficient of variation · convolutional neural networks

The pith

Feature decay regularization creates instance-specific sparsity in neural network layers to speed up inference by pruning unimportant features per image.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes making feature maps sparse in a different pattern for each input image by adding a regularization term that encourages unimportant features to decay. At inference time, computations that depend on these subtle features can be skipped, which speeds up the network while maintaining accuracy on the task. The method selects which layers to prune using the coefficient of variation of feature importance across instances. The authors report that experiments on benchmark image datasets and networks show this instance-aware approach reduces computation without significant performance loss. A sympathetic reader would care because most acceleration methods prune the same parameters or filters regardless of the input, whereas inputs differ, so tailoring sparsity to each instance could be more efficient.

Core claim

An instance-wise feature pruning is developed by identifying informative features for different instances. Specifically, by investigating a feature decay regularization, we expect intermediate feature maps of each instance in deep neural networks to be sparse while preserving the overall network performance. During online inference, subtle features of input images extracted by intermediate layers of a well-trained neural network can be eliminated to accelerate the subsequent calculations. We further take coefficient of variation as a measure to select the layers that are appropriate for acceleration.
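To make the layer-selection idea concrete, here is a minimal sketch of how a coefficient of variation (standard deviation divided by mean) could be computed for one layer. The abstract does not say which quantity the statistic is taken over, so this sketch assumes it is the per-instance fraction of near-zero channels; the function names, the `eps` cutoff, and the toy data are all hypothetical.

```python
import numpy as np

def sparsity_per_instance(features, eps=1e-3):
    """Fraction of channels per instance whose mean |activation| is below eps.

    features: array of shape (N, C, H, W). The cutoff eps is an assumption.
    """
    channel_strength = np.abs(features).mean(axis=(2, 3))  # shape (N, C)
    return (channel_strength < eps).mean(axis=1)           # shape (N,)

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    if mean == 0.0:
        return 0.0
    return float(values.std() / mean)

def make_toy_features(zeroed_per_instance, channels=4, size=2):
    """Toy feature maps: instance i has its first zeroed_per_instance[i] channels set to 0."""
    feats = np.ones((len(zeroed_per_instance), channels, size, size))
    for i, k in enumerate(zeroed_per_instance):
        feats[i, :k] = 0.0
    return feats

# Layer A: every instance is equally sparse. Layer B: sparsity varies by instance.
layer_a = make_toy_features([2, 2, 2, 2])
layer_b = make_toy_features([0, 1, 2, 3])
cv_a = coefficient_of_variation(sparsity_per_instance(layer_a))
cv_b = coefficient_of_variation(sparsity_per_instance(layer_b))
print("CV layer A:", cv_a)
print("CV layer B:", cv_b)
```

Under this assumed reading, a layer with a higher value is one where the amount of prunable computation depends more strongly on the input. Which direction the paper treats as appropriate for acceleration, and at what threshold, has to be checked against the full text.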

What carries the argument

Feature decay regularization that promotes sparsity in intermediate feature maps on a per-instance basis, combined with coefficient of variation for layer selection.
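A minimal sketch of what such a regularizer could look like, assuming a group-sparsity penalty (the sum of per-channel L2 norms of each instance's feature map) added to the task loss. The exact form used in the paper is not given on this page, so `feature_decay_penalty`, `total_loss`, and `decay_weight` are hypothetical names and the formula is an illustration rather than the authors' definition.

```python
import numpy as np

def feature_decay_penalty(features):
    """Group-sparsity penalty on one layer's feature maps (assumed form).

    features: array of shape (N, C, H, W).
    Returns the mean over instances of the sum over channels of the
    L2 norm of each channel's spatial map.
    """
    n, c = features.shape[:2]
    channel_norms = np.linalg.norm(features.reshape(n, c, -1), axis=2)  # (N, C)
    return float(channel_norms.sum(axis=1).mean())

def total_loss(task_loss, features_per_layer, decay_weight=1e-4):
    """Task loss plus the weighted penalty summed over the regularized layers."""
    penalty = sum(feature_decay_penalty(f) for f in features_per_layer)
    return task_loss + decay_weight * penalty

# Toy check: 2 instances, 3 channels, 2x2 maps filled with ones.
toy = np.ones((2, 3, 2, 2))
print(feature_decay_penalty(toy))                 # each channel norm is 2, so 3 channels give 6
print(total_loss(1.0, [toy], decay_weight=0.5))   # 1.0 + 0.5 * 6
```

Because the norm is taken per channel and per instance, the penalty pushes whole channels toward zero for a given input instead of shrinking every activation uniformly, which is what makes the resulting sparsity instance-specific in this sketch.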

If this is right

Subtle features of input images can be eliminated during inference to accelerate subsequent calculations.
The overall network performance is preserved despite the instance-wise pruning.
Layers appropriate for acceleration are identified by the coefficient of variation measure.
The method respects differences between data instances rather than applying uniform pruning without regard to the input.
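The first consequence above can be illustrated with a small sketch: channels whose activations are weak for the current input are dropped, and the next layer only touches the surviving channels. This assumes a channel-level mask with a fixed threshold and uses a 1x1 convolution written as a matrix product; the names `active_channels` and `pointwise_conv_sparse` and the threshold value are hypothetical, and the paper's actual pruning granularity may differ.

```python
import numpy as np

def active_channels(feature, threshold):
    """Boolean mask of channels whose mean |activation| reaches the threshold.

    feature: array of shape (C, H, W) for a single input instance.
    """
    return np.abs(feature).mean(axis=(1, 2)) >= threshold

def pointwise_conv_sparse(feature, weight, threshold):
    """1x1 convolution that skips pruned input channels.

    weight: array of shape (C_out, C_in). Only the columns of the weight
    matrix that match surviving channels take part in the product.
    """
    keep = active_channels(feature, threshold)
    c_in, h, w = feature.shape
    flat = feature.reshape(c_in, h * w)
    out = weight[:, keep] @ flat[keep]
    return out.reshape(weight.shape[0], h, w), keep

rng = np.random.default_rng(0)
feature = rng.standard_normal((8, 4, 4))
feature[:4] *= 1e-4                      # make the first four channels "subtle"
weight = rng.standard_normal((5, 8))

out, keep = pointwise_conv_sparse(feature, weight, threshold=1e-2)

# Reference: dense computation on the same features with pruned channels zeroed.
masked = feature * keep[:, None, None]
dense = (weight @ masked.reshape(8, -1)).reshape(5, 4, 4)
print("channels kept:", int(keep.sum()), "of", keep.size)
print("matches masked dense result:", np.allclose(out, dense))
```

In this sketch the multiply-accumulate count of the layer scales with the number of kept channels, so the saving depends on how sparse the current input's features are. Whether that translates into wall-clock speedup depends on the hardware and kernel support for gathering channels per input.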

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The learned per-instance sparsity patterns could support hardware that skips operations based on the current input content.
The regularization approach might extend to non-convolutional architectures if similar activation decay is applied.
Combining this with static pruning methods could produce models that are both structurally slim and dynamically sparse.
If the coefficient of variation reliably flags safe layers, the selection step could be automated without manual tuning per network.

Load-bearing premise

The feature decay regularization can induce sufficient sparsity in intermediate feature maps for different instances without degrading overall task performance.

What would settle it

Measuring whether the pruned network maintains baseline accuracy on a standard test set like CIFAR-10 or ImageNet after skipping the identified subtle features.

Figures

Figures reproduced from arXiv: 1907.11840 by Chang Xu, Chuanjian Liu, Chunjing Xu, Kai Han, Yunhe Wang.

**Figure 2.** The framework of our method, including the training procedure with feature regularization and the test procedure with feature sparsification.

**Figure 3.** The channel pruning results of VGG16. The x-axis is …

**Figure 4.** The accuracy of different classes of samples in CIFAR-10

**Figure 5.** The easy and hard samples selected from ImageNet with …

Original abstract

Exploring deep convolutional neural networks of high efficiency and low memory usage is very essential for a wide variety of machine learning tasks. Most of existing approaches used to accelerate deep models by manipulating parameters or filters without data, e.g., pruning and decomposition. In contrast, we study this problem from a different perspective by respecting the difference between data. An instance-wise feature pruning is developed by identifying informative features for different instances. Specifically, by investigating a feature decay regularization, we expect intermediate feature maps of each instance in deep neural networks to be sparse while preserving the overall network performance. During online inference, subtle features of input images extracted by intermediate layers of a well-trained neural network can be eliminated to accelerate the subsequent calculations. We further take coefficient of variation as a measure to select the layers that are appropriate for acceleration. Extensive experiments conducted on benchmark datasets and networks demonstrate the effectiveness of the proposed method.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This paper tries instance-wise feature map pruning via decay regularization for CNN speedups, but the abstract gives no numbers so the gains are unverified.

The core idea is to add a feature decay term during training so that intermediate activations become sparse differently for each input image, then drop the weak ones at inference time and use coefficient of variation to pick which layers are safe to prune. That is the main new angle compared with the usual static filter or weight pruning methods mentioned in the abstract. The regularization is simple and the CV heuristic for layer choice is a practical detail that could be easy to implement. If the full experiments show consistent speedups on ResNet-style models with little accuracy drop on CIFAR or ImageNet, the trick might be worth trying in deployment settings where input statistics vary. The paper frames the distinction from parameter-based methods clearly and sticks to an empirical approach without overclaiming theory.

The main weakness is that the provided abstract contains zero quantitative results, no baseline comparisons, and no error bars, so it is impossible to judge whether the sparsity is actually useful or just marginal. The claim that performance is preserved rests entirely on the promise of “extensive experiments,” which needs to be checked in the tables. Minor additional questions are whether the per-instance decision adds noticeable overhead and whether the CV threshold generalizes across architectures.

This is the kind of paper that would interest people working on efficient inference rather than core theory. A reader looking for new regularization ideas for sparsity could pick up the method description. It is coherent enough on its own terms to deserve referee time so the actual numbers and any hidden costs can be evaluated.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes an instance-wise feature pruning approach for accelerating deep CNNs. It introduces a feature decay regularization to induce sparsity in per-instance intermediate feature maps while aiming to preserve overall network performance. During inference, subtle features are eliminated, with coefficient of variation used to identify suitable layers for pruning. The abstract states that extensive experiments on benchmark datasets and networks demonstrate the method's effectiveness.

Significance. If the empirical claims hold with preserved accuracy and measurable acceleration, the work would offer a data-dependent complement to parameter- or filter-based pruning methods, potentially improving efficiency by respecting instance-specific feature relevance rather than applying uniform pruning.

major comments (1)

[Abstract] The central claim that 'extensive experiments conducted on benchmark datasets and networks demonstrate the effectiveness of the proposed method' is unsupported, as the manuscript provides no quantitative results, baselines, accuracy metrics, speedup numbers, error analysis, or details on performance preservation.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and the recommendation for major revision. We address the single major comment below.

Referee: [Abstract] The central claim that 'extensive experiments conducted on benchmark datasets and networks demonstrate the effectiveness of the proposed method' is unsupported, as the manuscript provides no quantitative results, baselines, accuracy metrics, speedup numbers, error analysis, or details on performance preservation.

Authors: We agree that the abstract's claim is currently unsupported in the provided manuscript text, which consists only of the abstract without any experimental section, tables, or quantitative details. This is a valid observation. In the revised version we will add a complete Experiments section reporting results on standard benchmarks (e.g., CIFAR, ImageNet) and networks (e.g., ResNet, VGG), including direct comparisons to baselines, top-1 accuracy before/after pruning, measured inference speedup, and analysis confirming performance preservation. We will also update the abstract to reference specific quantitative outcomes if space permits.

revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper presents an empirical method using feature decay regularization to induce per-instance sparsity in intermediate feature maps, followed by coefficient-of-variation-based layer selection for inference-time pruning. No load-bearing derivations, predictions, or uniqueness claims reduce to self-definitions, fitted inputs, or self-citation chains; the approach is validated directly via benchmark experiments rather than internal construction. This is a standard empirical contribution with external falsifiability.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit details on free parameters, axioms, or invented entities; full text required for identification.
