Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT

Anderson Sartor; Chingyi Lin; Kartikeya Bhardwaj; Radu Marculescu

arXiv: 1907.11804 · v1 · pith:RIKRHYI4new · submitted 2019-07-26 · stat.ML · cs.CV · cs.DC · cs.LG

Pith reviewed 2026-05-24 14:56 UTC · model grok-4.3

keywords: model compression · distributed inference · IoT · neural network partitioning · edge computing · deep learning · knowledge partitioning

The pith

NoNN partitions a teacher neural network into disjoint compressed student modules that match the teacher's accuracy for distributed inference on IoT devices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a network science-based partitioning algorithm can divide a large pretrained teacher model into disjoint knowledge partitions, each used to train an independent, highly compressed student module. These students can then be combined at inference time with only minimal communication between them and without meaningful accuracy loss relative to the original teacher. A reader would care because existing compression methods ignore communication costs, and in extremely memory-constrained settings even a compressed model may not fit on a single IoT device. For CIFAR-10 on edge devices, the paper reports up to 24x lower memory footprint, up to 12x higher performance, and up to 14x lower energy per node than the teacher, plus up to 33x lower total latency than a state-of-the-art model compression baseline for distributed inference.

Core claim

NoNN compresses a large pretrained teacher deep network into several disjoint and highly-compressed student modules without loss of accuracy by using a network science-based knowledge partitioning algorithm to create the partitions, then training individual students on those partitions, achieving higher accuracy than several baselines, similar accuracy to the teacher, and minimal communication among students.

What carries the argument

The network science-based knowledge partitioning algorithm, which divides the teacher model into disjoint partitions so that independently trained student modules can be combined at inference time.
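A minimal sketch of the partitioning idea, under loudly stated assumptions. The abstract does not say how the graph is built, so the cosine similarity, the threshold value, and the use of connected components as a simplified stand-in for modularity-based community detection are illustrative choices, not the authors' algorithm. The names `cosine`, `partition_filters`, and the toy activations are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length activation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def partition_filters(activations, threshold=0.9):
    """Group filters into disjoint sets (assumed, simplified procedure).

    activations: dict filter_name -> list of activations, one per input
    (hypothetical layout). Filters with similarity >= threshold are linked,
    and each connected component of that graph becomes one partition.
    """
    names = sorted(activations)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(activations[a], activations[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

# Toy activations: f0/f1 respond to the first two inputs, f2/f3 to the last two.
toy = {
    "f0": [1.0, 0.9, 0.0, 0.1],
    "f1": [0.9, 1.0, 0.1, 0.0],
    "f2": [0.0, 0.1, 1.0, 0.8],
    "f3": [0.1, 0.0, 0.9, 1.0],
}
partitions = partition_filters(toy)
print(partitions)
```

Because every filter ends up in exactly one component, the resulting groups are disjoint by construction, which is the property the student modules depend on. A real community-detection method would additionally balance and score the groups.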

If this is right

NoNN achieves higher accuracy than several baselines and similar accuracy to the teacher model while using minimal communication among students.
On edge devices for CIFAR-10, NoNN yields up to 24x memory reduction versus the large teacher model.
Deployment shows up to 12x performance improvement and 14x energy reduction per node compared to the teacher.
For distributed inference across multiple edge devices, NoNN achieves up to 33x reduction in total latency versus a state-of-the-art model compression baseline.
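One way to picture "minimal communication among students" is that each student runs entirely on its own device and transmits only a short feature vector to an aggregator. The sketch below assumes that layout, and also assumes every device already has the input. The student functions, the weights, and the feature sizes are hypothetical and are not taken from the paper.

```python
def student_a(x):
    # Hypothetical student on device A: reduces the input to 2 features.
    return [sum(x), max(x)]

def student_b(x):
    # Hypothetical student on device B: reduces the input to 1 feature.
    return [min(x)]

def aggregate(features, weights):
    """Final linear layer over the concatenated student features."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return scores.index(max(scores))

def distributed_infer(x, students, weights):
    sent = 0        # number of values that cross the network
    features = []
    for student in students:
        out = student(x)     # runs locally on that student's device
        sent += len(out)     # only the student's output is transmitted
        features.extend(out)
    return aggregate(features, weights), sent

weights = [[1.0, 0.0, 0.0],   # class 0 score
           [0.0, 2.0, 1.0]]   # class 1 score
label, values_sent = distributed_infer(
    [0.5, 1.5, 1.0], [student_a, student_b], weights)
print(label, values_sent)
```

The point of the design is that `values_sent` depends only on the size of each student's output, not on the size of the students themselves, so no intermediate activations move between devices.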

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The disjoint partitions could allow students to be trained in parallel on separate hardware without sharing training data.
The same partitioning approach might apply to non-image tasks such as time-series prediction on sensor networks.
Minimal communication among students could reduce the impact of intermittent connectivity typical in IoT environments.

Load-bearing premise

The partitioning algorithm produces disjoint sections of the teacher model such that students trained separately on each section can be combined without meaningful accuracy loss.

What would settle it

A direct measurement showing that the accuracy of the combined NoNN students on a held-out test set falls substantially below the accuracy of the original teacher model would falsify the claim.
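The check described above amounts to comparing two held-out accuracies against a tolerance. A minimal sketch, assuming predictions are already available as label lists. The tolerance `max_gap` and the toy labels are hypothetical, since the source does not define what counts as "substantially below".

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the held-out labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def claim_survives(teacher_preds, nonn_preds, labels, max_gap=0.01):
    """Return (ok, gap): ok is True if NoNN is within max_gap of the teacher."""
    gap = accuracy(teacher_preds, labels) - accuracy(nonn_preds, labels)
    return gap <= max_gap, gap

# Toy held-out set; these labels are placeholders, not data from the paper.
labels        = [0, 1, 2, 1, 0, 2, 1, 0]
teacher_preds = [0, 1, 2, 1, 0, 2, 0, 0]   # one mistake
nonn_preds    = [0, 1, 2, 1, 0, 1, 1, 0]   # one mistake, on a different sample
ok, gap = claim_survives(teacher_preds, nonn_preds, labels)
print(ok, gap)
```

Note that the two models can err on different samples and still show the same accuracy, so a per-sample comparison would be a stricter test than this aggregate one.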

Figures

Figures reproduced from arXiv: 1907.11804 by Anderson Sartor, Chingyi Lin, Kartikeya Bhardwaj, Radu Marculescu.

**Figure 2.** (a) Knowledge Distillation (KD) is based on a significa…

**Figure 3.** Splitting a deep network horizontally leads to huge co…

**Figure 4.** Complete flow of our approach. (a) The pretrained tea…

**Figure 5.** (a) Selecting an individual NoNN student architectu…

**Figure 6.** Teacher, baseline student, and NoNN models for vario…

**Figure 7.** Performance and energy as the number of Raspberry dev…

**Figure 8.** Accuracy as some devices become unavailable due to de…

Original abstract

Model compression has emerged as an important area of research for deploying deep learning models on Internet-of-Things (IoT). However, for extremely memory-constrained scenarios, even the compressed models cannot fit within the memory of a single device and, as a result, must be distributed across multiple devices. This leads to a distributed inference paradigm in which memory and communication costs represent a major bottleneck. Yet, existing model compression techniques are not communication-aware. Therefore, we propose Network of Neural Networks (NoNN), a new distributed IoT learning paradigm that compresses a large pretrained 'teacher' deep network into several disjoint and highly-compressed 'student' modules, without loss of accuracy. Moreover, we propose a network science-based knowledge partitioning algorithm for the teacher model, and then train individual students on the resulting disjoint partitions. Extensive experimentation on five image classification datasets, for user-defined memory/performance budgets, show that NoNN achieves higher accuracy than several baselines and similar accuracy as the teacher model, while using minimal communication among students. Finally, as a case study, we deploy the proposed model for CIFAR-10 dataset on edge devices and demonstrate significant improvements in memory footprint (up to 24x), performance (up to 12x), and energy per node (up to 14x) compared to the large teacher model. We further show that for distributed inference on multiple edge devices, our proposed NoNN model results in up to 33x reduction in total latency w.r.t. a state-of-the-art model compression baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

NoNN splits a teacher into disjoint students via network science partitioning for low-communication distributed inference on tight IoT memory, with empirical accuracy close to the teacher and large measured gains on CIFAR-10 hardware.

The paper's main point is straightforward: take a large teacher, partition its knowledge into disjoint pieces using a network-science algorithm, train separate compressed students on those pieces, and run inference across multiple devices with almost no communication between them. This targets cases where even a single compressed model exceeds one device's memory.

The experiments cover five image datasets and show the assembled NoNN matching teacher accuracy while beating several baselines. A CIFAR-10 edge-device deployment adds claims of up to 24x memory cut, 12x performance lift, and 14x energy drop per node versus the teacher, and up to 33x total latency reduction versus a state-of-the-art compression baseline. Those hardware numbers are the clearest practical signal here. The partitioning step is presented as the novel piece that enables the disjoint students and low communication, and the results treat it as an empirical outcome rather than a theorem. The full text measures end-to-end accuracy directly, so the central claim does not rest on untested assumptions about perfect recombination.

Soft spots are mostly around the empirical nature of the work. The gains depend on how consistently the partitioning produces useful disjoint sets; if that step is sensitive to the teacher architecture or dataset, the advantage narrows. Communication costs are reported as minimal, but any real deployment would still need to confirm that all overhead (synchronization, data movement) stays low outside the controlled case study. Baselines appear reasonable from the abstract, yet a referee would want explicit checks that they represent current best practice for distributed compression. The method is algorithmic and data-driven with no parameter-free derivations, so reproducibility hinges on the released code and exact partitioning details.

This is aimed at people building models for memory-constrained edge hardware. It is solid enough on its own terms to deserve referee time; the hardware measurements and multi-dataset accuracy results give a referee something tangible to evaluate even if the partitioning technique turns out to be the main point of debate.

Referee Report

2 major / 3 minor

Summary. The paper proposes Network of Neural Networks (NoNN), which compresses a pretrained teacher deep network into multiple disjoint, highly compressed student modules via a network science-based knowledge partitioning algorithm. The students are trained independently and combined at inference with minimal inter-student communication. Experiments on five image classification datasets show accuracy comparable to the teacher and superior to baselines; a CIFAR-10 edge-device case study reports up to 24x memory reduction, 12x performance improvement, 14x energy reduction per node, and 33x total latency reduction versus a state-of-the-art baseline.

Significance. If the empirical outcomes hold, the work supplies a practical, communication-aware compression technique for memory-constrained distributed IoT inference. The hardware deployment measurements constitute a concrete strength, moving beyond simulation to quantify end-to-end gains in memory, latency, and energy. The approach is algorithmic rather than parameter-free or machine-checked, so its value rests on the reproducibility and robustness of the reported accuracy and resource numbers across the five datasets.

major comments (2)

[§4] §4 (Experiments): accuracy tables report point estimates for NoNN versus teacher and baselines but omit error bars, number of random seeds, or statistical significance tests; without these it is impossible to determine whether the claimed parity with the teacher is robust or sensitive to partitioning randomness.
[§3.2] §3.2 (Knowledge Partitioning): the description of the network-science graph construction and community-detection step does not specify the precise similarity metric, threshold for edge weights, or post-processing that guarantees disjointness; because the central claim of “minimal communication” and “no accuracy loss” rests on this disjointness, the missing algorithmic detail is load-bearing for reproducibility.
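The first major comment asks for variability across random seeds. A minimal sketch of that kind of reporting follows. The accuracy values are made-up placeholders, not results from the paper, and `summarize` is a hypothetical helper.

```python
from statistics import mean, stdev

def summarize(accuracies):
    """Return (mean, sample standard deviation) over per-seed accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two seeds to estimate spread")
    return mean(accuracies), stdev(accuracies)

# Placeholder per-seed accuracies in percent; NOT measurements from the paper.
teacher_runs = [95.0, 95.2, 94.8, 95.1, 94.9]
nonn_runs    = [94.9, 95.1, 94.7, 95.0, 94.8]

for name, runs in [("teacher", teacher_runs), ("NoNN", nonn_runs)]:
    m, s = summarize(runs)
    print(f"{name}: {m:.2f} +/- {s:.2f} over {len(runs)} seeds")

# Paired per-seed differences, the input a paired significance test would use.
diffs = [t - n for t, n in zip(teacher_runs, nonn_runs)]
```

If SciPy is available, the paired lists could be passed to `scipy.stats.wilcoxon` for a signed-rank test; with only a handful of seeds, such a test has little power, so the spread itself is usually the more informative number.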

minor comments (3)

[Figure 3, Table 2] Figure 3 and Table 2: axis labels and legend entries use inconsistent abbreviations (e.g., “NoNN” vs. “proposed”) that should be unified for clarity.
[§5] §5 (Hardware Case Study): the exact mapping of student modules to physical devices and the measured communication volume per inference are not tabulated; adding these numbers would strengthen the latency claim.
[Abstract, §1] Abstract and §1: the phrase “without loss of accuracy” is used; the body correctly qualifies this as “similar accuracy,” so the abstract wording should be aligned.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation of minor revision. We address each major comment point-by-point below and will update the manuscript accordingly to improve clarity and robustness.

Referee: [§4] §4 (Experiments): accuracy tables report point estimates for NoNN versus teacher and baselines but omit error bars, number of random seeds, or statistical significance tests; without these it is impossible to determine whether the claimed parity with the teacher is robust or sensitive to partitioning randomness.

Authors: We agree that reporting variability is important for assessing robustness. The revised manuscript will include results averaged over multiple random seeds (at least 5) for the partitioning step, with error bars showing standard deviation. We will also add statistical significance tests (e.g., Wilcoxon signed-rank tests) comparing NoNN to the teacher to confirm that accuracy parity holds across runs and is not an artifact of a single partitioning.

revision: yes

Referee: [§3.2] §3.2 (Knowledge Partitioning): the description of the network-science graph construction and community-detection step does not specify the precise similarity metric, threshold for edge weights, or post-processing that guarantees disjointness; because the central claim of “minimal communication” and “no accuracy loss” rests on this disjointness, the missing algorithmic detail is load-bearing for reproducibility.

Authors: We acknowledge the need for greater specificity in §3.2 to support reproducibility of the disjoint partitions. The revised manuscript will explicitly detail the similarity metric (cosine similarity on neuron activation vectors), the edge-weight threshold used to construct the graph, and the post-processing rule (assigning any residual overlaps to the community with the highest internal connectivity) that enforces disjoint student modules. These additions will directly substantiate the claims of minimal inter-student communication and accuracy preservation.

revision: yes
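The post-processing rule named in this simulated response (give any node that appears in more than one community to the community with the highest internal connectivity) can be sketched as follows. This is an illustration of that one sentence only: the definition of internal connectivity as a sum of edge weights, the tie-breaking toward the earlier community, and all names and data are assumptions.

```python
def internal_connectivity(members, weights):
    """Sum of edge weights between distinct members of one community."""
    members = sorted(members)
    return sum(weights.get((a, b), 0.0)
               for i, a in enumerate(members) for b in members[i + 1:])

def enforce_disjoint(communities, weights):
    """Keep each shared node only in its most internally connected community."""
    # Scores are computed on the original, possibly overlapping communities.
    scores = [internal_connectivity(c, weights) for c in communities]
    owner = {}
    for idx, community in enumerate(communities):
        for node in community:
            # Strict comparison: on a tie the earlier community keeps the node.
            if node not in owner or scores[idx] > scores[owner[node]]:
                owner[node] = idx
    return [sorted(n for n, o in owner.items() if o == idx)
            for idx in range(len(communities))]

# Hypothetical edge weights, keyed by alphabetically sorted node pairs.
weights = {("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.7,
           ("c", "d"): 0.4, ("d", "e"): 0.6, ("c", "e"): 0.3}
overlapping = [{"a", "b", "c"}, {"c", "d", "e"}]
disjoint = enforce_disjoint(overlapping, weights)
print(disjoint)
```

Here node `c` sits in both communities and is kept by the first, whose internal weight is larger. A rule like this guarantees disjointness but can leave a community much smaller than the others, which is one reason the exact procedure matters for reproducibility.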

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation is self-contained

The paper introduces an algorithmic partitioning method and evaluates the resulting NoNN model through direct experiments on five datasets, reporting measured accuracy, memory, latency, and energy metrics against baselines and the teacher model. No equations, derivations, or fitted parameters are described that reduce the claimed accuracy or performance gains to quantities defined by construction within the paper. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz that would force the central results. The weakest assumption (disjoint partitions combining without accuracy loss) is presented and tested as an empirical outcome rather than a theoretical guarantee.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are identifiable. The partitioning algorithm and student training procedure likely contain implementation choices, but these cannot be enumerated without the full manuscript.

pith-pipeline@v0.9.0 · 5830 in / 1266 out tokens · 28206 ms · 2026-05-24T14:56:09.801644+00:00 · methodology

