SwiftNet: Using Graph Propagation as Meta-knowledge to Search Highly Representative Neural Architectures

Feng Yan; Hai Li; Harris Teague; Hsin-Pai Cheng; Shiyu Li; Tunhou Zhang; Yiran Chen; Yukun Yang

arXiv: 1906.08305 · v2 · pith:RNJDGRGZnew · submitted 2019-06-19 · 💻 cs.LG · cs.AI · cs.CV

Hsin-Pai Cheng, Tunhou Zhang, Yukun Yang, Feng Yan, Shiyu Li, Harris Teague, Hai Li, Yiran Chen

Pith reviewed 2026-05-25 20:08 UTC · model grok-4.3

classification 💻 cs.LG cs.AI cs.CV

keywords neural architecture search · graph propagation · meta-knowledge · edge computing · ImageNet · mobile neural networks · accuracy density · architecture pruning

The pith

Graph propagation as meta-knowledge enables flexible node-wise neural architecture search without predefined cells, yielding SwiftNet models with higher accuracy density and lower search costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes GRAM to use graph propagation for accumulating meta-knowledge in neural architecture search. This supports a fine-grained node-wise search space and a structure-level pruning method to remove redundant operations. The resulting SwiftNet models achieve 2.15 times higher accuracy density than MobileNet-V2 and reduce search cost by 26 times compared to FBNet, while reaching 63.28 percent top-1 accuracy on ImageNet with few parameters and MACs. This approach matters for automating the design of efficient networks for edge devices under accuracy and cost constraints.

Core claim

GRAM adopts fine-grained node-wise search and accumulates the knowledge learned in updates into a meta-graph. As a result, GRAM enables a more flexible search space and achieves higher search efficiency. Without the constraints of predefined cells or blocks, a new structure-level pruning method removes redundant operations in neural architectures. SwiftNet, discovered by GRAM, achieves 2.15x higher accuracy density and 2.42x faster inference than MobileNet-V2 at similar accuracy. Compared with FBNet, it reduces the search cost by 26x while achieving 2.35x higher accuracy density and a 1.47x speedup.
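The structure-level pruning idea can be pictured with a toy example. The sketch below assumes each edge of a DAG stands for one operation with an importance score, and that low-scoring edges are dropped only while the output node stays reachable from the input node. The scoring, the threshold, and the names `reachable` and `prune_structure` are hypothetical; this summary does not state the paper's actual pruning criterion.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]  # (source node, destination node)


def reachable(edges: Set[Edge], src: int, dst: int) -> bool:
    """Depth-first search: is there a directed path from src to dst?"""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return False


def prune_structure(scores: Dict[Edge, float], threshold: float,
                    src: int, dst: int) -> Set[Edge]:
    """Drop edges scoring below `threshold`, weakest first, keeping src -> dst connected."""
    kept = set(scores)
    for edge in sorted(scores, key=scores.get):
        if scores[edge] >= threshold:
            break  # every remaining edge is at or above the threshold
        if reachable(kept - {edge}, src, dst):
            kept.remove(edge)
    return kept


if __name__ == "__main__":
    # Made-up importance scores for a 4-node DAG (input 0, output 3).
    scores = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.1, (1, 3): 0.05, (2, 3): 0.7}
    print(sorted(prune_structure(scores, threshold=0.5, src=0, dst=3)))
```

The connectivity check is the design choice worth noting: without it, a threshold set too high would disconnect the network instead of merely thinning it.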

What carries the argument

GRAM, the graph propagation mechanism that accumulates search knowledge into a meta-graph to guide architecture choices at the node level.
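One way to picture a meta-graph that accumulates search knowledge is a table of edge scores over a complete DAG, nudged by the reward of each sampled architecture. The sketch below is a toy illustration under that assumption, not the paper's GRAM update rule. `MetaGraph`, `sample`, `update`, `toy_reward`, the moving-average rate, and the hidden "good" wiring are all hypothetical.

```python
import itertools
import random


class MetaGraph:
    """Toy meta-graph: one score in [0, 1] per candidate edge (i -> j, i < j)."""

    def __init__(self, num_nodes: int, rate: float = 0.3):
        self.edges = list(itertools.combinations(range(num_nodes), 2))
        self.score = {e: 0.5 for e in self.edges}  # uninformed prior
        self.rate = rate  # hypothetical moving-average rate

    def sample(self, rng: random.Random) -> frozenset:
        """Sample a sub-DAG: keep each edge with probability equal to its score."""
        return frozenset(e for e in self.edges if rng.random() < self.score[e])

    def update(self, arch: frozenset, reward: float) -> None:
        """Accumulate one evaluation; `reward` is assumed to lie in [0, 1].

        Used edges move toward `reward`; unused edges move toward `1 - reward`.
        """
        for e in self.edges:
            target = reward if e in arch else 1.0 - reward
            self.score[e] = (1.0 - self.rate) * self.score[e] + self.rate * target


def toy_reward(arch: frozenset, good: frozenset, all_edges: list) -> float:
    """Stand-in for trained accuracy: fraction of edges decided as in `good`."""
    agree = sum((e in arch) == (e in good) for e in all_edges)
    return agree / len(all_edges)


if __name__ == "__main__":
    rng = random.Random(0)
    meta = MetaGraph(num_nodes=4)
    good = frozenset({(0, 1), (1, 2), (2, 3)})  # made-up target wiring
    for _ in range(200):
        arch = meta.sample(rng)
        meta.update(arch, toy_reward(arch, good, meta.edges))
    for edge, s in sorted(meta.score.items()):
        print(edge, round(s, 2))
```

Because the scores persist across samples, later draws are biased by earlier evaluations. That persistence is the property the premise below depends on, and it is also what could reduce to memorizing high scorers.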

Load-bearing premise

The meta-graph propagation transfers useful search knowledge across architectures rather than simply memorizing high-scoring candidates from the current run.

What would settle it

An experiment that disables the graph propagation while keeping the node-wise search space and measures whether the discovered models have lower accuracy density or require more search time.
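Such an ablation could be organized as a paired comparison over shared random seeds. The harness below is a minimal sketch: `paired_ablation` and `placeholder_search` are hypothetical names, and the placeholder returns seeded noise rather than any measured result. A real study would substitute two full search runs that differ only in whether propagation is enabled.

```python
import random
import statistics
from typing import Callable, Dict, Sequence

SearchFn = Callable[[int], float]  # seed -> metric of the best architecture found


def paired_ablation(with_prop: SearchFn, without_prop: SearchFn,
                    seeds: Sequence[int]) -> Dict[str, float]:
    """Run both variants on identical seeds and summarize the paired difference."""
    diffs = [with_prop(s) - without_prop(s) for s in seeds]
    return {
        "runs": len(diffs),
        "mean_diff": statistics.mean(diffs),
        "stdev_diff": statistics.stdev(diffs) if len(diffs) > 1 else 0.0,
        "wins": sum(d > 0 for d in diffs),
    }


def placeholder_search(seed: int) -> float:
    """Stand-in for a full search run; returns noise, not a measured result."""
    return random.Random(seed).random()


if __name__ == "__main__":
    # Both arms use the same placeholder, so every paired difference is zero.
    print(paired_ablation(placeholder_search, placeholder_search, seeds=range(5)))
```

The metric returned by each search function could be accuracy density at a fixed search budget, or search time to reach a fixed accuracy, matching the two outcomes named above.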

Figures

Figures reproduced from arXiv: 1906.08305 by Feng Yan, Hai Li, Harris Teague, Hsin-Pai Cheng, Shiyu Li, Tunhou Zhang, Yiran Chen, Yukun Yang.

**Figure 2.** Overview diagram of the search process. To form a sampled DNN, we subsample multiple DAGs from the complete DAG. After…

**Figure 3.** A representative architecture discovered by GRAM.

**Figure 4.** ImageNet-1K top-1 accuracy density comparison be…

**Figure 5.** Model scale vs Structure pruning level

**Figure 6.** Trade-off between latency and accuracy on ImageNet-1K

Original abstract

Designing neural architectures for edge devices is subject to constraints of accuracy, inference latency, and computational cost. Traditionally, researchers manually craft deep neural networks to meet the needs of mobile devices. Neural Architecture Search (NAS) was proposed to automate the neural architecture design without requiring extensive domain expertise and significant manual efforts. Recent works utilized NAS to design mobile models by taking into account hardware constraints and achieved state-of-the-art accuracy with fewer parameters and less computational cost measured in Multiply-accumulates (MACs). To find highly compact neural architectures, existing works rely on predefined cells and on directly applying a width multiplier, which may potentially limit the model flexibility, reduce the useful feature map information, and cause accuracy drop. To conquer this issue, we propose GRAM (GRAph propagation as Meta-knowledge) that adopts a fine-grained (node-wise) search method and accumulates the knowledge learned in updates into a meta-graph. As a result, GRAM can enable a more flexible search space and achieve higher search efficiency. Without the constraints of predefined cells or blocks, we propose a new structure-level pruning method to remove redundant operations in neural architectures. SwiftNet, which is a set of models discovered by GRAM, outperforms MobileNet-V2 by 2.15x higher accuracy density and 2.42x faster with similar accuracy. Compared with FBNet, SwiftNet reduces the search cost by 26x and achieves 2.35x higher accuracy density and 1.47x speedup while preserving similar accuracy. SwiftNet can obtain 63.28% top-1 accuracy on ImageNet-1K with only 53M MACs and 2.07M parameters. The corresponding inference latency is only 19.09 ms on Google Pixel 1.
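The abstract's efficiency figures can be turned into per-unit ratios. The sketch below assumes that accuracy density means top-1 accuracy divided by parameter count in millions; this page does not define the term, so that definition and the helper name `accuracy_density` are assumptions. Only the SwiftNet numbers quoted in the abstract are used.

```python
def accuracy_density(top1_percent: float, params_millions: float) -> float:
    """Top-1 accuracy (percent) per million parameters (assumed definition)."""
    if params_millions <= 0:
        raise ValueError("params_millions must be positive")
    return top1_percent / params_millions


# Figures quoted in the abstract for SwiftNet on ImageNet-1K.
swiftnet_top1 = 63.28    # percent
swiftnet_params = 2.07   # millions of parameters
swiftnet_macs = 53.0     # millions of multiply-accumulates

density = accuracy_density(swiftnet_top1, swiftnet_params)
acc_per_mmac = swiftnet_top1 / swiftnet_macs

print(f"accuracy per million parameters: {density:.2f}")   # 30.57
print(f"accuracy per million MACs: {acc_per_mmac:.2f}")    # 1.19
```

Under this assumed definition, a claimed 2.15x density gain at similar accuracy would correspond to roughly 2.15x fewer parameters than the baseline. That is a reading of the ratio, not a figure reported on this page.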

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SwiftNet drops cell constraints via node-wise search plus meta-graph accumulation, but the abstract gives no ablation showing the graph step actually transfers knowledge beyond the expanded space.

The paper's main move is to replace the usual cell or block templates in mobile NAS with a node-wise search space, then use a meta-graph that accumulates prior updates as guidance and apply structure-level pruning to drop redundant operations. This produces SwiftNet models that hit 63.28% top-1 on ImageNet-1K at 53M MACs and 2.07M parameters, with the stated 2.15x accuracy-density gain over MobileNet-V2 and 26x lower search cost than FBNet, plus 19 ms latency on Pixel 1. The removal of the cell limit is a concrete step past most prior hardware-aware NAS work. The reported numbers and direct comparisons are the parts that stand out as useful.

The soft spot is exactly the one the stress-test note flags: the abstract attributes the efficiency and flexibility gains to the meta-graph propagation but supplies no ablation that turns propagation off while keeping the node-wise space and pruning fixed. There are also no error bars, no count of independent runs, and no search hyper-parameter details. Without those, it is difficult to tell whether the meta-graph is doing real cross-architecture transfer or whether the gains simply come from the larger search space itself. The stress-test concern holds on the text provided.

This is for readers working on hardware-aware NAS who want to relax the cell constraint. The search procedure and pruning step are concrete enough that someone in that area could test the idea. It deserves peer review because the departure from cell-based methods is clear and the empirical claims are specific enough for referees to check once the full experiments and ablations are in front of them.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes GRAM (GRAph propagation as Meta-knowledge), a NAS method that accumulates updates into a meta-graph to support fine-grained node-wise search and structure-level pruning without relying on predefined cells or blocks. It presents SwiftNet models achieving 63.28% top-1 accuracy on ImageNet-1K with 53M MACs and 2.07M parameters, claiming 2.15× higher accuracy density than MobileNet-V2, 2.42× faster inference at similar accuracy, 26× lower search cost than FBNet, and 2.35× higher accuracy density with 1.47× speedup while preserving accuracy.

Significance. If the results are shown to be robust, the meta-graph approach could advance hardware-aware NAS by enabling more flexible search spaces and knowledge reuse across architectures, potentially reducing the reliance on cell-based designs for compact mobile models.

major comments (2)

[Abstract] Abstract: the headline claims (63.28% top-1 accuracy at 53M MACs / 2.07M params, 2.15× accuracy density vs. MobileNet-V2, 26× search-cost reduction vs. FBNet) are stated without error bars, number of independent runs, or search hyper-parameters, preventing verification of statistical reliability.
[Abstract] Abstract / Method description: no ablation is reported that disables the meta-graph propagation while retaining the node-wise search space and structure-level pruning; this is required to establish that the claimed efficiency gains arise from cross-architecture knowledge transfer rather than the expanded search space or pruning heuristic alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for your thorough review and constructive comments. We have carefully considered the points raised regarding the statistical reliability of our results and the need for an ablation study on the meta-graph component. Our responses are as follows.

Referee: [Abstract] Abstract: the headline claims (63.28% top-1 accuracy at 53M MACs / 2.07M params, 2.15× accuracy density vs. MobileNet-V2, 26× search-cost reduction vs. FBNet) are stated without error bars, number of independent runs, or search hyper-parameters, preventing verification of statistical reliability.

Authors: The search hyperparameters are detailed in the experimental setup section of the manuscript. We acknowledge the absence of error bars and multiple independent runs in the abstract, which stems from the high computational cost of performing multiple full NAS searches on ImageNet. We will revise the abstract to include the search hyperparameters and a note on the single-run nature of the results to improve verifiability. revision: yes

Referee: [Abstract] Abstract / Method description: no ablation is reported that disables the meta-graph propagation while retaining the node-wise search space and structure-level pruning; this is required to establish that the claimed efficiency gains arise from cross-architecture knowledge transfer rather than the expanded search space or pruning heuristic alone.

Authors: We agree that an ablation isolating the meta-graph propagation (while retaining node-wise search and pruning) would help confirm its contribution to efficiency gains. We will add this ablation study to the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical NAS claims with external benchmarks

The paper offers no derivation chain, equations, or first-principles predictions. All claims consist of empirical accuracy, MACs, parameter counts, and latency numbers benchmarked against independent external models (MobileNet-V2, FBNet). The meta-graph propagation and pruning are presented as algorithmic choices whose value is asserted via direct comparison rather than any self-referential reduction or fitted-parameter prediction. No self-citations are invoked as load-bearing uniqueness theorems. The result is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities; all claims rest on the unstated assumption that the reported ImageNet numbers were obtained under standard training protocols and that the meta-graph mechanism improves search without additional hidden tuning.
