EPNAS: Efficient Progressive Neural Architecture Search

Feng Yan; Greg Diamos; Haonan Yu; Peng Wang; Sercan Arik; Syed Zawad; Yanqi Zhou

arxiv: 1907.04648 · v1 · pith:RXN7AD5Knew · submitted 2019-07-07 · 💻 cs.LG

Yanqi Zhou, Peng Wang, Sercan Arik, Haonan Yu, Syed Zawad, Feng Yan, Greg Diamos

Pith reviewed 2026-05-25 01:13 UTC · model grok-4.3

classification 💻 cs.LG

keywords neural architecture search · progressive search · REINFORCE · performance prediction · image classification · CIFAR10 · ImageNet · resource constraints

The pith

EPNAS uses a progressive search policy with REINFORCE performance prediction to find high-accuracy networks faster than prior NAS methods on image tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EPNAS as a neural architecture search approach that explores large search spaces with a progressive search policy and REINFORCE-based prediction of candidate performance. It evaluates candidate networks in parallel on GPU or TPU clusters and extends to multiple simultaneous constraints such as model size and compute cost. On CIFAR10 and ImageNet, the authors report that EPNAS searches faster than the NAS baselines ENAS and PNAS and reaches higher recognition accuracy than those baselines and the hand-designed MobileNetV2. A reader would care because the method makes architecture search more practical for deployment across varied hardware without exhaustively training every candidate.

Core claim

EPNAS efficiently handles large search spaces through a novel progressive search policy with performance prediction based on REINFORCE. It searches target networks in parallel, which is more scalable on parallel systems such as GPU/TPU clusters. More importantly, EPNAS can be generalized to architecture search with multiple resource constraints, e.g., model size, compute complexity or intensity. On both CIFAR10 and ImageNet, EPNAS is superior with respect to architecture searching speed and recognition accuracy.
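The parallel candidate evaluation described above can be pictured with a small sketch. It is an illustration under stated assumptions, not the paper's implementation: `evaluate_candidates` and `toy_evaluate` are hypothetical names, a thread pool stands in for a GPU/TPU cluster, and the toy scoring rule stands in for actually training a network.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_evaluate(candidate):
    """Hypothetical stand-in for training a candidate: score grows with depth."""
    return min(1.0, 0.5 + 0.05 * len(candidate))


def evaluate_candidates(candidates, evaluate, max_workers=4):
    """Score every candidate concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `candidates` in its results.
        return list(pool.map(evaluate, candidates))


if __name__ == "__main__":
    archs = [("conv",), ("conv", "conv"), ("conv", "pool", "conv")]
    for arch, score in zip(archs, evaluate_candidates(archs, toy_evaluate)):
        print(arch, round(score, 2))
```

Because each candidate is scored independently, the number of workers can grow with the available hardware without changing the search logic.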

What carries the argument

Progressive search policy with REINFORCE-based performance prediction that ranks architectures without full training of each candidate.
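For readers unfamiliar with REINFORCE, the following minimal sketch shows the policy-gradient update on a toy problem. It assumes a plain categorical policy over a handful of hypothetical architecture edits with fixed toy rewards; the paper's policy is an LSTM-based network and its reward comes from trained candidates, neither of which is reproduced here. The names `reinforce_step`, `run_search`, and `toy_rewards` are illustrative.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, actions, rewards, lr=0.5):
    """One REINFORCE update with the batch-mean reward as baseline."""
    probs = softmax(logits)
    baseline = sum(rewards) / len(rewards)
    grads = [0.0] * len(logits)
    for action, reward in zip(actions, rewards):
        advantage = reward - baseline
        for k in range(len(logits)):
            # d log pi(action) / d logit_k = 1[k == action] - pi(k)
            indicator = 1.0 if k == action else 0.0
            grads[k] += advantage * (indicator - probs[k])
    n = len(rewards)
    # Gradient ascent on expected reward.
    return [l + lr * g / n for l, g in zip(logits, grads)]


def run_search(toy_rewards, episodes=200, samples_per_episode=8, seed=0):
    """Sample edits from the policy, score them, and update the policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(toy_rewards)
    for _ in range(episodes):
        probs = softmax(logits)
        actions = rng.choices(range(len(logits)), weights=probs,
                              k=samples_per_episode)
        rewards = [toy_rewards[a] for a in actions]
        logits = reinforce_step(logits, actions, rewards)
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical rewards for three candidate edits.
    print(run_search([0.1, 0.9, 0.3]))
```

The baseline subtraction does not change the expected gradient; it only reduces variance, which is why the policy mass drifts toward the edit with the highest reward.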

If this is right

EPNAS applies directly to searches under simultaneous constraints such as model size and compute intensity.
Parallel network evaluation scales the method to GPU and TPU clusters without serial bottlenecks.
The same policy yields architectures that exceed MobileNetV2 accuracy on both CIFAR10 and ImageNet.
Resource-aware search becomes feasible for mobile and cloud platforms without separate runs per constraint.
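One common way to fold several resource constraints into a single search signal is a soft penalty on budget overshoot. The sketch below is an assumption for illustration only: the abstract does not state the form of the reward EPNAS uses, and `constrained_reward`, the resource names, and the `penalty` weight are hypothetical.

```python
def constrained_reward(accuracy, usage, budgets, penalty=1.0):
    """Accuracy minus a penalty for each resource budget that is exceeded.

    `usage` and `budgets` are dicts keyed by resource name, for example
    model size in parameters or compute in FLOPs. Budgets must be positive.
    """
    total_overshoot = 0.0
    for name, budget in budgets.items():
        # Relative overshoot; zero when the candidate is within budget.
        total_overshoot += max(0.0, usage[name] / budget - 1.0)
    return accuracy - penalty * total_overshoot


if __name__ == "__main__":
    budgets = {"params": 4e6, "flops": 6e8}
    within = {"params": 2e6, "flops": 3e8}
    over = {"params": 6e6, "flops": 3e8}
    print(constrained_reward(0.9, within, budgets))
    print(constrained_reward(0.9, over, budgets))
```

With a reward of this shape, a single search run can trade accuracy against all budgets at once rather than requiring one run per constraint.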

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the ranking prediction generalizes, EPNAS could shorten development cycles for custom models on new datasets.
The parallel design suggests straightforward extension to distributed training setups beyond single clusters.
Constraint handling may allow direct optimization for latency targets on specific hardware without post-search pruning.

Load-bearing premise

The REINFORCE-based performance prediction accurately ranks candidate architectures in large search spaces without requiring full training of each candidate.

What would settle it

A head-to-head run on ImageNet where EPNAS produces lower top-1 accuracy or longer total search time than ENAS or PNAS under identical constraints would falsify the superiority claim.

Figures

Figures reproduced from arXiv: 1907.04648 by Feng Yan, Greg Diamos, Haonan Yu, Peng Wang, Sercan Arik, Syed Zawad, Yanqi Zhou.

**Figure 1.** REINFORCE step for policy gradient. N is the number of parallel policy networks that adapt a baseline architecture at episode i. [...] As stated in Sec. 1, rather than rebuilding the entire network from scratch, we adopt a progressive strategy with REINFORCE [37] for more efficient architecture search so that architectures searched in previous…

**Figure 2.** Policy Network of EPNAS. It is an LSTM-based network, which first generates…

**Figure 3.** Left: A layer-by-layer search insert operation example. A conv operation is inserted…

**Figure 4.** Accuracy vs. total search time for CIFAR-10. Note the accuracy reported here is…
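The layer-by-layer insert operation named in the Figure 3 caption can be sketched as a pure function on a layer sequence. This is an illustrative sketch only: the `Layer` record, its fields, and `insert_layer` are hypothetical names, and the paper's actual action encoding is not given in the text reproduced here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Layer:
    """Hypothetical layer description: an operation name and a channel count."""
    op: str
    channels: int


def insert_layer(arch: Tuple[Layer, ...], position: int,
                 layer: Layer) -> Tuple[Layer, ...]:
    """Return a new architecture with `layer` inserted before `position`."""
    if not 0 <= position <= len(arch):
        raise ValueError("position out of range")
    # Tuples are immutable, so the baseline architecture is left untouched.
    return arch[:position] + (layer,) + arch[position:]


if __name__ == "__main__":
    base = (Layer("conv3x3", 32), Layer("pool", 32))
    grown = insert_layer(base, 1, Layer("conv3x3", 64))
    print(grown)
```

Returning a new tuple instead of mutating the baseline keeps earlier architectures available, which suits a progressive search that builds on previously found networks.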

Original abstract

In this paper, we propose Efficient Progressive Neural Architecture Search (EPNAS), a neural architecture search (NAS) method that efficiently handles large search spaces through a novel progressive search policy with performance prediction based on REINFORCE [37]. EPNAS is designed to search target networks in parallel, which is more scalable on parallel systems such as GPU/TPU clusters. More importantly, EPNAS can be generalized to architecture search with multiple resource constraints, e.g., model size, compute complexity or intensity, which is crucial for deployment on widespread platforms such as mobile and cloud. We compare EPNAS against other state-of-the-art (SoTA) network architectures (e.g., MobileNetV2 [39]) and efficient NAS algorithms (e.g., ENAS [34] and PNAS [27]) on image recognition tasks using CIFAR10 and ImageNet. On both datasets, EPNAS is superior with respect to architecture searching speed and recognition accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

EPNAS adds parallel progressive search and multi-constraint handling to prior NAS work via a REINFORCE predictor, but the abstract supplies no evidence that the predictor produces reliable rankings.

EPNAS extends PNAS-style progressive search by running it in parallel across clusters and adding support for multiple constraints like model size and compute, all guided by a REINFORCE performance predictor. The abstract says this leads to faster architecture search and higher recognition accuracy than ENAS, PNAS, and MobileNetV2 on CIFAR-10 and ImageNet. The parallel and multi-constraint features are the concrete additions here. They address scalability and deployment practicality in ways the cited prior work does not emphasize. The paper handles the literature citations cleanly and focuses on a real need for constrained NAS. The soft spot is the unvalidated predictor. As the stress-test points out, the efficiency and accuracy rest on the assumption that REINFORCE rankings correlate with true post-training performance, but the abstract provides no rank correlation metrics, no ablation studies, and no experimental protocol details to check this. That makes the superiority claims difficult to evaluate from what's given. This work would interest people in the efficient ML community who need NAS methods that respect hardware limits. A reader could extract the multi-constraint idea for their own setups, but only after seeing stronger evidence in the full paper. It is coherent enough and grounded in existing methods to merit peer review. I recommend sending it to referees, with the main request being added validation for the performance prediction step.

Referee Report

1 major / 0 minor

Summary. The paper proposes EPNAS, a neural architecture search algorithm that employs a progressive search policy combined with a REINFORCE-based performance predictor to efficiently explore large search spaces. It emphasizes parallel search on GPU/TPU clusters and generalization to multiple resource constraints (model size, compute). Experiments on CIFAR-10 and ImageNet are claimed to show superiority over MobileNetV2, ENAS, and PNAS in both search speed and final recognition accuracy.

Significance. If the REINFORCE predictor's rankings prove reliable and the efficiency/accuracy claims are substantiated with proper controls, the work would offer a practical advance in scalable, constraint-aware NAS suitable for deployment on varied hardware platforms.

major comments (1)

[Abstract] The central claims of superiority in search speed and accuracy rest on the unvalidated assumption that the REINFORCE performance predictor produces rankings that correlate with true post-training accuracies. No rank-correlation statistics, held-out validation of the predictor, or ablation against random ranking are referenced, making it impossible to assess whether the reported gains depend on the predictor or are artifacts of the search procedure.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful review and the specific comment on validation of the performance predictor. We address this point below.

Referee: [Abstract] The central claims of superiority in search speed and accuracy rest on the unvalidated assumption that the REINFORCE performance predictor produces rankings that correlate with true post-training accuracies. No rank-correlation statistics, held-out validation of the predictor, or ablation against random ranking are referenced, making it impossible to assess whether the reported gains depend on the predictor or are artifacts of the search procedure.

Authors: We agree that the manuscript would be strengthened by explicit validation of the REINFORCE predictor. In the revised version we will add (1) Spearman's rank correlation between predictor scores and final accuracies on a held-out set of 200 architectures, (2) a description of how the predictor was trained and validated during search, and (3) an ablation replacing the learned predictor with random ranking while keeping all other components fixed. These additions will allow readers to judge whether the reported speed and accuracy gains depend on the quality of the rankings.

revision: yes
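As a rough illustration of the rank-correlation check in item (1), Spearman's coefficient can be computed as the Pearson correlation of average ranks. The sketch below uses only the standard library; the function names and the sample numbers are hypothetical and are not results from the paper.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(predicted, actual):
    """Spearman rank correlation between predictor scores and accuracies."""
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(predicted), average_ranks(actual))


if __name__ == "__main__":
    # Made-up scores for four architectures, for illustration only.
    print(spearman_rho([0.2, 0.5, 0.4, 0.9], [0.61, 0.70, 0.72, 0.88]))
```

A coefficient near 1 would indicate that the predictor orders architectures the way full training does; a value near 0 would be indistinguishable from the random-ranking ablation in item (3).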

Circularity Check

0 steps flagged

No circularity: method uses external RL baseline without self-referential reduction

The abstract and description present EPNAS as employing a REINFORCE-based predictor within a progressive search policy, with claims of superiority on CIFAR-10 and ImageNet. No equations, fitting procedures, or derivation steps are supplied that would allow a reduction (e.g., a performance prediction shown to be identical to its training targets by construction, or a uniqueness result imported solely via self-citation). The REINFORCE reference is to an external 1992 paper. Absent any load-bearing self-citation chain or ansatz smuggled through prior author work, the derivation chain cannot be shown to collapse to its inputs. This is the expected outcome for an empirical NAS description lacking internal mathematical closure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no equations or experimental protocol, so no specific free parameters, axioms, or invented entities can be identified.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 27 internal anchors

[1] Launching the speech commands dataset. https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html

[2] D. Amodei et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. arXiv:1512.02595, December 2015.

[3] Peter J. Angeline et al. An evolutionary algorithm that constructs recurrent neural networks. IEEE Transactions on Neural Networks, 5(1):54–65, 1994.

[4] Bowen Baker et al. Designing neural network architectures using reinforcement learning. arXiv preprint arXiv:1611.02167, 2016.

[5] Bowen Baker et al. Accelerating neural architecture search using performance prediction. arXiv preprint arXiv:1705.10823, 2017.

[6] Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, and Quoc Le. Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning, pages 549–558, 2018.

[7] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13:281–305, February 2012. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=2188385.2188395

[8] Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng. Handbook of Markov chain Monte Carlo. CRC Press, 2011.

[9] Han Cai et al. Efficient architecture search by network transformation. AAAI, 2018.

[10] F. Chollet. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv:1610.02357, October 2016.

[11] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.

[12] Jin-Dong Dong, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, and Min Sun. DPP-Net: Device-aware progressive search for Pareto-optimal neural architectures. ECCV, 2018.

[13] Thomas Elsken et al. Neural architecture search: A survey. arXiv preprint arXiv:1808.05377, 2018.

[14] Ariel Gordon, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang, and Edward Choi. MorphNet: Fast & simple resource-constrained structure learning of deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1586–1595, 2018.

[15] Song Han et al. Learning both weights and connections for efficient neural networks. NIPS, pages 1135–1143, 2015.

[16] Yihui He, Xiangyu Zhang, and Jian Sun. Channel pruning for accelerating very deep neural networks. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.

[17] Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. AMC: AutoML for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), pages 784–800, 2018.

[18] Andrew G. Howard et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017. URL http://arxiv.org/abs/1704.04861

[19] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In CVPR, volume 1, page 3, 2017.

[20] Itay Hubara et al. Quantized neural networks: Training neural networks with low precision weights and activations. arXiv:1609.07061, 2016.

[21] Kirthevasan Kandasamy et al. Neural architecture search with Bayesian optimisation and optimal transport. CoRR, abs/1802.07191, 2018. URL http://arxiv.org/abs/1802.07191

[22] T. Karras et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv:1710.10196, October 2017.

[23] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.

[24] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The CIFAR-10 dataset. Online: http://www.cs.toronto.edu/~kriz/cifar.html, 55, 2014.

[25] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.

[26] Baoyuan Liu et al. Sparse convolutional neural networks. In CVPR, pages 806–814, June 2015.

[27] Chenxi Liu et al. Progressive neural architecture search. In ECCV, pages 19–34, 2018.

[28] Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018.

[29] Hanxiao Liu et al. Hierarchical representations for efficient architecture search. ICLR, 2018.

[30] Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with restarts. CoRR, abs/1608.03983, 2016. URL http://arxiv.org/abs/1608.03983

[31] Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. arXiv preprint arXiv:1807.11164, 2018.

[32] Renato Negrinho and Geoff Gordon. DeepArchitect: Automatically Designing and Training Deep Architectures. 2017. URL http://arxiv.org/abs/1704.08792

[34] Hieu Pham et al. Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268, 2018.

[35] Esteban Real et al. Large-scale evolution of image classifiers. arXiv preprint arXiv:1703.01041, 2017.

[36] Esteban Real et al. Regularized evolution for image classifier architecture search. CoRR, abs/1802.01548, 2018.

[37] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. 1992.

[38] Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, and Quoc V. Le. Don't decay the learning rate, increase the batch size. ICLR, 2018.

[39] M. Sandler et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv:1801.04381, January 2018.

[40] K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556, September 2014.

[41] V. Sindhwani et al. Structured Transforms for Small-Footprint Deep Learning. arXiv:1510.01722, October 2015.

[42] Kenneth O. Stanley and Risto Miikkulainen. Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2):99–127, 2002.

[43] Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, and Quoc V. Le. MnasNet: Platform-aware neural architecture search for mobile. arXiv preprint arXiv:1807.11626, 2018.

[44] Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Multi-objective architecture search for CNNs. CoRR, 2018. URL https://arxiv.org/abs/1804.09081

[45] A. van den Oord et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis. arXiv:1711.10433, November 2017.

[46] Y. Wu, Schuster, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv:1609.08144, 2016.

[47] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pages 5987–5995. IEEE, 2017.

[48] Saining Xie, Alexander Kirillov, Ross Girshick, and Kaiming He. Exploring randomly wired neural networks for image recognition. arXiv preprint arXiv:1904.01569, 2019.

[49] Sirui Xie, Hehui Zheng, Chunxiao Liu, and Liang Lin. SNAS: Stochastic neural architecture search. ICLR, 2019.

[50] K. Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv:1502.03044, February 2015.

[51] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. CoRR, abs/1707.01083, 2017.

[52] Barret Zoph and Quoc V. Le. Neural Architecture Search with Reinforcement Learning. 2016. ISSN 1938-7228. doi: 10.1016/j.knosys.2015.01.010. URL http://arxiv.org/abs/1611.01578

[53] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning transferable architectures for scalable image recognition. CVPR, 2018.

[1] [1]

https://ai.googleblog.com/2017/08/ launching-speech-commands-dataset.html

Launching the speech commands dataset. https://ai.googleblog.com/2017/08/ launching-speech-commands-dataset.html

work page 2017

[2] [2]

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

D. Amodei et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. arXiv:1512.02595, December 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[3] [3]

An evolutionary algorithm that constructs recurrent neural networks

Peter J Angeline et al. An evolutionary algorithm that constructs recurrent neural networks. IEEE transactions on Neural Networks, 5(1):54–65, 1994

work page 1994

[4] [4]

Designing Neural Network Architectures using Reinforcement Learning

Bowen Baker et al. Designing neural network architectures using reinforcement learning. arXiv preprint arXiv:1611.02167, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[5] [5]

Accelerating Neural Architecture Search using Performance Prediction

Bowen Baker et al. Accelerating neural architecture search using performance prediction. arXiv preprint arXiv:1705.10823, 2017. Y ANQI ZHOU ET.AL.: EPNAS 11

work page internal anchor Pith review Pith/arXiv arXiv 2017

[6] [6]

Under- standing and simplifying one-shot architecture search

Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, and Quoc Le. Under- standing and simplifying one-shot architecture search. In International Conference on Machine Learning, pages 549–558, 2018

work page 2018

[7] [7]

Random search for hyper-parameter optimization

James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. J. Mach. Learn. Res. , 13:281–305, February 2012. ISSN 1532-4435. URL http://dl.acm.org/ citation.cfm?id=2188385.2188395

work page arXiv 2012

[8] [8]

Handbook of markov chain monte carlo

Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng. Handbook of markov chain monte carlo. CRC press, 2011

work page 2011

[9] [9]

Efﬁcient architecture search by network transformation

Han Cai et al. Efﬁcient architecture search by network transformation. AAAI, 2018

work page 2018

[10] [10]

F. Chollet. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv: 1610.02357, October 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[11] [11]

Imagenet: A large- scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large- scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009

work page 2009

[12] [12]

Dpp-net: Device- aware progressive search for pareto-optimal neural architectures

Jin-Dong Dong, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, and Min Sun. Dpp-net: Device- aware progressive search for pareto-optimal neural architectures. ECCV, 2018

work page 2018

[13] [13]

Neural Architecture Search: A Survey

Thomas Elsken et al. Neural architecture search: A survey. arXiv preprint arXiv:1808.05377 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[14] [14]

Morphnet: Fast & simple resource-constrained structure learning of deep networks

Ariel Gordon, Elad Eban, Oﬁr Nachum, Bo Chen, Hao Wu, Tien-Ju Yang, and Edward Choi. Morphnet: Fast & simple resource-constrained structure learning of deep networks. In Proceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 1586–1595, 2018

work page 2018

[15] [15]

Learning both weights and connections for efﬁcient neural networks

Song Han et al. Learning both weights and connections for efﬁcient neural networks. NIPS, pages 1135–1143, 2015

work page 2015

[16] [16]

Channel pruning for accelerating very deep neural networks

Yihui He, Xiangyu Zhang, and Jian Sun. Channel pruning for accelerating very deep neural networks. In The IEEE International Conference on Computer Vision (ICCV) , Oct 2017

work page 2017

[17] [17]

Amc: Automl for model compression and acceleration on mobile devices

Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. Amc: Automl for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), pages 784–800, 2018

work page 2018

[18] [18]

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Andrew G. Howard et al. Mobilenets: Efﬁcient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017. URL http://arxiv.org/abs/1704.04861

work page internal anchor Pith review Pith/arXiv arXiv 2017

[19] [19]

Densely connected convolutional networks

Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In CVPR, volume 1, page 3, 2017

work page 2017

[20] [20]

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

Itay Hubara et al. Quantized neural networks: Training neural networks with low precision weights and activations. arXiv:1609.07061, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[21] [21]

Neural Architecture Search with Bayesian Optimisation and Optimal Transport

Kirthevasan Kandasamy et al. Neural architecture search with bayesian optimisation and optimal transport. CoRR, abs/1802.07191, 2018. URL http://arxiv.org/abs/1802.07191

work page internal anchor Pith review Pith/arXiv arXiv 2018

[22] [22]

Progressive Growing of GANs for Improved Quality, Stability, and Variation

T. Karras et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv: 1710.10196, October 2017. 12 Y ANQI ZHOU ET.AL.: EPNAS

work page internal anchor Pith review Pith/arXiv arXiv 2017

[23] [23]

Learning multiple layers of features from tiny images

Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009

work page 2009

[24] [24]

The cifar-10 dataset

Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The cifar-10 dataset. online: http://www. cs. toronto. edu/kriz/cifar . html, 55, 2014

work page 2014

[25] [25]

Microsoft coco: Common objects in context

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014

work page 2014

[26] [26]

Sparse convolutional neural networks

Baoyuan Liu et al. Sparse convolutional neural networks. In CVPR, pages 806–814, June 2015

work page 2015

[27] [27]

Progressive neural architecture search

Chenxi Liu et al. Progressive neural architecture search. In ECCV, pages 19–34, 2018

work page 2018

[28] [28]

DARTS: Differentiable Architecture Search

Hanxiao Liu, Karen Simonyan, and Yiming Yang. Darts: Differentiable architecture search.arXiv preprint arXiv:1806.09055, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[29] [29]

Hierarchical representations for efﬁcient architecture search

Hanxiao Liu et al. Hierarchical representations for efﬁcient architecture search. ICLR, 2018

work page 2018

[30] [30]

SGDR: Stochastic Gradient Descent with Warm Restarts

Ilya Loshchilov and Frank Hutter. SGDR: stochastic gradient descent with restarts. CoRR, abs/1608.03983, 2016. URL http://arxiv.org/abs/1608.03983


[31] [31]

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. arXiv preprint arXiv:1807.11164, 2018


[32] [32]

DeepArchitect: Automatically Designing and Training Deep Architectures

Renato Negrinho and Geoff Gordon. DeepArchitect: Automatically Designing and Training Deep Architectures. 2017. URL http://arxiv.org/abs/1704.08792


[33] [34]

Efficient Neural Architecture Search via Parameter Sharing

Hieu Pham et al. Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268, 2018


[34] [35]

Large-Scale Evolution of Image Classifiers

Esteban Real et al. Large-scale evolution of image classifiers. arXiv preprint arXiv:1703.01041, 2017


[35] [36]

Regularized Evolution for Image Classifier Architecture Search

Esteban Real et al. Regularized evolution for image classifier architecture search. CoRR, abs/1802.01548, 2018


[36] [37]

Simple statistical gradient-following algorithms for connectionist reinforcement learning

R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. 1992


[37] [38]

Don't Decay the Learning Rate, Increase the Batch Size

Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, and Quoc V. Le. Don’t decay the learning rate, increase the batch size. ICLR, 2018


[38] [39]

MobileNetV2: Inverted Residuals and Linear Bottlenecks

M. Sandler et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv:1801.04381, January 2018


[39] [40]

Very Deep Convolutional Networks for Large-Scale Image Recognition

K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556, September 2014


[40] [41]

Structured Transforms for Small-Footprint Deep Learning

V. Sindhwani et al. Structured Transforms for Small-Footprint Deep Learning. arXiv:1510.01722, October 2015


[41] [42]

Evolving neural networks through augmenting topologies

Kenneth O. Stanley and Risto Miikkulainen. Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2):99–127, 2002.


[42] [43]

MnasNet: Platform-Aware Neural Architecture Search for Mobile

Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, and Quoc V. Le. MnasNet: Platform-aware neural architecture search for mobile. arXiv preprint arXiv:1807.11626, 2018


[43] [44]

Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution

Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Multi-objective architecture search for CNNs. CoRR, 2018. URL https://arxiv.org/abs/1804.09081


[44] [45]

Parallel WaveNet: Fast High-Fidelity Speech Synthesis

A. van den Oord et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis. arXiv:1711.10433, November 2017


[45] [46]

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Y. Wu, Schuster, et al. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv:1609.08144, 2016


[46] [47]

Aggregated residual transformations for deep neural networks

Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pages 5987–5995. IEEE, 2017


[47] [48]

Exploring Randomly Wired Neural Networks for Image Recognition

Saining Xie, Alexander Kirillov, Ross Girshick, and Kaiming He. Exploring randomly wired neural networks for image recognition. arXiv preprint arXiv:1904.01569, 2019


[48] [49]

Snas: Stochastic neural architecture search

Sirui Xie, Hehui Zheng, Chunxiao Liu, and Liang Lin. SNAS: Stochastic neural architecture search. ICLR, 2019


[49] [50]

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

K. Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv:1502.03044, February 2015


[50] [51]

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. CoRR, abs/1707.01083, 2017


[51] [52]

Barret Zoph and Quoc V. Le. Neural Architecture Search with Reinforcement Learning. 2016. ISSN 1938-7228. doi: 10.1016/j.knosys.2015.01.010. URL http://arxiv.org/abs/1611.01578


[52] [53]

Learning transferable architectures for scalable image recognition

Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning transferable architectures for scalable image recognition. CVPR, 2018
