arxiv: 1907.04648 · v1 · pith:RXN7AD5Knew · submitted 2019-07-07 · 💻 cs.LG

EPNAS: Efficient Progressive Neural Architecture Search

Pith reviewed 2026-05-25 01:13 UTC · model grok-4.3

classification 💻 cs.LG
keywords neural architecture search · progressive search · REINFORCE · performance prediction · image classification · CIFAR10 · ImageNet · resource constraints

The pith

EPNAS uses a progressive search policy with REINFORCE performance prediction to find high-accuracy networks faster than prior NAS methods on image tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EPNAS as a neural architecture search approach that explores large search spaces with a progressive policy and REINFORCE-based prediction of candidate performance. It supports parallel evaluation of networks on GPU or TPU clusters and extends to multiple constraints such as model size and compute cost. Experiments on CIFAR10 and ImageNet compare it against the hand-designed MobileNetV2 and the NAS methods ENAS and PNAS, and the paper reports faster search and higher final accuracy. A reader would care because the method makes architecture search more practical for deployment across varied hardware without exhaustive training of every candidate.

Core claim

EPNAS efficiently handles large search spaces through a novel progressive search policy with performance prediction based on REINFORCE. It searches target networks in parallel, which is more scalable on parallel systems such as GPU/TPU clusters. More importantly, EPNAS can be generalized to architecture search with multiple resource constraints, e.g., model size, compute complexity or intensity. On both CIFAR10 and ImageNet, EPNAS is superior with respect to architecture searching speed and recognition accuracy.

What carries the argument

Progressive search policy with REINFORCE-based performance prediction that ranks architectures without full training of each candidate.
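
As a rough illustration of the REINFORCE update that the progressive policy relies on, the sketch below trains a categorical policy over a handful of candidate edits to a baseline network. It is a toy under stated assumptions, not the paper's implementation: the paper's policy is an LSTM-based network (Figure 2), whereas here the policy is a bare softmax over logits, and the `rewards` list is a hypothetical stand-in for the validation accuracy observed after each edit. The function names, learning rate, and moving-average baseline are likewise illustrative choices.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_search(rewards, episodes=3000, lr=0.1, decay=0.9, seed=0):
    """Toy REINFORCE loop over a categorical policy.

    rewards[k] is a hypothetical reward (for example, validation accuracy)
    obtained after applying candidate edit k to the baseline network.
    Returns the final action probabilities of the policy.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0  # moving average of past rewards, used to reduce variance
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline
        # For a softmax policy, d log pi(action) / d logit_k = 1[k == action] - probs[k].
        for k in range(len(logits)):
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_prob
        baseline = decay * baseline + (1.0 - decay) * reward
    return softmax(logits)
```

Calling `reinforce_search([0.2, 0.9, 0.5])` should concentrate probability on the second edit, the one with the highest reward. A real search would replace the fixed reward list with a trained or predicted accuracy for each sampled architecture.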

If this is right

  • EPNAS applies directly to searches under simultaneous constraints such as model size and compute intensity.
  • Parallel network evaluation scales the method to GPU and TPU clusters without serial bottlenecks.
  • The same policy yields architectures that exceed MobileNetV2 accuracy on both CIFAR10 and ImageNet.
  • Resource-aware search becomes feasible for mobile and cloud platforms without separate runs per constraint.
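
One common way to fold several resource budgets into a single search reward is to subtract a penalty for each budget that a candidate exceeds. The sketch below shows that pattern. It is an assumption made for illustration, since the abstract does not state how EPNAS combines constraints; the function name, the budget keys (`params`, `flops`), and the linear penalty are all hypothetical.

```python
def constrained_reward(accuracy, usage, budgets, penalty=1.0):
    """Accuracy minus a penalty for exceeding any resource budget.

    usage and budgets map a resource name to a positive number.
    A candidate within every budget keeps its accuracy unchanged;
    each overshoot is penalised in proportion to its relative size.
    """
    overshoot = 0.0
    for name, limit in budgets.items():
        overshoot += max(0.0, usage[name] / limit - 1.0)
    return accuracy - penalty * overshoot
```

For example, with budgets `{"params": 4e6, "flops": 6e8}`, a candidate using 3e6 parameters and 3e8 FLOPs keeps its full accuracy, while one using 6e6 parameters is 50% over the parameter budget and loses 0.5 from its reward at the default penalty weight.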

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the ranking prediction generalizes, EPNAS could shorten development cycles for custom models on new datasets.
  • The parallel design suggests straightforward extension to distributed training setups beyond single clusters.
  • Constraint handling may allow direct optimization for latency targets on specific hardware without post-search pruning.

Load-bearing premise

The REINFORCE-based performance prediction accurately ranks candidate architectures in large search spaces without requiring full training of each candidate.

What would settle it

A head-to-head run on ImageNet where EPNAS produces lower top-1 accuracy or longer total search time than ENAS or PNAS under identical constraints would falsify the superiority claim.

Figures

Figures reproduced from arXiv: 1907.04648 by Feng Yan, Greg Diamos, Haonan Yu, Peng Wang, Sercan Arik, Syed Zawad, Yanqi Zhou.

Figure 1: REINFORCE step for policy gradient. N is the number of parallel policy networks to adapt a baseline architecture at episode of i.
… optimization and proposed architecture transforming policy networks. As stated in Sec. 1, rather than rebuilding the entire network from scratch, we adopt a progressive strategy with REINFORCE [37] for more efficient architecture search so that architectures searched in previous…
Figure 2: Policy Network of EPNAS. It is an LSTM-based network, which first generates…
Figure 3: Left: A layer-by-layer search insert operation example. A conv operation is inserted…
Figure 4: Accuracy vs. total search time for CIFAR-10. Note the accuracy reported here is…
Original abstract

In this paper, we propose Efficient Progressive Neural Architecture Search (EPNAS), a neural architecture search (NAS) that efficiently handles large search space through a novel progressive search policy with performance prediction based on REINFORCE [37]. EPNAS is designed to search target networks in parallel, which is more scalable on parallel systems such as GPU/TPU clusters. More importantly, EPNAS can be generalized to architecture search with multiple resource constraints, e.g., model size, compute complexity or intensity, which is crucial for deployment in widespread platforms such as mobile and cloud. We compare EPNAS against other state-of-the-art (SoTA) network architectures (e.g., MobileNetV2 [39]) and efficient NAS algorithms (e.g., ENAS [34] and PNAS [27]) on image recognition tasks using CIFAR10 and ImageNet. On both datasets, EPNAS is superior with respect to architecture searching speed and recognition accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes EPNAS, a neural architecture search algorithm that employs a progressive search policy combined with a REINFORCE-based performance predictor to efficiently explore large search spaces. It emphasizes parallel search on GPU/TPU clusters and generalization to multiple resource constraints (model size, compute). Experiments on CIFAR-10 and ImageNet are claimed to show superiority over MobileNetV2, ENAS, and PNAS in both search speed and final recognition accuracy.

Significance. If the REINFORCE predictor's rankings prove reliable and the efficiency/accuracy claims are substantiated with proper controls, the work would offer a practical advance in scalable, constraint-aware NAS suitable for deployment on varied hardware platforms.

major comments (1)
  1. [Abstract] The central claims of superiority in search speed and accuracy rest on the unvalidated assumption that the REINFORCE performance predictor produces rankings that correlate with true post-training accuracies. No rank-correlation statistics, held-out validation of the predictor, or ablation against random ranking are referenced, making it impossible to assess whether the reported gains are load-bearing or artifacts of the search procedure.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful review and the specific comment on validation of the performance predictor. We address this point below.

Point-by-point responses
  1. Referee: [Abstract] The central claims of superiority in search speed and accuracy rest on the unvalidated assumption that the REINFORCE performance predictor produces rankings that correlate with true post-training accuracies. No rank-correlation statistics, held-out validation of the predictor, or ablation against random ranking are referenced, making it impossible to assess whether the reported gains are load-bearing or artifacts of the search procedure.

    Authors: We agree that the manuscript would be strengthened by explicit validation of the REINFORCE predictor. In the revised version we will add (1) Spearman's rank correlation between predictor scores and final accuracies on a held-out set of 200 architectures, (2) a description of how the predictor was trained and validated during search, and (3) an ablation replacing the learned predictor with random ranking while keeping all other components fixed. These additions will allow readers to judge whether the reported speed and accuracy gains depend on the quality of the rankings.
    Revision: yes
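
The rank-correlation check proposed in the rebuttal can be sketched with the standard library alone. The functions below compute Spearman's rho as the Pearson correlation of average ranks, which handles ties. The function names are illustrative, and the numbers in the usage note are made-up toy values, not results from the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

With toy predictor scores `[0.1, 0.4, 0.3, 0.9]` and toy final accuracies `[0.70, 0.80, 0.75, 0.95]`, the two orderings agree exactly and `spearman_rho` returns 1.0. The random-ranking ablation would compare that figure against the rho obtained from shuffled scores, which should sit near zero on a large enough held-out set.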

Circularity Check

0 steps flagged

No circularity: method uses external RL baseline without self-referential reduction

full rationale

The abstract and description present EPNAS as employing a REINFORCE-based predictor within a progressive search policy, with claims of superiority on CIFAR-10 and ImageNet. No equations, fitting procedures, or derivation steps are supplied that would allow a reduction (e.g., a performance prediction shown to be identical to its training targets by construction, or a uniqueness result imported solely via self-citation). The REINFORCE reference is to an external 1992 paper. Absent any load-bearing self-citation chain or ansatz smuggled through prior author work, the derivation chain cannot be shown to collapse to its inputs. This is the expected outcome for an empirical NAS description lacking internal mathematical closure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no equations or experimental protocol, so no specific free parameters, axioms, or invented entities can be identified.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 27 internal anchors

  1. [1]

    Launching the speech commands dataset

    Launching the speech commands dataset. https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html

  2. [2]

    Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

    D. Amodei et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin. arXiv:1512.02595, December 2015

  3. [3]

    An evolutionary algorithm that constructs recurrent neural networks

    Peter J Angeline et al. An evolutionary algorithm that constructs recurrent neural networks. IEEE transactions on Neural Networks, 5(1):54–65, 1994

  4. [4]

    Designing Neural Network Architectures using Reinforcement Learning

    Bowen Baker et al. Designing neural network architectures using reinforcement learning. arXiv preprint arXiv:1611.02167, 2016

  5. [5]

    Accelerating Neural Architecture Search using Performance Prediction

    Bowen Baker et al. Accelerating neural architecture search using performance prediction. arXiv preprint arXiv:1705.10823, 2017. Y ANQI ZHOU ET.AL.: EPNAS 11

  6. [6]

    Understanding and simplifying one-shot architecture search

    Gabriel Bender, Pieter-Jan Kindermans, Barret Zoph, Vijay Vasudevan, and Quoc Le. Understanding and simplifying one-shot architecture search. In International Conference on Machine Learning, pages 549–558, 2018

  7. [7]

    Random search for hyper-parameter optimization

    James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. J. Mach. Learn. Res., 13:281–305, February 2012. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=2188385.2188395

  8. [8]

    Handbook of markov chain monte carlo

    Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng. Handbook of markov chain monte carlo. CRC press, 2011

  9. [9]

    Efficient architecture search by network transformation

    Han Cai et al. Efficient architecture search by network transformation. AAAI, 2018

  10. [10]

    F. Chollet. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv: 1610.02357, October 2016

  11. [11]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE, 2009

  12. [12]

    Dpp-net: Device-aware progressive search for pareto-optimal neural architectures

    Jin-Dong Dong, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, and Min Sun. Dpp-net: Device-aware progressive search for pareto-optimal neural architectures. ECCV, 2018

  13. [13]

    Neural Architecture Search: A Survey

    Thomas Elsken et al. Neural architecture search: A survey. arXiv preprint arXiv:1808.05377, 2018

  14. [14]

    Morphnet: Fast & simple resource-constrained structure learning of deep networks

    Ariel Gordon, Elad Eban, Ofir Nachum, Bo Chen, Hao Wu, Tien-Ju Yang, and Edward Choi. Morphnet: Fast & simple resource-constrained structure learning of deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1586–1595, 2018

  15. [15]

    Learning both weights and connections for efficient neural networks

    Song Han et al. Learning both weights and connections for efficient neural networks. NIPS, pages 1135–1143, 2015

  16. [16]

    Channel pruning for accelerating very deep neural networks

    Yihui He, Xiangyu Zhang, and Jian Sun. Channel pruning for accelerating very deep neural networks. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017

  17. [17]

    Amc: Automl for model compression and acceleration on mobile devices

    Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. Amc: Automl for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), pages 784–800, 2018

  18. [18]

    MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

    Andrew G. Howard et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017. URL http://arxiv.org/abs/1704.04861

  19. [19]

    Densely connected convolutional networks

    Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In CVPR, volume 1, page 3, 2017

  20. [20]

    Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

    Itay Hubara et al. Quantized neural networks: Training neural networks with low precision weights and activations. arXiv:1609.07061, 2016

  21. [21]

    Neural Architecture Search with Bayesian Optimisation and Optimal Transport

    Kirthevasan Kandasamy et al. Neural architecture search with bayesian optimisation and optimal transport. CoRR, abs/1802.07191, 2018. URL http://arxiv.org/abs/1802.07191

  22. [22]

    Progressive Growing of GANs for Improved Quality, Stability, and Variation

    T. Karras et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv:1710.10196, October 2017

  23. [23]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009

  24. [24]

    The cifar-10 dataset

    Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The cifar-10 dataset. online: http://www.cs.toronto.edu/kriz/cifar.html, 55, 2014

  25. [25]

    Microsoft coco: Common objects in context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014

  26. [26]

    Sparse convolutional neural networks

    Baoyuan Liu et al. Sparse convolutional neural networks. In CVPR, pages 806–814, June 2015

  27. [27]

    Progressive neural architecture search

    Chenxi Liu et al. Progressive neural architecture search. In ECCV, pages 19–34, 2018

  28. [28]

    DARTS: Differentiable Architecture Search

    Hanxiao Liu, Karen Simonyan, and Yiming Yang. Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018

  29. [29]

    Hierarchical representations for efficient architecture search

    Hanxiao Liu et al. Hierarchical representations for efficient architecture search. ICLR, 2018

  30. [30]

    SGDR: Stochastic Gradient Descent with Warm Restarts

    Ilya Loshchilov and Frank Hutter. SGDR: stochastic gradient descent with restarts. CoRR, abs/1608.03983, 2016. URL http://arxiv.org/abs/1608.03983

  31. [31]

    ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design

    Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. Shufflenet v2: Practical guidelines for efficient cnn architecture design. arXiv preprint arXiv:1807.11164, 2018

  32. [32]

    DeepArchitect: Automatically Designing and Training Deep Architectures

    Renato Negrinho and Geoff Gordon. DeepArchitect: Automatically Designing and Training Deep Architectures. 2017. URL http://arxiv.org/abs/1704.08792

  33. [34]

    Efficient Neural Architecture Search via Parameter Sharing

    Hieu Pham et al. Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268, 2018

  34. [35]

    Large-Scale Evolution of Image Classifiers

    Esteban Real et al. Large-scale evolution of image classifiers. arXiv preprint arXiv:1703.01041, 2017

  35. [36]

    Regularized Evolution for Image Classifier Architecture Search

    Esteban Real et al. Regularized evolution for image classifier architecture search. CoRR, abs/1802.01548, 2018

  36. [37]

    Simple statistical gradient-following algorithms for connectionist reinforcement learning

    Williams R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. 1992

  37. [38]

    Don’t decay the learning rate, increase the batch size

    Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, and Quoc V. Le. Don’t decay the learning rate, increase the batch size. ICLR, 2018

  38. [39]

    MobileNetV2: Inverted Residuals and Linear Bottlenecks

    M. Sandler et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv: 1801.04381, January 2018

  39. [40]

    Very Deep Convolutional Networks for Large-Scale Image Recognition

    K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556, September 2014

  40. [41]

    Structured Transforms for Small-Footprint Deep Learning

    V. Sindhwani et al. Structured Transforms for Small-Footprint Deep Learning. arXiv:1510.01722, October 2015

  41. [42]

    Evolving neural networks through augmenting topologies

    Kenneth O Stanley and Risto Miikkulainen. Evolving neural networks through augmenting topologies. Evolutionary computation, 10(2):99–127, 2002

  42. [43]

    MnasNet: Platform-Aware Neural Architecture Search for Mobile

    Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, and Quoc V Le. Mnasnet: Platform-aware neural architecture search for mobile. arXiv preprint arXiv:1807.11626, 2018

  43. [44]

    Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution

    Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Multi-objective architecture search for cnns. CoRR, 2018. URL https://arxiv.org/abs/1804.09081

  44. [45]

    Parallel WaveNet: Fast High-Fidelity Speech Synthesis

    A. van den Oord et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis. arXiv:1711.10433, November 2017

  45. [46]

    Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

    Y. Wu, Schuster, et al. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv:1609.08144, 2016

  46. [47]

    Aggregated residual transformations for deep neural networks

    Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pages 5987–5995. IEEE, 2017

  47. [48]

    Exploring Randomly Wired Neural Networks for Image Recognition

    Saining Xie, Alexander Kirillov, Ross Girshick, and Kaiming He. Exploring randomly wired neural networks for image recognition. arXiv preprint arXiv:1904.01569, 2019

  48. [49]

    Snas: Stochastic neural architecture search

    Sirui Xie, Hehui Zheng, Chunxiao Liu, and Liang Lin. Snas: Stochastic neural architecture search. ICLR, 2019

  49. [50]

    Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

    K. Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv: 1502.03044, February 2015

  50. [51]

    ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

    Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. Shufflenet: An extremely efficient convolutional neural network for mobile devices. CoRR, abs/1707.01083, 2017

  51. [52]

    Barret Zoph and Quoc V. Le. Neural Architecture Search with Reinforcement Learning. 2016. ISSN 1938-7228. doi: 10.1016/j.knosys.2015.01.010. URL http://arxiv.org/abs/1611.01578

  52. [53]

    Learning transferable architectures for scalable image recognition

    Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. Learning transferable architectures for scalable image recognition. CVPR, 2018