Efficient Neural Architecture Search via Parameter Sharing

Barret Zoph; Hieu Pham; Jeff Dean; Melody Y. Guan; Quoc V. Le

arxiv: 1802.03268 · v2 · pith:JEA2NXYAnew · submitted 2018-02-09 · 💻 cs.LG · cs.CL · cs.CV · cs.NE · stat.ML

Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean

classification 💻 cs.LG cs.CL cs.CV cs.NE stat.ML

keywords enas architecture neural model search subgraph test architectures

We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile, the model corresponding to the selected subgraph is trained to minimize a canonical cross-entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performance using far fewer GPU-hours than all existing automatic model design approaches, and notably, it is 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state of the art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS designs novel architectures that achieve a test error of 2.89%, which is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%.
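
The controller update described in the abstract can be illustrated with a minimal REINFORCE sketch. This is not the authors' implementation: the search space (`OPS`, `NUM_NODES`), the hidden `TARGET`, the `toy_reward` function, and all hyperparameters are hypothetical stand-ins. In particular, the reward here replaces the validation-set reward of a child model, and no shared child weights are trained; only the policy-gradient step on the controller is shown.

```python
import math
import random

# Hypothetical toy search space: one operation choice per node.
OPS = ["tanh", "relu", "identity", "sigmoid"]
NUM_NODES = 3
# Hypothetical "best" architecture, used only by the toy reward below.
TARGET = [1, 0, 2]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_architecture(logits, rng):
    """Sample one op index per node from the controller's categorical policy."""
    arch = []
    for node_logits in logits:
        probs = softmax(node_logits)
        arch.append(rng.choices(range(len(OPS)), weights=probs)[0])
    return arch


def toy_reward(arch):
    """Stand-in for validation reward: fraction of nodes matching TARGET."""
    return sum(a == t for a, t in zip(arch, TARGET)) / len(TARGET)


def train_controller(steps=2000, lr=0.1, seed=0):
    """Train per-node logits with REINFORCE and a moving-average baseline."""
    rng = random.Random(seed)
    logits = [[0.0] * len(OPS) for _ in range(NUM_NODES)]
    baseline = 0.0
    for _ in range(steps):
        arch = sample_architecture(logits, rng)
        reward = toy_reward(arch)
        advantage = reward - baseline
        baseline = 0.95 * baseline + 0.05 * reward
        for node, choice in enumerate(arch):
            probs = softmax(logits[node])
            for k in range(len(OPS)):
                # Gradient of log pi(choice) w.r.t. logit k for a softmax policy.
                grad = (1.0 if k == choice else 0.0) - probs[k]
                logits[node][k] += lr * advantage * grad
    return logits
```

Calling `train_controller()` returns the learned logits; taking the highest-scoring op per node gives the controller's preferred architecture under this toy reward. In a fuller sketch, `toy_reward` would be replaced by evaluating the sampled subgraph with weights shared across all child models, alternating with gradient steps on those shared weights.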

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis
cs.LG · 2026-05 · unverdicted · novelty 7.0

QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.