XferNAS: Transfer Neural Architecture Search

Martin Wistuba

arXiv: 1907.08307 · v1 · pith:FVEKDQSA · submitted 2019-07-18 · 💻 cs.LG · cs.CV · cs.NE · stat.ML


Pith reviewed 2026-05-24 19:33 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · cs.NE · stat.ML

keywords neural architecture search · transfer learning · NAS optimizers · CIFAR-10 · CIFAR-100 · search time reduction · knowledge reuse


The pith

A transfer framework lets existing NAS optimizers reuse prior task knowledge and cuts search time from 200 to 6 GPU days.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a generally applicable framework that requires only minor changes to an existing neural architecture search (NAS) optimizer so that knowledge from searches on source tasks can be reused on a new target task. The experiments integrate the framework with one optimizer, NAO, and measure its effect on CIFAR-10 and CIFAR-100: search time drops by a factor of 33 (from 200 to 6 GPU days), and the resulting architectures reach test error rates of 1.99% and 14.06%, new lows for NAS optimizers on these benchmarks. A separate study varies the amount of source and target data and finds that the modified optimizer is never worse than the unmodified version and usually better. A sympathetic reader cares because NAS has been too expensive for routine use; this approach keeps the original optimizer intact while making repeated searches practical.

Core claim

The central claim is that a transfer framework, realized through minor modifications to existing NAS optimizers, reuses knowledge learned on source tasks to reduce search time and improve the final architectures on target tasks. Integrating it with one optimizer on CIFAR-10 and CIFAR-100 yields a 33-fold reduction in GPU days and record test error rates of 1.99% and 14.06% for NAS optimizers. In the data-volume study, the framework matches or exceeds the base optimizer across the tested quantities of source and target data.

What carries the argument

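A transfer network that carries knowledge from prior NAS searches on source tasks and is plugged into an existing optimizer with minor changes; Figures 1 and 2 show its integration into RL-based and surrogate model-based optimizers and into NAO.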
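The mechanism can be illustrated, under heavy assumptions, with a residual-style predictor: a prior fitted on source-task evaluations, plus a correction learned only from target-task evaluations. This is a hypothetical sketch, not the transfer network from the paper: the architecture encoding (`Arch`, one operation id per slot), the k-nearest-neighbour models, and the names `ResidualTransferPredictor`, `predict`, and `update` are all assumptions chosen to keep the example self-contained.

```python
from __future__ import annotations

from typing import List, Sequence, Tuple

# Hypothetical encoding: one operation id per slot of a cell.
Arch = Tuple[int, ...]


def hamming(a: Arch, b: Arch) -> int:
    """Number of slots in which two architectures use different operations."""
    return sum(x != y for x, y in zip(a, b))


def knn_mean(query: Arch, data: Sequence[Tuple[Arch, float]], k: int) -> float:
    """Average value of the (up to) k stored architectures closest to `query`."""
    nearest = sorted(data, key=lambda item: hamming(query, item[0]))[:k]
    return sum(value for _, value in nearest) / len(nearest)


class ResidualTransferPredictor:
    """Hypothetical surrogate: frozen source-task prior plus a target-task residual.

    An optimizer that already calls `predict` and `update` on its surrogate
    could use this class unchanged, which is the kind of minor modification
    the framework aims for.
    """

    def __init__(self, source_data: Sequence[Tuple[Arch, float]], k: int = 3) -> None:
        if not source_data:
            raise ValueError("need at least one source-task evaluation")
        self.source_data = list(source_data)  # (arch, accuracy) from earlier searches
        self.residuals: List[Tuple[Arch, float]] = []
        self.k = k

    def prior(self, arch: Arch) -> float:
        """Knowledge carried over from the source tasks; never refitted."""
        return knn_mean(arch, self.source_data, self.k)

    def update(self, arch: Arch, observed: float) -> None:
        """Record only the part of the target result the prior failed to explain."""
        self.residuals.append((arch, observed - self.prior(arch)))

    def predict(self, arch: Arch) -> float:
        correction = knn_mean(arch, self.residuals, self.k) if self.residuals else 0.0
        return self.prior(arch) + correction


source = [((0, 0, 1), 0.90), ((0, 1, 1), 0.93), ((1, 1, 1), 0.95), ((1, 0, 0), 0.88)]
predictor = ResidualTransferPredictor(source, k=1)
print(predictor.predict((1, 1, 1)))  # prior only
predictor.update((1, 1, 1), 0.80)    # the target task disagrees with the source
print(predictor.predict((1, 1, 1)))  # prior corrected by the residual
```

With `k=1` and no target data, the prediction equals the nearest source result; after one contradicting target evaluation, the residual pulls predictions for that region of the search space toward the target observation. Deciding how much to trust the prior would need a learned weighting, which this sketch omits.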

If this is right

Any existing NAS optimizer can be upgraded to a transfer version without redesigning its core logic.
Repeated architecture searches become feasible on modest compute budgets instead of hundreds of GPU days.
New state-of-the-art error rates on CIFAR-10 and CIFAR-100 are reachable while preserving the original optimizer's strengths.
Performance remains at least as good as the base optimizer even when source and target data quantities vary.
The same minor-change approach can be applied to other NAS benchmarks without task-specific redesign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be applied sequentially so that each new task accumulates knowledge from all earlier ones.
Similar reuse patterns might shorten other expensive hyperparameter searches that share architectural structure.
If negative transfer appears on dissimilar tasks, lightweight detection heuristics could be added without altering the core claim.
Industry pipelines that run many related searches could adopt the framework to amortize search cost across projects.
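The negative-transfer heuristic mentioned above could be as simple as checking whether the source-derived predictions order the first few target evaluations correctly, and falling back to the unmodified optimizer if they do not. The sketch below is hypothetical: Spearman's rank correlation is a swapped-in choice, and `keep_transfer`, `threshold`, and `min_points` are illustrative names and defaults, not anything the paper specifies.

```python
from __future__ import annotations

import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for idx in order[start : end + 1]:
            result[idx] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two items")
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant ranking carries no ordering information
    return cov / math.sqrt(vx * vy)


def keep_transfer(
    prior_predictions: Sequence[float],
    observed: Sequence[float],
    threshold: float = 0.0,
    min_points: int = 5,
) -> bool:
    """Hypothetical guard: keep the source prior while it ranks target results sensibly."""
    if len(observed) < min_points:
        return True  # too little target evidence to judge the prior yet
    return spearman(prior_predictions, observed) > threshold
```

Waiting for `min_points` evaluations avoids discarding the prior on noise, and a threshold of zero only asks for a better-than-random ordering; both values would need tuning in practice.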

Load-bearing premise

Knowledge from source-task searches transfers positively to the target task even after only minor modifications and without substantial negative transfer.

What would settle it

An experiment on a new target task in which the transferred optimizer returns higher error rates or requires more search time than the unmodified optimizer would falsify the claimed benefit.
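Stated as a decision rule, the falsification test compares the two optimizers on the same target task. A minimal sketch, assuming several independent runs per optimizer and using medians to damp run-to-run noise; `SearchResult`, `claim_falsified`, and `error_tolerance` are hypothetical names, and the numbers in the example are invented for illustration, not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median
from typing import Sequence


@dataclass(frozen=True)
class SearchResult:
    test_error: float  # test error of the returned architecture, in percent
    gpu_days: float    # total search cost of the run


def claim_falsified(
    baseline_runs: Sequence[SearchResult],
    transfer_runs: Sequence[SearchResult],
    error_tolerance: float = 0.0,
) -> bool:
    """True if the transfer optimizer is worse than the unmodified one on error or cost."""
    if not baseline_runs or not transfer_runs:
        raise ValueError("need at least one run per optimizer")
    worse_error = (
        median([r.test_error for r in transfer_runs])
        > median([r.test_error for r in baseline_runs]) + error_tolerance
    )
    slower = median([r.gpu_days for r in transfer_runs]) > median(
        [r.gpu_days for r in baseline_runs]
    )
    return worse_error or slower


# Invented numbers for illustration only.
baseline = [SearchResult(3.1, 10.0), SearchResult(2.9, 11.0), SearchResult(3.0, 10.5)]
transfer = [SearchResult(2.8, 1.0), SearchResult(3.0, 1.2), SearchResult(2.9, 0.9)]
print(claim_falsified(baseline, transfer))  # False
```

With single runs, as in the rebuttal below, the medians reduce to the individual results, so the rule is only as reliable as one run.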

Figures

Figures reproduced from arXiv: 1907.08307 by Martin Wistuba.

**Figure 1.** An example of the integration of the transfer network into RL-based (a) and surrogate model-based (b) optimizers.

**Figure 2.** XferNAS: Integration of the transfer network in NAO.

**Figure 3.** Correlation coefficient between predicted and true …

**Figure 4.** Convolution and reduction cell of XferNASNet.
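Figure 3 reports a correlation between predicted and true values; the caption is truncated, so which coefficient and which quantities are plotted is not recoverable here. Assuming Pearson's coefficient over predicted and measured accuracies, a minimal sketch of the computation is:

```python
import math
from typing import Sequence


def pearson(predicted: Sequence[float], true: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured scores."""
    if len(predicted) != len(true) or len(predicted) < 2:
        raise ValueError("need two equally long sequences with at least two items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, true))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in true)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_p * var_t)


print(pearson([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]))  # 0.8
```

A value near 1 would mean the predictor tracks the real training outcomes closely; rank-based coefficients such as Kendall's tau are common alternatives when only the ordering of architectures matters.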

Original abstract

The term Neural Architecture Search (NAS) refers to the automatic optimization of network architectures for a new, previously unknown task. Since testing an architecture is computationally very expensive, many optimizers need days or even weeks to find suitable architectures. However, this search time can be significantly reduced if knowledge from previous searches on different tasks is reused. In this work, we propose a generally applicable framework that introduces only minor changes to existing optimizers to leverage this feature. As an example, we select an existing optimizer and demonstrate the complexity of the integration of the framework as well as its impact. In experiments on CIFAR-10 and CIFAR-100, we observe a reduction in the search time from 200 to only 6 GPU days, a speed up by a factor of 33. In addition, we observe new records of 1.99 and 14.06 for NAS optimizers on the CIFAR benchmarks, respectively. In a separate study, we analyze the impact of the amount of source and target data. Empirically, we demonstrate that the proposed framework generally gives better results and, in the worst case, is just as good as the unmodified optimizer.
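The speed-up factor quoted in the abstract follows directly from the two search costs it reports; the exact ratio is slightly above 33:

```latex
\text{speed-up} = \frac{200\ \text{GPU days}}{6\ \text{GPU days}} = 33.\overline{3} \approx 33
```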

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

XferNAS adds a lightweight transfer layer to existing NAS optimizers and reports a 33x reduction in search cost (200 to 6 GPU days) plus new CIFAR records, but the gains rest on positive transfer that the experiments do not bound for dissimilar tasks.


The core contribution is a general framework that reuses knowledge from earlier NAS runs on other tasks through small changes to an existing optimizer. On CIFAR-10 and CIFAR-100, the author shows search time dropping from 200 GPU days to 6 while beating prior NAS results with test errors of 1.99% and 14.06%. The paper also includes a study varying the amount of source and target data, which gives some practical guidance on when transfer helps most.

Referee Report

2 major / 2 minor

Summary. The paper proposes XferNAS, a general framework that adds only minor modifications to existing NAS optimizers to transfer knowledge from searches on source tasks, thereby accelerating architecture search on new target tasks. On CIFAR-10 and CIFAR-100 it reports reducing search cost from 200 to 6 GPU days (a 33× speedup) while setting new records of 1.99% and 14.06% test error; a separate ablation studies the effect of source and target data volume and claims the method is never worse than the unmodified baseline.

Significance. If the positive-transfer premise holds across task distributions, the work would materially lower the computational barrier to NAS by amortizing prior search effort, making automated architecture design practical for a wider range of users. The design choice of minimal changes to existing optimizers is a pragmatic strength that facilitates adoption.

major comments (2)

[Abstract and §4] (data-volume study): The claim that the framework “generally gives better results and, in the worst case, is just as good” rests on an unquantified assumption of positive transfer. No metric of source–target task similarity, no worst-case bound on negative transfer under distribution shift, and no analysis beyond the data-volume ablation are provided, yet these are load-bearing for both the 33× speedup and the record claims.
[§5] (CIFAR benchmark results): The reported records (1.99% / 14.06%) and the search-cost reduction are given without the number of independent runs, standard deviations, or a statistical comparison against the unmodified optimizer, so it is impossible to assess whether the observed gains are reliable or merely within run-to-run variance.
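The statistics the referee asks for are cheap to compute once repeated runs exist. A minimal sketch, assuming several independent search-plus-training runs per optimizer; it uses Welch's t statistic (a swapped-in choice that does not assume equal variances), `welch_t` is a hypothetical helper name, and the errors in the example are invented.

```python
from __future__ import annotations

import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for mean(a) - mean(b)."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two independent runs")
    va = stdev(a) ** 2 / len(a)  # squared standard error of mean(a)
    vb = stdev(b) ** 2 / len(b)
    if va + vb == 0:
        raise ValueError("both samples are constant; the t statistic is undefined")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df


baseline_errors = [3.0, 3.2, 2.8]  # invented test errors (%) from three runs
transfer_errors = [2.0, 2.2, 1.8]
print(f"baseline {mean(baseline_errors):.2f} +/- {stdev(baseline_errors):.2f}")
print(f"transfer {mean(transfer_errors):.2f} +/- {stdev(transfer_errors):.2f}")
t, df = welch_t(baseline_errors, transfer_errors)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A p-value then comes from the t distribution with `df` degrees of freedom, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`, which runs the same Welch test.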

minor comments (2)

[Abstract] The phrase “new records of 1.99 and 14.06 for NAS optimizers” is ambiguous; clarify whether these are test error rates (in percent) of the discovered architectures or some other metric.
[§3] Notation: the transfer mechanism (how source knowledge is encoded and injected) is described only at a high level; a concise pseudocode or diagram in §3 would improve reproducibility.
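The pseudocode the referee requests would most likely show where source knowledge enters the search loop. A hypothetical, optimizer-agnostic sketch: a greedy surrogate-guided loop that only talks to its predictor through `predict` and `update`, so a transfer-aware predictor can be substituted without touching the loop. The names `surrogate_search`, `Predictor`, and `TablePrior` are assumptions; this is not the paper's algorithm and not NAO's API.

```python
from __future__ import annotations

from typing import Callable, Dict, Hashable, List, Protocol, Sequence, Tuple


class Predictor(Protocol):
    def predict(self, arch: Hashable) -> float: ...
    def update(self, arch: Hashable, observed: float) -> None: ...


def surrogate_search(
    candidates: Sequence[Hashable],
    evaluate: Callable[[Hashable], float],
    predictor: Predictor,
    budget: int,
) -> Tuple[Tuple[Hashable, float], List[Tuple[Hashable, float]]]:
    """Evaluate the most promising unevaluated candidate, `budget` times."""
    remaining = list(candidates)
    history: List[Tuple[Hashable, float]] = []
    for _ in range(min(budget, len(remaining))):
        best = max(remaining, key=predictor.predict)
        remaining.remove(best)
        score = evaluate(best)  # the expensive step: train and validate the network
        predictor.update(best, score)
        history.append((best, score))
    if not history:
        raise ValueError("budget must allow at least one evaluation")
    return max(history, key=lambda item: item[1]), history


class TablePrior:
    """Hypothetical transfer-aware predictor: scores carried over from source tasks."""

    def __init__(self, source_scores: Dict[Hashable, float]) -> None:
        self.scores = dict(source_scores)

    def predict(self, arch: Hashable) -> float:
        return self.scores.get(arch, 0.0)

    def update(self, arch: Hashable, observed: float) -> None:
        self.scores[arch] = observed  # target evidence overrides the source prior


true_accuracy = {"a": 0.50, "b": 0.90, "c": 0.70}  # invented target-task results
prior = TablePrior({"a": 0.1, "b": 0.8, "c": 0.6})
best, history = surrogate_search(list(true_accuracy), true_accuracy.__getitem__, prior, budget=1)
print(best)  # ('b', 0.9)
```

Swapping `TablePrior` for a predictor without source knowledge recovers the unmodified loop, which is the sense in which the change stays minor.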

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments. We respond point-by-point to the major comments below, indicating where revisions will be made and where limitations prevent further changes.


Referee: [Abstract and §4] (data-volume study): the claim that the framework “generally gives better results and, in the worst case, is just as good” rests on an unquantified assumption of positive transfer; no metric of source–target task similarity, no worst-case bound on negative transfer under distribution shift, and no analysis beyond the data-volume ablation are provided, yet these are load-bearing for both the 33× speedup and the record claims.

Authors: We agree that the claim is supported only by the empirical data-volume ablation in §4 rather than by a similarity metric or theoretical bound. No such metric or worst-case analysis appears in the manuscript. In revision we will qualify the statement in the abstract and §4 to read “in our experiments” and will add an explicit limitations paragraph noting the absence of a bound on negative transfer under arbitrary distribution shift. revision: yes
Referee: [§5] (CIFAR benchmark results): the reported records (1.99% / 14.06%) and the search-cost reduction are given without the number of independent runs, standard deviations, or a statistical comparison against the unmodified optimizer, so it is impossible to assess whether the observed gains are reliable or merely within run-to-run variance.

Authors: The CIFAR benchmark results in §5 were obtained from single runs of each search; the manuscript therefore contains no report of independent runs, standard deviations, or statistical tests. Because the original experiments were not replicated, we cannot supply these quantities. We will add a sentence in §5 stating that results are from single runs and noting the computational cost as the reason. revision: no

standing simulated objections not resolved

Provision of standard deviations or statistical comparisons for the §5 CIFAR results, because multiple independent runs were not performed in the original study.

Circularity Check

0 steps flagged

No circularity; empirical validation on external benchmarks

full rationale

The paper proposes a transfer framework for NAS optimizers and reports observed speedups (200 to 6 GPU days) and test error rates (1.99%/14.06%) from direct experiments on CIFAR-10/100. No equations, predictions, or first-principles results are derived; all central claims are measurements against fixed external benchmarks. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided text. The work is self-contained against external data and receives the default low score.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that architectural knowledge transfers across tasks; no free parameters or invented entities are visible in the abstract.

axioms (1)

domain assumption: Architectural knowledge from source tasks transfers positively to target tasks with only minor optimizer changes.
This premise is required for the claimed speedup and record results to hold.

pith-pipeline@v0.9.0 · 5734 in / 1105 out tokens · 23942 ms · 2026-05-24T19:33:44.206965+00:00 · methodology


