
arxiv: 1907.08307 · v1 · pith:FVEKDQSAnew · submitted 2019-07-18 · 💻 cs.LG · cs.CV · cs.NE · stat.ML

XferNAS: Transfer Neural Architecture Search

Pith reviewed 2026-05-24 19:33 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · cs.NE · stat.ML
keywords neural architecture search · transfer learning · NAS optimizers · CIFAR-10 · CIFAR-100 · search time reduction · knowledge reuse

The pith

A transfer framework lets existing NAS optimizers reuse prior task knowledge and cuts search time from 200 to 6 GPU days.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a generally applicable framework that adds only minor changes to an existing neural architecture search (NAS) optimizer so that knowledge from searches on source tasks can be reused on a new target task. Experiments integrate the framework with one optimizer and measure its effect on CIFAR-10 and CIFAR-100: search time drops from 200 to 6 GPU days, a factor of 33, while error rates reach new lows of 1.99% and 14.06%, respectively. A separate study varies the amount of source and target data and finds that the modified optimizer is usually better than the unmodified version and never worse. A sympathetic reader cares because NAS has been too expensive for routine use; this approach keeps the original optimizer intact while making repeated searches practical.

Core claim

The central claim is that a transfer framework, realized through only minor modifications to existing NAS optimizers, reuses knowledge learned on source tasks to reduce search time and improve final architectures on target tasks. Integration with one optimizer on CIFAR-10 and CIFAR-100 yields a 33-fold reduction in GPU days (from 200 to 6) and new record error rates of 1.99% and 14.06%, respectively. The framework is reported to be robust across different quantities of source and target data: in the paper's experiments it always matches or exceeds the performance of the base optimizer.
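
The 33-fold figure follows directly from the two reported search budgets. A minimal arithmetic check (the function name `speedup_factor` is illustrative and does not come from the paper):

```python
def speedup_factor(baseline_gpu_days: float, transfer_gpu_days: float) -> float:
    """Ratio of the baseline search cost to the transfer search cost."""
    if transfer_gpu_days <= 0:
        raise ValueError("transfer_gpu_days must be positive")
    return baseline_gpu_days / transfer_gpu_days


# Search budgets reported in the abstract: 200 GPU days vs. 6 GPU days.
factor = speedup_factor(200, 6)
print(f"{factor:.1f}x")  # 33.3x
```

The exact ratio is 33.3, which the paper rounds down to 33.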

What carries the argument

The transfer framework that injects knowledge reuse from prior NAS searches into existing optimizers via minor code changes.
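
This page does not spell out how source knowledge is encoded, so the following is only a sketch of one hypothetical way a performance predictor could reuse source-task observations: a shared estimate averaged over source tasks plus a constant offset fitted on the few target observations. It is not the paper's transfer network, and every name in it (`fit_shared`, `fit_offset`, `predict`, the architecture labels) is an assumption made for illustration.

```python
from statistics import mean


def fit_shared(source_obs):
    """Average the observed error of each architecture across source tasks.

    source_obs maps task name -> {architecture id -> observed error}.
    """
    by_arch = {}
    for scores in source_obs.values():
        for arch, err in scores.items():
            by_arch.setdefault(arch, []).append(err)
    return {arch: mean(errs) for arch, errs in by_arch.items()}


def fit_offset(shared, target_obs):
    """Mean residual between target observations and the shared estimate."""
    residuals = [err - shared[a] for a, err in target_obs.items() if a in shared]
    return mean(residuals) if residuals else 0.0


def predict(shared, offset, arch):
    """Predicted target error: shared estimate shifted by the target offset."""
    return shared[arch] + offset


# Hypothetical observations, not taken from the paper.
source_obs = {
    "task_a": {"A": 0.10, "B": 0.20},
    "task_b": {"A": 0.12, "B": 0.18},
}
target_obs = {"A": 0.16}  # a single evaluated architecture on the target task

shared = fit_shared(source_obs)
offset = fit_offset(shared, target_obs)
print(round(predict(shared, offset, "B"), 2))  # 0.24
```

With no target observations the offset is zero and the predictor falls back to the shared estimate, which is the sense in which prior searches give the target search a head start.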

If this is right

  • Any existing NAS optimizer can be upgraded to a transfer version without redesigning its core logic.
  • Repeated architecture searches become feasible on modest compute budgets instead of hundreds of GPU days.
  • New state-of-the-art error rates on CIFAR-10 and CIFAR-100 are reachable while preserving the original optimizer's strengths.
  • Performance remains at least as good as the base optimizer even when source and target data quantities vary.
  • The same minor-change approach can be applied to other NAS benchmarks without task-specific redesign.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could be applied sequentially so that each new task accumulates knowledge from all earlier ones.
  • Similar reuse patterns might shorten other expensive hyperparameter searches that share architectural structure.
  • If negative transfer appears on dissimilar tasks, lightweight detection heuristics could be added without altering the core claim.
  • Industry pipelines that run many related searches could adopt the framework to amortize search cost across projects.
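
The negative-transfer bullet above could be made concrete with a rank-correlation gate: if the transferred predictor ranks the first few target architectures poorly, fall back to the unmodified optimizer. This is a sketch of such a heuristic under stated assumptions, not something the paper proposes; the names `spearman` and `should_use_transfer` and the threshold of 0.5 are hypothetical, and the rank formula used here assumes no tied values.

```python
def ranks(values):
    """Return the rank of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for tie-free samples of equal length."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def should_use_transfer(predicted, observed, threshold=0.5):
    """Keep the transferred predictor only if it ranks target results well."""
    return spearman(predicted, observed) >= threshold


# Hypothetical predicted vs. observed errors on three target architectures.
print(should_use_transfer([0.1, 0.2, 0.3], [0.15, 0.25, 0.35]))  # True
print(should_use_transfer([0.1, 0.2, 0.3], [0.35, 0.25, 0.15]))  # False
```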

Load-bearing premise

Knowledge from source-task searches transfers positively to the target task even after only minor modifications and without substantial negative transfer.

What would settle it

An experiment on a new target task in which the transferred optimizer returns higher error rates or requires more search time than the unmodified optimizer would falsify the claimed benefit.
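
The falsification criterion can be written as a simple predicate over two search outcomes. The sketch below assumes each search is summarized by a final test error and a search cost; the class `SearchResult` and the function `benefit_falsified` are illustrative names, and the error values in the example are hypothetical (only the GPU-day figures come from the abstract).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    test_error: float  # final test error of the best architecture, in percent
    gpu_days: float    # total search cost


def benefit_falsified(transfer: SearchResult, baseline: SearchResult) -> bool:
    """True if the transfer variant is worse on error or on search cost."""
    return (
        transfer.test_error > baseline.test_error
        or transfer.gpu_days > baseline.gpu_days
    )


# GPU days as reported (6 vs. 200); the error values are hypothetical.
transfer = SearchResult(test_error=2.0, gpu_days=6)
baseline = SearchResult(test_error=2.5, gpu_days=200)
print(benefit_falsified(transfer, baseline))  # False
```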

Figures

Figures reproduced from arXiv: 1907.08307 by Martin Wistuba.

Figure 1: An example of the integration of the transfer network into RL-based (a) and surrogate model-based (b) optimizers. The ...
Figure 2: XferNAS: Integration of the transfer network in NAO.
Figure 3: Correlation coefficient between predicted and true ...
Figure 4: Convolution and reduction cell of XferNASNet.

Original abstract

The term Neural Architecture Search (NAS) refers to the automatic optimization of network architectures for a new, previously unknown task. Since testing an architecture is computationally very expensive, many optimizers need days or even weeks to find suitable architectures. However, this search time can be significantly reduced if knowledge from previous searches on different tasks is reused. In this work, we propose a generally applicable framework that introduces only minor changes to existing optimizers to leverage this feature. As an example, we select an existing optimizer and demonstrate the complexity of the integration of the framework as well as its impact. In experiments on CIFAR-10 and CIFAR-100, we observe a reduction in the search time from 200 to only 6 GPU days, a speed up by a factor of 33. In addition, we observe new records of 1.99 and 14.06 for NAS optimizers on the CIFAR benchmarks, respectively. In a separate study, we analyze the impact of the amount of source and target data. Empirically, we demonstrate that the proposed framework generally gives better results and, in the worst case, is just as good as the unmodified optimizer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes XferNAS, a general framework that adds only minor modifications to existing NAS optimizers to transfer knowledge from searches on source tasks, thereby accelerating architecture search on new target tasks. On CIFAR-10 and CIFAR-100 it reports reducing search cost from 200 to 6 GPU days (33× speedup) while setting new records of 1.99 % and 14.06 % error; a separate ablation studies the effect of source/target data volume and claims the method is never worse than the unmodified baseline.

Significance. If the positive-transfer premise holds across task distributions, the work would materially lower the computational barrier to NAS by amortizing prior search effort, making automated architecture design practical for a wider range of users. The design choice of minimal changes to existing optimizers is a pragmatic strength that facilitates adoption.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (data-volume study): the claim that the framework “generally gives better results and, in the worst case, is just as good” rests on an unquantified assumption of positive transfer; no metric of source–target task similarity, no worst-case bound on negative transfer under distribution shift, and no analysis beyond data-volume ablation are provided, yet these are load-bearing for both the 33× speedup and the record claims.
  2. [§5] §5 (CIFAR benchmark results): the reported records (1.99 % / 14.06 %) and wall-clock reduction are given without the number of independent runs, standard deviations, or statistical comparison against the unmodified optimizer, so it is impossible to assess whether the observed gains are reliable or merely within run-to-run variance.
minor comments (2)
  1. [Abstract] Abstract: the phrase “new records of 1.99 and 14.06 for NAS optimizers” is ambiguous; clarify whether these are test error rates of the discovered architectures or some other metric.
  2. [§3] Notation: the transfer mechanism (how source knowledge is encoded and injected) is described only at a high level; a concise pseudocode or diagram in §3 would improve reproducibility.
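
Major comment 2 asks for run-to-run statistics. Assuming several independent searches were available for both optimizers, the summary the referee wants could be computed as below. The data are hypothetical, the function names are illustrative, and the final check is a crude heuristic (mean gain versus sample standard deviation), not a substitute for a proper statistical test.

```python
from statistics import mean, stdev


def summarize(errors):
    """Return (mean, sample standard deviation) of per-run test errors."""
    return mean(errors), stdev(errors)


def gain_exceeds_noise(baseline_errors, transfer_errors):
    """Crude check: is the mean improvement larger than the run-to-run spread?"""
    base_mean, base_sd = summarize(baseline_errors)
    xfer_mean, xfer_sd = summarize(transfer_errors)
    return (base_mean - xfer_mean) > max(base_sd, xfer_sd)


# Hypothetical test errors (percent) from three independent runs each.
baseline_runs = [2.5, 2.6, 2.4]
transfer_runs = [2.0, 2.1, 1.9]

print(gain_exceeds_noise(baseline_runs, transfer_runs))  # True
```

With single runs, as the rebuttal below concedes, the standard deviation is undefined and no such comparison is possible.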

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments. We respond point-by-point to the major comments below, indicating where revisions will be made and where limitations prevent further changes.

Point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (data-volume study): the claim that the framework “generally gives better results and, in the worst case, is just as good” rests on an unquantified assumption of positive transfer; no metric of source–target task similarity, no worst-case bound on negative transfer under distribution shift, and no analysis beyond data-volume ablation are provided, yet these are load-bearing for both the 33× speedup and the record claims.

    Authors: We agree that the claim is supported only by the empirical data-volume ablation in §4 rather than by a similarity metric or theoretical bound. No such metric or worst-case analysis appears in the manuscript. In revision we will qualify the statement in the abstract and §4 to read “in our experiments” and will add an explicit limitations paragraph noting the absence of a bound on negative transfer under arbitrary distribution shift. revision: yes

  2. Referee: [§5] §5 (CIFAR benchmark results): the reported records (1.99 % / 14.06 %) and wall-clock reduction are given without the number of independent runs, standard deviations, or statistical comparison against the unmodified optimizer, so it is impossible to assess whether the observed gains are reliable or merely within run-to-run variance.

    Authors: The CIFAR benchmark results in §5 were obtained from single runs of each search; the manuscript therefore contains no report of independent runs, standard deviations, or statistical tests. Because the original experiments were not replicated, we cannot supply these quantities. We will add a sentence in §5 stating that results are from single runs and noting the computational cost as the reason. revision: no

standing simulated objections not resolved
  • Provision of standard deviations or statistical comparisons for the §5 CIFAR results, because multiple independent runs were not performed in the original study.

Circularity Check

0 steps flagged

No circularity; empirical validation on external benchmarks

full rationale

The paper proposes a transfer framework for NAS optimizers and reports observed speedups (200 to 6 GPU days) and error rates (1.99% / 14.06%) from direct experiments on CIFAR-10/100. No equations, predictions, or first-principles results are derived; all central claims are measurements against fixed external benchmarks. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided text. The work is self-contained against external data and receives the default low score.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that architectural knowledge transfers across tasks; no free parameters or invented entities are visible in the abstract.

axioms (1)
  • Domain assumption: architectural knowledge from source tasks transfers positively to target tasks with only minor optimizer changes.
    This premise is required for the claimed speedup and record results to hold.


