
arxiv: 2509.26037 · v2 · pith:WKPDHE7Rnew · submitted 2025-09-30 · 💻 cs.AI · cs.CV · cs.LG

CoLLM-NAS: Collaborative Large Language Models for Efficient Knowledge-Guided Neural Architecture Search

Pith reviewed 2026-05-21 20:58 UTC · model grok-4.3

classification 💻 cs.AI · cs.CV · cs.LG
keywords neural architecture search · large language models · collaborative LLMs · knowledge-guided search · efficient NAS · two-stage NAS · ImageNet · NAS-Bench-201

The pith

A pair of large language models, one steering search direction and one generating candidates, delivers state-of-the-art neural architectures at 4 to 10 times lower search cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CoLLM-NAS as a two-stage framework that pairs a stateful Navigator LLM to set search direction with a stateless Generator LLM to create concrete architecture candidates. A Coordinator module handles their exchanges and incorporates evaluation feedback plus prior results into the process. This combination draws on the models' pre-trained understanding of neural network structures while adding progressive refinement from each iteration. A sympathetic reader would care because traditional neural architecture search often demands enormous computation or produces invalid designs; a reliable way to cut those costs could let more researchers explore tailored networks for specific hardware or tasks.

Core claim

CoLLM-NAS is a two-stage NAS framework that uses a stateful Navigator LLM to guide search direction, a stateless Generator LLM to synthesize high-quality candidates, and a Coordinator module to orchestrate inter-LLM communication and manage evaluation processes. The method efficiently guides the search by combining LLMs' inherent knowledge of structured neural architectures with progressive knowledge from iterative feedback and historical trajectory. Experimental results on ImageNet and NAS-Bench-201 show that CoLLM-NAS surpasses existing NAS methods and conventional search algorithms, achieving new state-of-the-art results while significantly reducing search costs by a factor of 4 to 10. Furthermore, CoL

What carries the argument

The CoLLM-NAS two-stage framework consisting of a stateful Navigator LLM for directional guidance, a stateless Generator LLM for candidate synthesis, and a Coordinator that manages communication and feedback integration.
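
The division of labor described above can be illustrated with a toy loop. This is a minimal sketch, not the paper's implementation: the search space `OPS`, the scoring function `evaluate`, and the names `Navigator`, `generate`, and `coordinate` are all hypothetical, and both LLM calls are replaced by simple stand-in heuristics. What it does show is the structural point: the Navigator keeps the search history, the Generator depends only on its arguments, and the Coordinator routes directions, candidates, and feedback between them.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy search space; the paper's real spaces are far richer.
OPS = ("conv3x3", "conv1x1", "skip", "pool")


@dataclass
class Navigator:
    """Stateful: accumulates (architecture, score) pairs across iterations."""
    history: list = field(default_factory=list)

    def propose_direction(self):
        # Stand-in for an LLM call: prefer the op that appears most often
        # in the best architecture seen so far (None before any feedback).
        if not self.history:
            return None
        best_arch, _ = max(self.history, key=lambda pair: pair[1])
        return max(OPS, key=best_arch.count)

    def observe(self, results):
        self.history.extend(results)


def generate(preferred_op, n, rng, length=6):
    """Stateless: output depends only on the arguments passed in."""
    candidates = []
    for _ in range(n):
        arch = [rng.choice(OPS) for _ in range(length)]
        if preferred_op is not None:
            # Nudge one random position toward the Navigator's direction.
            arch[rng.randrange(length)] = preferred_op
        candidates.append(tuple(arch))
    return candidates


def evaluate(arch):
    # Made-up proxy score standing in for a real accuracy evaluation.
    return arch.count("conv3x3") / len(arch)


def coordinate(iterations=5, batch=8, seed=0):
    """Coordinator: passes directions, candidates, and feedback around."""
    rng = random.Random(seed)
    navigator = Navigator()
    for _ in range(iterations):
        direction = navigator.propose_direction()
        candidates = generate(direction, batch, rng)
        navigator.observe([(a, evaluate(a)) for a in candidates])
    return max(navigator.history, key=lambda pair: pair[1])
```

Calling `coordinate(seed=1)` returns the best `(architecture, score)` pair found. Keeping all state in the Navigator means the Generator can be called repeatedly, or in parallel, without carrying context between calls.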

If this is right

  • Achieves new state-of-the-art results on ImageNet and NAS-Bench-201 while cutting search costs by a factor of 4 to 10.
  • Consistently improves both accuracy and search efficiency when applied to existing two-stage NAS methods such as OFA, SPOS, and AutoFormer.
  • Generalizes across multiple search spaces including those for MobileNet, ShuffleNet, and AutoFormer variants.
  • Enables knowledge-guided search that avoids many of the invalid architectures produced by prior LLM-NAS approaches.
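
To make the invalid-architecture point concrete, here is a hedged sketch of a validity filter that could sit between a generator and an evaluator. It assumes the usual NAS-Bench-201 cell string encoding (three `+`-separated node groups, edges written as `op~source`, five candidate operations). The function name `is_valid_nb201` is hypothetical, and nothing here is taken from the paper's own validation logic.

```python
# The five candidate operations in the NAS-Bench-201 cell search space.
NB201_OPS = {"none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"}


def is_valid_nb201(arch: str) -> bool:
    """Return True if `arch` parses as a NAS-Bench-201 cell string.

    Assumed layout: node k (k = 1, 2, 3) lists k edges, one from each
    earlier node 0..k-1, written as |op~src|, with node groups joined by '+'.
    """
    nodes = arch.split("+")
    if len(nodes) != 3:
        return False
    for k, node in enumerate(nodes, start=1):
        if not (node.startswith("|") and node.endswith("|")):
            return False
        edges = node[1:-1].split("|")
        if len(edges) != k:
            return False
        for j, edge in enumerate(edges):
            op, sep, src = edge.partition("~")
            if sep != "~" or op not in NB201_OPS or src != str(j):
                return False
    return True
```

A check like this rejects candidates with unknown operations or missing edges before any evaluation budget is spent on them.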

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same division of labor between one model that maintains search state and another that proposes concrete solutions might transfer to automated design problems outside neural networks, such as optimizing compiler passes or molecular structures.
  • Groups with modest computing budgets could use the approach to generate competitive custom models without renting large GPU clusters for weeks.
  • Explicit separation of directional guidance from candidate generation may offer a template for other multi-agent LLM systems that must balance exploration with concrete output.

Load-bearing premise

The assumption that pairing a stateful Navigator LLM with a stateless Generator LLM and feeding back evaluation results through a Coordinator will consistently produce valid, high-performing architectures without the invalidity or inefficiency problems of earlier LLM-based searches.

What would settle it

Running the method on NAS-Bench-201 and finding that the top architectures discovered do not exceed the accuracy of the best previously reported entries or that total search time remains within a factor of two of standard evolutionary or reinforcement-learning baselines.
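
The test above reduces to two comparisons, sketched here as a small predicate. The function name `claim_survives` and its parameter names are hypothetical; the factor-of-two threshold is the one stated in the preceding paragraph, and any numbers passed in would have to come from an actual rerun.

```python
def claim_survives(found_acc, best_prior_acc, search_cost, baseline_cost):
    """Return True only if both halves of the claim hold.

    found_acc / best_prior_acc: accuracy of the best discovered architecture
    and of the best previously reported one, on the same benchmark split.
    search_cost / baseline_cost: total search cost of the method and of a
    standard evolutionary or RL baseline, in the same units.
    """
    beats_prior = found_acc > best_prior_acc
    # "Within a factor of two" of the baseline counts against the claim.
    speedup = baseline_cost / search_cost
    clearly_cheaper = speedup > 2.0
    return beats_prior and clearly_cheaper
```

Either failing condition is enough to return `False`, which mirrors the "or" in the statement of the test.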

Figures

Figures reproduced from arXiv: 2509.26037 by Yongtao Wang, Zhe Li, Zhiwei Lin.

Figure 1: Consistency heatmap between LLM-predicted and ...
Figure 2: Pipeline of CoLLM-NAS. The search starts with the Navigator LLM generating an initial ...
Figure 3: T-SNE visualization of CoLLM-NAS’s search dynamics on ImageNet-16-120 within NAS ...
Figure 4: Ablation on main mechanisms: (a) Comparison of iterative performance between CoLLM ...
Figure 5: Performance comparison of different temperature settings on CIFAR-100 within NAS ...
Original abstract

The integration of Large Language Models (LLMs) with Neural Architecture Search (NAS) has introduced new possibilities for automating the design of neural architectures. However, most existing methods face critical limitations, including architectural invalidity, computational inefficiency, and inferior performance compared to traditional NAS. In this work, we present Collaborative LLM-based NAS (CoLLM-NAS), a two-stage NAS framework with knowledge-guided search driven by two complementary LLMs. Specifically, we propose a stateful Navigator LLM to guide search direction, a stateless Generator LLM to synthesize high-quality candidates, and a Coordinator module to orchestrate inter-LLM communication and manage evaluation processes. CoLLM-NAS efficiently guides the search process by combining LLMs' inherent knowledge of structured neural architectures with progressive knowledge from iterative feedback and historical trajectory. Experimental results on ImageNet and NAS-Bench-201 show that CoLLM-NAS surpasses existing NAS methods and conventional search algorithms, achieving new state-of-the-art results while significantly reducing search costs by 4--10. Furthermore, CoLLM-NAS consistently enhances the performance and efficiency of various two-stage NAS methods (e.g., OFA, SPOS, and AutoFormer) across diverse search spaces (e.g., MobileNet, ShuffleNet, and AutoFormer), demonstrating its excellent generalization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents CoLLM-NAS, a two-stage collaborative LLM-based NAS framework consisting of a stateful Navigator LLM to guide search direction, a stateless Generator LLM to synthesize candidates, and a Coordinator module to orchestrate communication and evaluation. It claims to surpass existing NAS methods and conventional algorithms with new state-of-the-art results on ImageNet and NAS-Bench-201 while reducing search costs by a factor of 4-10, and to generalize by enhancing other two-stage NAS methods (OFA, SPOS, AutoFormer) across search spaces such as MobileNet, ShuffleNet, and AutoFormer.

Significance. If the performance and efficiency results hold under rigorous controls, the work would meaningfully advance LLM-guided NAS by addressing common issues of architectural invalidity and inefficiency through iterative knowledge feedback and inter-LLM coordination. The reported generalization across multiple search spaces and base methods is a positive aspect that could broaden practical applicability of automated architecture design.

major comments (2)
  1. [Abstract] Abstract: The central claim of 'significantly reducing search costs by 4--10' is load-bearing for the efficiency contribution. The description provides no information on the cost metric (e.g., whether cumulative LLM inference overhead from repeated Navigator-Generator-Coordinator cycles is included versus only architecture training/validation costs on ImageNet or NAS-Bench-201), which directly affects whether the 4-10x advantage over prior methods like SPOS or OFA can be substantiated.
  2. [Experimental Results] Experimental Results: The abstract asserts SOTA performance and superiority over existing NAS methods, yet supplies no details on baselines, number of runs, statistical significance, error bars, or controls for LLM output stochasticity. These omissions prevent evaluation of the reliability of the reported accuracy gains and generalization claims.
minor comments (2)
  1. [Abstract] Abstract: The phrasing 'reducing search costs by 4--10' is imprecise; it should read 'by a factor of 4 to 10' for clarity.
  2. [Abstract] Abstract: The roles of the Navigator, Generator, and Coordinator could be defined in one additional sentence to aid readers new to the collaborative setup.
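
The first major comment turns on whether LLM inference is counted in the cost. The arithmetic can be sketched as follows; the function `cost_reduction` and all numbers in the usage note are hypothetical and chosen only to show how the reported factor moves when overhead is included.

```python
def cost_reduction(baseline_evals, method_evals, cost_per_eval, llm_overhead=0.0):
    """Ratio of baseline search cost to method search cost.

    All costs share one unit (for example GPU-hours). `llm_overhead` is the
    total cost attributed to LLM calls over the whole search; leaving it at
    zero reproduces an evaluation-count-only comparison.
    """
    baseline_cost = baseline_evals * cost_per_eval
    method_cost = method_evals * cost_per_eval + llm_overhead
    return baseline_cost / method_cost
```

With made-up inputs of 1000 baseline evaluations, 100 method evaluations, and unit evaluation cost, the ratio is 10.0 when overhead is ignored and 4.0 when an overhead of 150 units is added. The same search can therefore land at either end of a 4 to 10 range depending on the accounting.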

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and detailed comments on our manuscript. We address each major comment point by point below, agreeing that greater clarity on cost metrics and experimental reporting will strengthen the presentation. We commit to incorporating these revisions in the next version of the paper.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim of 'significantly reducing search costs by 4--10' is load-bearing for the efficiency contribution. The description provides no information on the cost metric (e.g., whether cumulative LLM inference overhead from repeated Navigator-Generator-Coordinator cycles is included versus only architecture training/validation costs on ImageNet or NAS-Bench-201), which directly affects whether the 4-10x advantage over prior methods like SPOS or OFA can be substantiated.

    Authors: We appreciate this observation on the cost metric. In CoLLM-NAS, the 4-10x reduction refers to the number of architecture evaluations (i.e., training and validation costs on ImageNet or NAS-Bench-201) enabled by the knowledge-guided iterative search, which is the standard metric in the NAS literature for comparing search efficiency against methods like SPOS and OFA. LLM inference overhead is not the primary component and is typically orders of magnitude smaller than GPU training costs, but we acknowledge the abstract does not explicitly clarify this distinction. We will revise the abstract to specify the cost metric and add a dedicated paragraph in the Experimental Results section providing a breakdown of total compute, including relative LLM overhead.

    Revision: yes

  2. Referee: [Experimental Results] Experimental Results: The abstract asserts SOTA performance and superiority over existing NAS methods, yet supplies no details on baselines, number of runs, statistical significance, error bars, or controls for LLM output stochasticity. These omissions prevent evaluation of the reliability of the reported accuracy gains and generalization claims.

    Authors: Thank you for emphasizing the need for rigorous experimental details. The manuscript already compares against multiple baselines (SPOS, OFA, AutoFormer, and conventional algorithms) with results on ImageNet and NAS-Bench-201, and demonstrates generalization across search spaces. To improve reliability assessment, we will expand the Experimental Results section to report the number of independent runs, include mean performance with standard deviations (error bars), discuss controls for LLM stochasticity (e.g., fixed temperature and repeated sampling), and add statistical significance tests for key comparisons. These additions will be incorporated without altering the core claims.

    Revision: yes
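
The reporting promised in the second response (mean with standard deviation over independent runs, plus a significance test) could look roughly like the sketch below. It uses only the Python standard library, and Welch's t statistic is one reasonable choice of test rather than the one the authors commit to. The helper names `summarize` and `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def summarize(runs):
    """Mean and sample standard deviation of per-run accuracies."""
    return mean(runs), stdev(runs)


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances.

    Each sample needs at least two runs so the sample variance is defined.
    """
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error
```

Each list would hold one accuracy per independent search run, for example runs with different random seeds at a fixed LLM temperature. Turning the statistic into a p-value additionally requires the Welch-Satterthwaite degrees of freedom, which is omitted here.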

Circularity Check

0 steps flagged

Empirical NAS framework exhibits no circular derivation

full rationale

The paper presents CoLLM-NAS as an empirical two-stage search procedure combining stateful Navigator LLM, stateless Generator LLM, and Coordinator orchestration. No equations, first-principles derivations, or predictions are claimed that reduce by construction to fitted parameters or self-citations. Performance and cost results are reported from external benchmarks (ImageNet, NAS-Bench-201) and generalization tests on spaces like MobileNet. The method is self-contained against these benchmarks with no load-bearing self-citation chains or ansatz smuggling. This is the expected non-finding for an applied search algorithm rather than a closed mathematical claim.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the domain assumption that LLMs already encode useful knowledge about valid neural architectures and that iterative feedback can be effectively integrated without introducing new invalidity issues. The Coordinator is introduced as a new orchestration component without external validation.

axioms (1)
  • [domain assumption] LLMs possess inherent knowledge of structured neural architectures that can be leveraged to guide search
    Explicitly invoked in the abstract as the basis for knowledge-guided search.
invented entities (1)
  • Coordinator module (no independent evidence)
    purpose: Orchestrate inter-LLM communication and manage evaluation processes
    New component introduced to coordinate the Navigator and Generator.

pith-pipeline@v0.9.0 · 5772 in / 1385 out tokens · 66214 ms · 2026-05-21T20:58:50.243737+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Structuring Open-Ended NAS: Semi-Automated Design Knowledge Structuring with LLMs for Efficient Neural Architecture Search

    cs.CV 2026-05 unverdicted novelty 6.0

    Authors structure architectural design knowledge with LLMs to create an open-ended NAS space and introduce FairNAD, which finds architectures improving 0.84, 2.17, and 2.35 points over SOTA on CIFAR-10, CIFAR-100, and...

  2. LLM as a Tool, Not an Agent: Code-Mined Tree Transformations for Neural Architecture Search

    cs.LG 2026-04 unverdicted novelty 6.0

    LLMasTool improves neural architecture search by evolving code-mined hierarchical trees with diversity-guided Bayesian planning and targeted LLM assistance, reporting gains of 0.69, 1.83, and 2.68 points on CIFAR-10, ...
