arxiv: 2605.19247 · v1 · pith:I4ISUWJUnew · submitted 2026-05-19 · 💻 cs.CV

Structuring Open-Ended NAS: Semi-Automated Design Knowledge Structuring with LLMs for Efficient Neural Architecture Search

Pith reviewed 2026-05-20 07:18 UTC · model grok-4.3

classification 💻 cs.CV
keywords neural architecture search · large language models · search space design · knowledge structuring · CIFAR-10 · ImageNet · FairNAD

The pith

LLMs can structure design knowledge from papers into templates that enable more effective open-ended neural architecture search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Neural architecture search (NAS) is often limited by predefined, restrictive search spaces, while LLM-assisted open-ended variants tend to explore inefficiently because their design ideas are biased or low in quality. The paper proposes using a large language model to fill in a high-level structural template by reading research papers, thereby building a large but organized space of possible architectures. It then introduces FairNAD, which explores this space through a combination of fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation, and a fine-grained feedback loop. The reported result is a set of architectures that beat prior state-of-the-art methods on CIFAR-10, CIFAR-100, and ImageNet16-120. If the approach holds up, it suggests a way to make NAS more open-ended without losing efficiency.

Core claim

The central claim is that semi-automated design knowledge structuring with LLMs creates a rich and diverse search space from a high-level template populated by analyzing papers. Exploring this space with FairNAD, which uses multi-type mutation including fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation, and a fine-grained feedback loop, discovers high-performing architectures that improve accuracy by 0.84 points on CIFAR-10, 2.17 on CIFAR-100, and 2.35 on ImageNet16-120 over state-of-the-art methods.
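To make "fair idea sampling" concrete, here is a minimal sketch of one possible reading: pick an attribute category uniformly, then pick an idea uniformly inside it, so that categories with many extracted ideas do not dominate. The tree contents, the function names, and the two-stage scheme itself are assumptions for illustration; the paper's actual sampler may differ.

```python
import random
from collections import Counter

# Hypothetical attribute tree; category and idea names are illustrative only.
TREE = {
    "feature_extraction": ["convolution", "self-attention",
                           "depthwise convolution", "mlp mixing"],
    "normalization": ["batch norm"],
    "connectivity": ["residual", "dense"],
}

def fair_sample(tree, rng):
    """Pick a category uniformly, then an idea uniformly within it."""
    category = rng.choice(sorted(tree))
    return category, rng.choice(tree[category])

def pooled_sample(tree, rng):
    """Baseline: pick uniformly from the flat pool of all ideas, which
    favors categories that happen to contain more ideas."""
    pool = [(c, idea) for c in sorted(tree) for idea in tree[c]]
    return rng.choice(pool)

rng = random.Random(0)
fair_counts = Counter(fair_sample(TREE, rng)[0] for _ in range(3000))
pooled_counts = Counter(pooled_sample(TREE, rng)[0] for _ in range(3000))
print("fair:  ", dict(fair_counts))
print("pooled:", dict(pooled_counts))
```

Under the pooled baseline the largest category receives most of the draws, whereas the two-stage sampler gives each category roughly the same share.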

What carries the argument

A high-level structural template of architectural attributes, populated by an LLM from papers. This template structures the open-ended search space that FairNAD explores with its multi-type mutation.
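As a rough illustration of what "populating a template" could look like in code, the sketch below merges (category, idea) pairs, such as an LLM might return for each paper, into a fixed set of categories. The category names, the `populate` helper, and the normalization rule are all hypothetical and are not taken from the paper.

```python
# Hypothetical top-level categories; the real template is defined by the authors.
TEMPLATE = ("feature_extraction", "normalization", "activation", "connectivity")

def populate(template, extracted):
    """Merge (category, idea) pairs into the template.

    Ideas are lower-cased so that duplicates collapse, and pairs whose
    category is not part of the template are set aside for manual review.
    """
    tree = {category: [] for category in template}
    rejected = []
    for category, idea in extracted:
        key = idea.strip().lower()
        if category not in tree:
            rejected.append((category, idea))
        elif key not in tree[category]:
            tree[category].append(key)
    return tree, rejected

# Illustrative LLM output for two papers (made up for this sketch).
ideas = [
    ("feature_extraction", "Depthwise convolution"),
    ("feature_extraction", "depthwise convolution"),
    ("activation", "GELU"),
    ("training_trick", "label smoothing"),
]
tree, rejected = populate(TEMPLATE, ideas)
print(tree)
print(rejected)
```

Routing out-of-template pairs to a review list is one way to keep a human in the loop, which matches the "semi-automated" framing, though the paper's actual procedure is not described here.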

If this is right

  • Architectures discovered this way outperform current best methods on image classification tasks.
  • The structured space reduces the biased and low-quality design ideas that hamper previous LLM-assisted NAS.
  • Multi-type mutations allow broad and efficient exploration of the large space.
  • The fine-grained feedback loop helps refine the search process.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the template captures design knowledge well, similar structuring could improve search in other AI domains like language models or vision transformers.
  • Expanding the paper analysis to more recent or diverse sources might yield even better search spaces.
  • Combining this approach with hardware-aware search could yield models that are efficient in practice.

Load-bearing premise

The assumption that an LLM can reliably populate a high-level structural template by analyzing papers to produce a rich, diverse, and unbiased search space that actually contains superior architectures when explored by FairNAD.

What would settle it

A direct comparison in which the same FairNAD configuration is run on a manually designed, restricted search space and on the LLM-populated one, measuring whether the LLM-populated space consistently yields better architectures.

Figures

Figures reproduced from arXiv: 2605.19247 by Atsushi Irie, Junji Otsuka, Marcel Gröpl, Masakazu Yoshimura, Takeshi Ohashi, Yuiko Sakuma, Zitang Sun.

Figure 1: Overview of the proposed NAS. (Top) The model design attribute tree is generated from state-of-the-art (SOTA) models and a structured template. This tree is used to extract high-quality, fine-grained model design knowledge. (Bottom) FairNAD, an LLM-driven framework, then searches for high-performing models using mutation with fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation, and feed…
Figure 2: Example of frequency of model design ideas for (top) feature extracting operations and (bottom) block and connectivity. Extracting model design knowledge from external sources, such as papers, by simply prompting an LLM with a general query like “extract model design ideas from this paper” can lead to outputs heavily biased by the LLM’s internal knowledge and research trends. To illustrate this, we analyze…
Figure 3: The difference of the model design ideas between (…
Figure 4: FairNAD employs a multi-type mutation to balance exploration and exploitation. (I) Model design ideas are uniformly sampled according to their attributes. (II) To explore models on the Pareto frontier, small models are scaled up, while large models undergo hyperparameter tuning. (III) An LLM agent then iteratively refines high-performing ideas and candidate models. Typical evolutionary searches perform cross…
Figure 5: Evolutionary process on CIFAR-100 for searching 500 architectures.
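The Pareto-aware step described in the Figure 4 caption can be sketched as follows, assuming two objectives (accuracy to maximize, parameter count to minimize). The field names `acc` and `params`, the size threshold, and the sample values are placeholders chosen for this sketch, not values from the paper.

```python
def dominates(a, b):
    """a dominates b if it is no worse on both objectives and strictly
    better on at least one (higher accuracy, fewer parameters)."""
    no_worse = a["acc"] >= b["acc"] and a["params"] <= b["params"]
    better = a["acc"] > b["acc"] or a["params"] < b["params"]
    return no_worse and better

def pareto_front(models):
    """Keep every model that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]

def choose_mutation(model, size_threshold):
    """Mirror the caption: scale up small models, tune large ones."""
    if model["params"] < size_threshold:
        return "scale_up"
    return "tune_hyperparameters"

# Made-up candidates (accuracy in %, parameters in millions).
models = [
    {"name": "A", "acc": 90.0, "params": 1.0},
    {"name": "B", "acc": 92.0, "params": 3.0},
    {"name": "C", "acc": 89.0, "params": 2.0},
    {"name": "D", "acc": 92.0, "params": 4.0},
]
front = pareto_front(models)
plan = {m["name"]: choose_mutation(m, size_threshold=2.0) for m in front}
print([m["name"] for m in front], plan)
```

The quadratic scan is adequate for a few hundred candidates; a larger population would likely call for a sorted sweep or a non-dominated sorting routine.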
Original abstract

Current neural architecture search (NAS) methods are often limited by their predefined, restrictive search spaces. While recent large language model (LLM)-assisted NAS methods enable open-ended search spaces, they often suffer from inefficient exploration due to biased or low-quality design ideas. To address these issues, we propose to semi-automatically structure model design knowledge to guide the search process. Our approach first defines a high-level structural template of architectural attributes. An LLM then populates this template by analyzing papers, creating a rich and diverse search space that embodies this structured design knowledge. To efficiently explore this vast space, we introduce FairNAD, using a multi-type mutation that enables broad exploration through mutation with fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation, and a fine-grained feedback loop. We demonstrate the effectiveness of FairNAD in discovering high-performing architectures that yield 0.84, 2.17, and 2.35 points improvement on CIFAR-10, CIFAR-100, and ImageNet16-120, respectively, compared to current state-of-the-art methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a semi-automated method to structure open-ended neural architecture search (NAS) by first defining a high-level structural template of architectural attributes and then using an LLM to populate it through analysis of research papers, thereby generating a rich and diverse search space. It introduces FairNAD, an exploration algorithm employing multi-type mutation (fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation) together with a fine-grained feedback loop. The central empirical claim is that architectures discovered by this pipeline yield accuracy improvements of 0.84, 2.17, and 2.35 points on CIFAR-10, CIFAR-100, and ImageNet16-120, respectively, relative to current state-of-the-art NAS methods.

Significance. If the performance claims are substantiated with appropriate controls and ablations, the work would represent a meaningful step toward practical open-ended NAS by combining LLM-based knowledge structuring with fairness-aware evolutionary search. The explicit handling of mutation-type probabilities and Pareto awareness addresses known biases in prior evolutionary NAS; the semi-automated template population is a novel angle that could reduce manual design effort while retaining interpretability.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Experiments): The headline improvements (0.84/2.17/2.35 points) are presented without any information on the number of independent runs, standard deviations, statistical significance tests, or controls for LLM stochasticity. This information is load-bearing for the central performance claim and must be supplied before the gains can be considered reliable.
  2. [§3 and §4] §3 (Method) and §4 (Experiments): No ablation or control experiment isolates the contribution of the LLM-populated structural template from the FairNAD search components. A baseline (e.g., random sampling or standard EA) run inside the identical LLM-structured space would quantify what fraction of the reported gains is due to space quality versus the multi-type mutation and feedback mechanisms; without it the attribution remains ambiguous.
minor comments (2)
  1. [§2] §2 (Related Work): The positioning against other recent LLM-assisted NAS methods could be sharpened by explicitly contrasting the semi-automated template population step with fully automated or prompt-only baselines.
  2. [§3.2] Notation in §3.2: The definitions of “mutation type probabilities” and “sampling fairness weights” are introduced as free parameters; a short sensitivity table or default values would improve reproducibility.
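The attribution question in major comment 2 amounts to simple arithmetic once the control run exists. The sketch below splits a total gain into a part measured with a plain baseline inside the new space and a remainder credited to the search algorithm. The function name and the accuracies are placeholders invented for illustration; they are not results from the paper.

```python
def split_gain(prior_sota, baseline_in_new_space, full_method):
    """Split the total gain over prior work into two additive parts.

    'space' is what a plain baseline (e.g. random search) already gains
    inside the LLM-populated space; 'search' is what the full method adds
    on top of that baseline in the same space.
    """
    return {
        "space": baseline_in_new_space - prior_sota,
        "search": full_method - baseline_in_new_space,
        "total": full_method - prior_sota,
    }

# Placeholder accuracies in percent, made up for this sketch.
example = split_gain(prior_sota=70.0, baseline_in_new_space=71.5, full_method=72.0)
print(example)
```

Note that the "space" term also absorbs any difference between the prior method's search procedure and the baseline's, so this split is only a first-order reading of the ablation.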

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which has helped us strengthen the empirical rigor of the manuscript. We address each major comment below and have revised the manuscript to incorporate the requested information and additional controls.

  1. Referee: [Abstract and §4] Abstract and §4 (Experiments): The headline improvements (0.84/2.17/2.35 points) are presented without any information on the number of independent runs, standard deviations, statistical significance tests, or controls for LLM stochasticity. This information is load-bearing for the central performance claim and must be supplied before the gains can be considered reliable.

    Authors: We agree that details on run counts, variability, and statistical testing are necessary to substantiate the central claims. In the revised manuscript we will report all headline results as means over five independent runs, accompanied by standard deviations and p-values from paired t-tests against the cited baselines. For LLM stochasticity we used temperature 0.0 during template population and fixed random seeds throughout FairNAD; these controls will be documented explicitly in the updated §4 together with the new statistical summary. revision: yes

  2. Referee: [§3 and §4] §3 (Method) and §4 (Experiments): No ablation or control experiment isolates the contribution of the LLM-populated structural template from the FairNAD search components. A baseline (e.g., random sampling or standard EA) run inside the identical LLM-structured space would quantify what fraction of the reported gains is due to space quality versus the multi-type mutation and feedback mechanisms; without it the attribution remains ambiguous.

    Authors: We concur that an ablation isolating the structured space from the search algorithm would clarify attribution. Although the current experiments compare FairNAD against prior methods that employ different spaces, we will add, in the revision, results for both random search and a standard evolutionary algorithm executed inside the identical LLM-populated space. These new baselines will be presented alongside the existing FairNAD results to quantify the incremental benefit of the multi-type mutation and feedback mechanisms. revision: yes
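The statistics promised in response 1 (mean, standard deviation, and a paired t-test over five runs) can be computed with the standard library alone. This is a minimal sketch: the accuracy lists are placeholders invented for illustration, and pairing runs by shared seed is an assumption about how the experiment would be set up.

```python
import math
import statistics

def paired_t(a, b):
    """Return (mean difference, sd of differences, t statistic, degrees
    of freedom) for paired samples a and b of equal length."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)      # sample standard deviation (n - 1)
    t = mean / (sd / math.sqrt(n))    # undefined if all differences are equal
    return mean, sd, t, n - 1

# Placeholder accuracies for five seeds; NOT results from the paper.
method = [73.1, 72.8, 73.4, 73.0, 73.2]
baseline = [72.5, 72.6, 72.9, 72.4, 72.7]
mean_diff, sd_diff, t_stat, dof = paired_t(method, baseline)
print(f"mean diff={mean_diff:.2f}, sd={sd_diff:.3f}, t={t_stat:.2f}, df={dof}")
```

With four degrees of freedom, the two-sided 5% critical value of the t distribution is about 2.776, so a |t| above that would be reported as significant at that level. An exact p-value needs the t-distribution CDF, which the standard library does not provide.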

Circularity Check

0 steps flagged

No circularity: results rest on external benchmark comparisons


The paper's derivation chain consists of defining a structural template, using an LLM to populate a search space from analyzed papers, and applying the FairNAD algorithm (multi-type mutation, Pareto-aware selection, LLM-driven iteration) to explore it. Reported gains (0.84/2.17/2.35 points on CIFAR-10/100/ImageNet16-120) are obtained by direct comparison against external SOTA methods on fixed public benchmarks. No equations, parameter-fitting steps, or self-citations are shown that would make any claimed result equivalent to its own inputs by construction. The central claims therefore remain independent of the reported outcomes and do not reduce to self-definition or fitted-input renaming.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim depends on the premise that LLM analysis of papers yields a high-quality structured search space and that FairNAD's mutation strategy can efficiently locate superior points within it; no explicit free parameters or invented entities are named in the abstract, but implicit tuning of mutation probabilities and LLM prompting choices is likely required.

free parameters (1)
  • mutation type probabilities and sampling fairness weights
    These control the balance among mutation types and are expected to be chosen or tuned to achieve the reported gains.
axioms (1)
  • domain assumption: LLMs can extract and organize architectural design knowledge from papers into a template without introducing systematic bias or hallucinated attributes.
    The entire open-ended space is constructed by this LLM population step.
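To show why the mutation type probabilities count as a free parameter, the sketch below draws a mutation type per search step from a configurable distribution. The three type names follow the summary above, but the probability values and the `pick_mutation` helper are hypothetical; the paper's defaults are not given here.

```python
import random
from collections import Counter

# Placeholder probabilities; the actual values used by FairNAD are not stated here.
MUTATION_PROBS = {
    "fair_idea_sampling": 0.5,
    "pareto_aware": 0.3,
    "llm_iterative": 0.2,
}

def pick_mutation(probs, rng):
    """Draw one mutation type according to the given probabilities."""
    names = list(probs)
    weights = [probs[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)
draws = Counter(pick_mutation(MUTATION_PROBS, rng) for _ in range(5000))
print(dict(draws))
```

A sensitivity study of the kind the referee asks for in minor comment 2 would rerun the search while varying these weights and report how the final accuracy changes.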



