Structuring Open-Ended NAS: Semi-Automated Design Knowledge Structuring with LLMs for Efficient Neural Architecture Search

Atsushi Irie; Junji Otsuka; Marcel Gröpl; Masakazu Yoshimura; Takeshi Ohashi; Yuiko Sakuma; Zitang Sun

arxiv: 2605.19247 · v1 · pith:I4ISUWJUnew · submitted 2026-05-19 · 💻 cs.CV


Yuiko Sakuma, Masakazu Yoshimura, Marcel Gröpl, Zitang Sun, Junji Otsuka, Atsushi Irie, Takeshi Ohashi

Pith reviewed 2026-05-20 07:18 UTC · model grok-4.3

classification 💻 cs.CV

keywords neural architecture search · large language models · search space design · knowledge structuring · CIFAR-10 · ImageNet · FairNAD


The pith

LLMs can structure design knowledge from papers into templates that enable more effective open-ended neural architecture search.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Neural architecture search often fails because its search spaces are too small or biased. The paper proposes using large language models to fill in a high-level structural template by reading research papers, thereby building a large but organized space of possible architectures. It then introduces FairNAD, which explores this space through a combination of fair sampling, Pareto optimization, iterative LLM mutations, and feedback. The result is architectures that achieve higher accuracy than previous methods on standard image datasets. If the approach works, it suggests a way to make NAS more open-ended without losing efficiency.

Core claim

The central claim is that semi-automated design knowledge structuring with LLMs creates a rich and diverse search space from a high-level template populated by analyzing papers. Exploring this space with FairNAD, which uses multi-type mutation including fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation, and a fine-grained feedback loop, discovers high-performing architectures that improve accuracy by 0.84 points on CIFAR-10, 2.17 on CIFAR-100, and 2.35 on ImageNet16-120 over state-of-the-art methods.

What carries the argument

The high-level structural template of architectural attributes populated by an LLM from papers, which structures the open-ended search space for FairNAD's multi-type mutation exploration.
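To make the idea of sampling fairly from a populated template concrete, here is a minimal sketch. It assumes that "fair idea sampling" means choosing an attribute uniformly first and then an idea uniformly within it; the paper's exact procedure is not given on this page. The tree contents and the names `ATTRIBUTE_TREE`, `fair_sample`, `flat_sample`, and `attribute_counts` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical attribute tree: attribute -> design ideas extracted for it.
# The entries are illustrative only and are not taken from the paper.
ATTRIBUTE_TREE = {
    "convolution": ["dilated conv", "deformable conv", "large kernel", "depthwise conv"],
    "self-attention": ["multi-head attention"],
    "connectivity": ["residual connection", "dense connection"],
}


def fair_sample(tree, rng):
    """Assumed fair scheme: uniform over attributes, then uniform within one."""
    attribute = rng.choice(sorted(tree))
    idea = rng.choice(tree[attribute])
    return attribute, idea


def flat_sample(tree, rng):
    """Baseline: uniform over all ideas, so crowded attributes dominate."""
    pairs = [(a, i) for a in sorted(tree) for i in tree[a]]
    return rng.choice(pairs)


def attribute_counts(sampler, tree, n, seed=0):
    """Count how often each attribute is drawn in n samples."""
    rng = random.Random(seed)
    return Counter(sampler(tree, rng)[0] for _ in range(n))


if __name__ == "__main__":
    print("fair:", attribute_counts(fair_sample, ATTRIBUTE_TREE, 3000))
    print("flat:", attribute_counts(flat_sample, ATTRIBUTE_TREE, 3000))
```

Under the flat baseline, an attribute with many extracted ideas is drawn far more often than one with a single idea; the two-stage scheme gives every attribute roughly the same share regardless of how many ideas it holds.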

If this is right

Architectures discovered this way outperform current best methods on image classification tasks.
The structured space reduces the biased and low-quality design ideas that hamper previous LLM-assisted NAS.
Multi-type mutations allow broad and efficient exploration of the large space.
The fine-grained feedback loop helps refine the search process.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the template captures design knowledge well, similar structuring could improve search in other AI domains like language models or vision transformers.
Expanding the paper analysis to more recent or diverse sources might yield even better search spaces.
Integrating this with hardware-aware search could lead to practical efficient models.

Load-bearing premise

The assumption that an LLM can reliably populate a high-level structural template by analyzing papers to produce a rich, diverse, and unbiased search space that actually contains superior architectures when explored by FairNAD.

What would settle it

A direct comparison in which the same FairNAD is run on a manually designed restricted search space versus the LLM-populated one, measuring whether the structured version consistently finds better architectures.

Figures

Figures reproduced from arXiv: 2605.19247 by Atsushi Irie, Junji Otsuka, Marcel Gröpl, Masakazu Yoshimura, Takeshi Ohashi, Yuiko Sakuma, Zitang Sun.

**Figure 1.** Overview of the proposed NAS. (Top) The model design attribute tree is generated from state-of-the-art (SOTA) models and a structured template. This tree is used to extract high-quality, fine-grained model design knowledge. (Bottom) FairNAD, a LLM-driven framework, then searches for high-performing models using mutation with fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation, and feed…

**Figure 2.** Example of frequency of model design ideas for (top) feature extracting operations and (bottom) block and connectivity. Extracting model design knowledge from external sources, such as papers, by simply prompting an LLM with a general query like “extract model design ideas from this paper” can lead to outputs heavily biased by the LLM’s internal knowledge and research trends. To illustrate this, we analyze…

**Figure 3.** The difference of the model design ideas between (…

**Figure 4.** FairNAD employs a multi-type mutation to balance exploration and exploitation. (I) Model design ideas are uniformly sampled according to its attributes. (II) To explore models on the Pareto frontier, small models are scaled up, while large models undergo hyperparameter tuning. (III) An LLM agent then iteratively refines high-performing ideas and candidate models. Typical evolutionary searches perform cross…

**Figure 5.** Evolutionary process on CIFAR-100 for searching 500 architectures.
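The Figure 4 caption says that Pareto-frontier models are mutated differently depending on size: small models are scaled up and large models get hyperparameter tuning. The sketch below illustrates one way that step could look. It assumes two objectives (accuracy and parameter count) and a fixed size threshold, neither of which is confirmed by this page; `Candidate`, `dominates`, `pareto_front`, `choose_mutation`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float  # higher is better
    params_m: float  # parameters in millions, lower is better (assumed objective)


def dominates(a, b):
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.params_m <= b.params_m
    better = a.accuracy > b.accuracy or a.params_m < b.params_m
    return no_worse and better


def pareto_front(population):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in population if not any(dominates(o, c) for o in population)]


def choose_mutation(candidate, size_threshold_m):
    """Hypothetical rule: scale up small models, tune hyperparameters of large ones."""
    if candidate.params_m < size_threshold_m:
        return "scale_up"
    return "tune_hyperparameters"


if __name__ == "__main__":
    # Made-up population, for illustration only.
    population = [
        Candidate("A", 0.90, 1.0),
        Candidate("B", 0.93, 3.0),
        Candidate("C", 0.91, 4.0),  # dominated by B
        Candidate("D", 0.95, 8.0),
    ]
    for c in pareto_front(population):
        print(c.name, choose_mutation(c, size_threshold_m=2.0))
```

The quadratic-time front computation is adequate for populations of a few hundred candidates; a sorting-based variant such as the one in NSGA-II would be the usual choice at larger scale.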

Original abstract

Current neural architecture search (NAS) methods are often limited by their predefined, restrictive search spaces. While recent large language model (LLM)-assisted NAS methods enable open-ended search spaces, they often suffer from inefficient exploration due to biased or low-quality design ideas. To address these issues, we propose to semi-automatically structure model design knowledge to guide the search process. Our approach first defines a high-level structural template of architectural attributes. An LLM then populates this template by analyzing papers, creating a rich and diverse search space that embodies this structured design knowledge. To efficiently explore this vast space, we introduce FairNAD, using a multi-type mutation that enables broad exploration through mutation with fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation, and a fine-grained feedback loop. We demonstrate the effectiveness of FairNAD in discovering high-performing architectures that yield 0.84, 2.17, and 2.35 points improvement on CIFAR-10, CIFAR-100, and ImageNet16-120, respectively, compared to current state-of-the-art methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper uses LLMs to populate a structural template for open-ended NAS, then searches it with FairNAD and reports modest gains, but it lacks ablations that separate the quality of the space from that of the searcher.


The one thing to take away is that the authors try to fix open-ended NAS by letting an LLM fill a high-level template with design attributes pulled from papers, then explore the resulting space with a new algorithm they call FairNAD. FairNAD mixes multi-type mutation, fair sampling, Pareto awareness, and LLM iterative refinement, and they report accuracy lifts of 0.84, 2.17, and 2.35 points on CIFAR-10, CIFAR-100, and ImageNet16-120 over prior SOTA methods.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a semi-automated method to structure open-ended neural architecture search (NAS) by first defining a high-level structural template of architectural attributes and then using an LLM to populate it through analysis of research papers, thereby generating a rich and diverse search space. It introduces FairNAD, an exploration algorithm employing multi-type mutation (fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation) together with a fine-grained feedback loop. The central empirical claim is that architectures discovered by this pipeline yield accuracy improvements of 0.84, 2.17, and 2.35 points on CIFAR-10, CIFAR-100, and ImageNet16-120, respectively, relative to current state-of-the-art NAS methods.

Significance. If the performance claims are substantiated with appropriate controls and ablations, the work would represent a meaningful step toward practical open-ended NAS by combining LLM-based knowledge structuring with fairness-aware evolutionary search. The explicit handling of mutation-type probabilities and Pareto awareness addresses known biases in prior evolutionary NAS; the semi-automated template population is a novel angle that could reduce manual design effort while retaining interpretability.

major comments (2)

Abstract and §4 (Experiments): The headline improvements (0.84/2.17/2.35 points) are presented without any information on the number of independent runs, standard deviations, statistical significance tests, or controls for LLM stochasticity. This information is load-bearing for the central performance claim and must be supplied before the gains can be considered reliable.
§3 (Method) and §4 (Experiments): No ablation or control experiment isolates the contribution of the LLM-populated structural template from the FairNAD search components. A baseline (e.g., random sampling or standard EA) run inside the identical LLM-structured space would quantify what fraction of the reported gains is due to space quality versus the multi-type mutation and feedback mechanisms; without it the attribution remains ambiguous.
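The control requested in the second major comment could be organized roughly as follows: hold the search space fixed and run different searchers inside it with the same evaluation budget and seeds. This is a toy sketch only. `SPACE` stands in for the LLM-populated space, `toy_score` is a placeholder for training and evaluating a model, and every name and number is hypothetical.

```python
import random

# Hypothetical stand-in for an LLM-populated space: each slot has a few options.
SPACE = {
    "operator": ["conv3x3", "conv5x5", "attention"],
    "connectivity": ["plain", "residual", "dense"],
    "width": [32, 64, 128],
}


def toy_score(arch):
    """Placeholder for train-and-evaluate. This is NOT a real accuracy."""
    score = {"conv3x3": 0.2, "conv5x5": 0.1, "attention": 0.3}[arch["operator"]]
    score += {"plain": 0.0, "residual": 0.3, "dense": 0.2}[arch["connectivity"]]
    score += arch["width"] / 640
    return score


def random_arch(rng):
    return {slot: rng.choice(options) for slot, options in SPACE.items()}


def random_search(budget, rng):
    """Best score among `budget` independent random architectures."""
    return max(toy_score(random_arch(rng)) for _ in range(budget))


def evolutionary_search(budget, rng):
    """Mutation-only hill climber using the same number of evaluations."""
    best = random_arch(rng)
    best_score = toy_score(best)
    for _ in range(budget - 1):
        child = dict(best)
        slot = rng.choice(sorted(SPACE))
        child[slot] = rng.choice(SPACE[slot])
        child_score = toy_score(child)
        if child_score >= best_score:
            best, best_score = child, child_score
    return best_score


def run_ablation(budget=20, seeds=(0, 1, 2, 3, 4)):
    """Run both searchers in the identical space, one result per seed."""
    results = {"random": [], "evolutionary": []}
    for seed in seeds:
        results["random"].append(random_search(budget, random.Random(seed)))
        results["evolutionary"].append(evolutionary_search(budget, random.Random(seed)))
    return results


if __name__ == "__main__":
    for name, scores in run_ablation().items():
        print(name, [round(s, 3) for s in scores])
```

Because the space and budget are shared, any gap between the searchers reflects the search strategy alone; swapping `SPACE` for a manually designed space while keeping the searcher fixed would isolate the contribution of the space instead.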

minor comments (2)

§2 (Related Work): The positioning against other recent LLM-assisted NAS methods could be sharpened by explicitly contrasting the semi-automated template population step with fully automated or prompt-only baselines.
Notation in §3.2: The definitions of “mutation type probabilities” and “sampling fairness weights” are introduced as free parameters; a short sensitivity table or default values would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which has helped us strengthen the empirical rigor of the manuscript. We address each major comment below and have revised the manuscript to incorporate the requested information and additional controls.


Referee: Abstract and §4 (Experiments): The headline improvements (0.84/2.17/2.35 points) are presented without any information on the number of independent runs, standard deviations, statistical significance tests, or controls for LLM stochasticity. This information is load-bearing for the central performance claim and must be supplied before the gains can be considered reliable.

Authors: We agree that details on run counts, variability, and statistical testing are necessary to substantiate the central claims. In the revised manuscript we will report all headline results as means over five independent runs, accompanied by standard deviations and p-values from paired t-tests against the cited baselines. For LLM stochasticity we used temperature 0.0 during template population and fixed random seeds throughout FairNAD; these controls will be documented explicitly in the updated §4 together with the new statistical summary.

revision: yes

Referee: §3 (Method) and §4 (Experiments): No ablation or control experiment isolates the contribution of the LLM-populated structural template from the FairNAD search components. A baseline (e.g., random sampling or standard EA) run inside the identical LLM-structured space would quantify what fraction of the reported gains is due to space quality versus the multi-type mutation and feedback mechanisms; without it the attribution remains ambiguous.

Authors: We concur that an ablation isolating the structured space from the search algorithm would clarify attribution. Although the current experiments compare FairNAD against prior methods that employ different spaces, we will add, in the revision, results for both random search and a standard evolutionary algorithm executed inside the identical LLM-populated space. These new baselines will be presented alongside the existing FairNAD results to quantify the incremental benefit of the multi-type mutation and feedback mechanisms.

revision: yes
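The statistical summary promised in the first response (means over five runs, standard deviations, paired t-tests) can be sketched with the standard library alone. The helper below is a minimal illustration, assuming runs are paired by seed; `paired_t_statistic` and the example numbers are hypothetical and are not results from the paper.

```python
import math
import statistics


def paired_t_statistic(ours, baseline):
    """Return (mean difference, sample std of differences, t statistic, degrees of freedom)."""
    if len(ours) != len(baseline) or len(ours) < 2:
        raise ValueError("need two equal-length samples with at least 2 runs")
    diffs = [a - b for a, b in zip(ours, baseline)]
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if std_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean_diff / (std_diff / math.sqrt(len(diffs)))
    return mean_diff, std_diff, t, len(diffs) - 1


if __name__ == "__main__":
    # Made-up accuracies for five seeds, for illustration only.
    ours = [94.1, 94.4, 93.9, 94.6, 94.2]
    baseline = [93.5, 93.6, 93.4, 93.7, 93.3]
    mean_diff, std_diff, t, dof = paired_t_statistic(ours, baseline)
    print(f"mean diff={mean_diff:.3f} std={std_diff:.3f} t={t:.3f} dof={dof}")
```

The p-value follows from the Student t distribution with the returned degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel` computes the same statistic together with the p-value.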

Circularity Check

0 steps flagged

No circularity: results rest on external benchmark comparisons


The paper's derivation chain consists of defining a structural template, using an LLM to populate a search space from analyzed papers, and applying the FairNAD algorithm (multi-type mutation, Pareto-aware selection, LLM-driven iteration) to explore it. Reported gains (0.84/2.17/2.35 points on CIFAR-10/100/ImageNet16-120) are obtained by direct comparison against external SOTA methods on fixed public benchmarks. No equations, parameter-fitting steps, or self-citations are shown that would make any claimed result equivalent to its own inputs by construction. The central claims therefore remain independent of the reported outcomes and do not reduce to self-definition or fitted-input renaming.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim depends on the premise that LLM analysis of papers yields a high-quality structured search space and that FairNAD's mutation strategy can efficiently locate superior points within it; no explicit free parameters or invented entities are named in the abstract, but implicit tuning of mutation probabilities and LLM prompting choices is likely required.

free parameters (1)

mutation type probabilities and sampling fairness weights
These control the balance among mutation types and are expected to be chosen or tuned to achieve the reported gains.

axioms (1)

Domain assumption: LLMs can extract and organize architectural design knowledge from papers into a template without introducing systematic bias or hallucinated attributes.
The entire open-ended space is constructed by this LLM population step.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We introduce a semi-automated model design attribute structuring method that organizes design knowledge into a hierarchical attribute tree... FairNAD, using a multi-type mutation... mutation with fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation, and a feedback loop.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: The top two levels (e.g., granularity and main category) were predefined based on expert knowledge, while the sub-attributes were generated by prompting an LLM.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

92 extracted references · 92 canonical work pages · 5 internal anchors

[1] OpenMMLab, github.com. https://github.com/open-mmlab. [Accessed 27-04-2026]
[2] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. Journal of machine learning research, 13(2), 2012
[3] H. Cai, L. Zhu, and S. Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In International Conference on Learning Representations, 2019
[4] H. Cai, C. Gan, T. Wang, Z. Zhang, and S. Han. Once-for-all: Train one network and specialize it for efficient deployment. In International Conference on Learning Representations, 2020
[5] H. Cai, J. Li, M. Hu, C. Gan, and S. Han. Efficientvit: Lightweight multi-scale attention for high-resolution dense prediction. In Proceedings of the IEEE/CVF international conference on computer vision, pages 17302–17313, 2023
[6] A. Chen, D. Dohan, and D. So. Evoprompting: Language models for code-level neural architecture search. Advances in neural information processing systems, 36:7787–7817, 2023
[7] M. Chen, H. Peng, J. Fu, and H. Ling. Autoformer: Searching transformers for visual recognition. In Proceedings of the IEEE/CVF international conference on computer vision, pages 12270–12280, 2021
[8] M. Chen, K. Wu, B. Ni, H. Peng, B. Liu, J. Fu, H. Chao, and H. Ling. Searching the search space of vision transformer. Advances in Neural Information Processing Systems, 34:8714–8726, 2021
[9] X. Chen, R. Wang, M. Cheng, X. Tang, and C.-J. Hsieh. Drnas: Dirichlet neural architecture search. In International Conference on Learning Representations, 2021
[10] J. Cheng, P. Clark, and K. Richardson. Language modeling by language models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
[11] P. Chrabaszcz, I. Loshchilov, and F. Hutter. A downsampled variant of imagenet as an alternative to the cifar datasets. arXiv preprint arXiv:1707.08819, 2017
[12] X. Chu, B. Zhang, and R. Xu. Fairnas: Rethinking evaluation fairness of weight sharing neural architecture search. In Proceedings of the IEEE/CVF International Conference on computer vision, pages 12239–12248, 2021
[13] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: Nsga-ii. IEEE transactions on evolutionary computation, 6(2):182–197, 2002
[14] X. Dong and Y. Yang. One-shot neural architecture search via self-evaluated template network. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3681–3690, 2019
[15] X. Dong and Y. Yang. Searching for a robust neural architecture in four gpu hours. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1761–1770, 2019
[16] X. Dong and Y. Yang. Nas-bench-201: Extending the scope of reproducible neural architecture search. In International Conference on Learning Representations (ICLR), 2020
[17] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021
[18] S. Falkner, A. Klein, and F. Hutter. Bohb: Robust and efficient hyperparameter optimization at scale. In International conference on machine learning, pages 1437–1446. PMLR, 2018
[19] B. Graham, A. El-Nouby, H. Touvron, P. Stock, A. Joulin, H. Jégou, and M. Douze. Levit: a vision transformer in convnet’s clothing for faster inference. In Proceedings of the IEEE/CVF international conference on computer vision, pages 12259–12269, 2021
[20] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016
[21] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1314–1324, 2019
[22] S. Hu, S. Xie, H. Zheng, C. Liu, J. Shi, X. Liu, and D. Lin. Dsnas: Direct neural architecture search without parameter retraining. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12084–12092, 2020
[23] B. Hui, J. Yang, Z. Cui, J. Yang, D. Liu, L. Zhang, T. Liu, J. Zhang, B. Yu, K. Dang, et al. Qwen2.5-coder technical report. arXiv preprint arXiv:2409.12186, 2024
[24] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. Master’s thesis, University of Toronto, 2009
[25] Y. Li, G. Yuan, Y. Wen, J. Hu, G. Evangelidis, S. Tulyakov, Y. Wang, and J. Ren. EfficientFormer: Vision transformers at mobilenet speed. Advances in neural information processing systems, 35:12934–12949, 2022
[26] Z. Li, Z. Lin, and Y. Wang. CoLLM-NAS: Collaborative large language models for efficient knowledge-guided neural architecture search. arXiv preprint arXiv:2509.26037, 2025
[27] C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy. Progressive neural architecture search. In Proceedings of the European conference on computer vision (ECCV), pages 19–34, 2018
[28] H. Liu, K. Simonyan, and Y. Yang. DARTS: Differentiable architecture search. In International Conference on Learning Representations, 2019
[29] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021
[30] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11976–11986, 2022
[31] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun. ShuffleNet V2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European conference on computer vision (ECCV), pages 116–131, 2018
[32] S. Mehta and M. Rastegari. MobileVit: Light-weight, general-purpose, and mobile-friendly vision transformer. In International Conference on Learning Representations, 2022
[33] K. G. Mills, D. Niu, M. Salameh, W. Qiu, F. X. Han, P. Liu, J. Zhang, W. Lu, and S. Jui. Aio-p: Expanding neural performance predictors beyond image classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 9180–9189, 2023
[34] K. G. Mills, F. X. Han, M. Salameh, S. Lu, C. Zhou, J. He, F. Sun, and D. Niu. Building optimal neural architectures using interpretable knowledge. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5726–5735, 2024
[35] S. Movahedi, M. Adabinejad, A. Imani, A. Keshavarz, M. Dehghani, A. Shakery, and B. N. Araabi. λ-darts: Mitigating performance collapse by harmonizing operation selection among cells. In The Eleventh International Conference on Learning Representations, 2023
[36] M. U. Nasir, S. Earle, J. Togelius, S. James, and C. Cleghorn. LLMatic: neural architecture search via large language models and quality diversity optimization. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 1110–1118, 2024
[37] H. Pham, M. Guan, B. Zoph, Q. Le, and J. Dean. Efficient neural architecture search via parameters sharing. In International conference on machine learning, pages 4095–4104. PMLR, 2018
[38] M. H. Rahman and P. Chakraborty. LeMo-NADe: Multi-parameter neural architecture discovery with llms. arXiv preprint arXiv:2402.18443, 2024
[39] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le. Regularized evolution for image classifier architecture search. In Proceedings of the aaai conference on artificial intelligence, volume 33, pages 4780–4789, 2019
[40] M. Salameh, K. Mills, N. Hassanpour, F. Han, S. Zhang, W. Lu, S. Jui, C. Zhou, F. Sun, and D. Niu. Autogo: Automated computation graph optimization for neural network evolution. Advances in Neural Information Processing Systems, 36:74455–74477, 2023
[41] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4510–4520, 2018
[42] D. So, Q. Le, and C. Liang. The evolved transformer. In International conference on machine learning, pages 5877–5886. PMLR, 2019
[43] D. Stamoulis, R. Ding, D. Wang, D. Lymberopoulos, B. Priyantha, J. Liu, and D. Marculescu. Single-path nas: Device-aware efficient convnet design. In Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations with Industrial Applications (ODML-CDNNRIA) in Conjunction with International Conference on Machine Learning, 2019
[44] M. Suganuma, S. Shirakawa, and T. Nagao. A genetic programming approach to designing convolutional neural network architectures. In Proceedings of the genetic and evolutionary computation conference, pages 497–504, 2017
[45] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015
[46] M. Tan and Q. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, pages 6105–6114. PMLR, 2019
[47] I. O. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. Keysers, J. Uszkoreit, et al. MLP-Mixer: An all-mlp architecture for vision. Advances in neural information processing systems, 34:24261–24272, 2021
[48] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3):229–256, 1992
[49] B. Wu, X. Dai, P. Zhang, Y. Wang, F. Sun, Y. Wu, Y. Tian, P. Vajda, Y. Jia, and K. Keutzer. FBNet: Hardware-aware efficient convnet design via differentiable neural architecture search. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10734–10742, 2019
[50] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1492–1500, 2017
[51] S. Xie, H. Zheng, C. Liu, and L. Lin. SNAS: stochastic neural architecture search. In International Conference on Learning Representations, 2019
[52] Y. Xu, L. Xie, X. Zhang, X. Chen, G.-J. Qi, Q. Tian, and H. Xiong. PC-DARTS: Partial channel connections for memory-efficient architecture search. In International Conference on Learning Representations, 2020
[53] A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025
[54] Z. Yang, W. Zeng, S. Jin, C. Qian, P. Luo, and W. Liu. Nader: Neural architecture design via multi-agent collaboration. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 4452–4461, 2025
[55] P. Ye, B. Li, Y. Li, T. Chen, J. Fan, and W. Ouyang. β-DARTS: Beta-decay regularization for differentiable architecture search. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10874–10883, 2022
[56] M. Yoshimura, Z. Sun, Y. Sakuma, J. Otsuka, A. Irie, and T. Ohashi. Llm as a tool, not an agent: Code-mined tree transformations for neural architecture search. arXiv preprint arXiv:2604.16555, 2026
[57] J. Yu, P. Jin, H. Liu, G. Bender, P.-J. Kindermans, M. Tan, T. Huang, X. Song, R. Pang, and Q. Le. BigNAS: Scaling up neural architecture search with big single-stage models. In European Conference on Computer Vision, pages 702–717. Springer, 2020
[58] M. Zhang, S. W. Su, S. Pan, X. Chang, E. M. Abbasnejad, and R. Haffari. iDARTS: Differentiable architecture search with stochastic implicit gradients. In International Conference on Machine Learning, pages 12557–12566. PMLR, 2021
[59] M. Zheng, X. Su, S. You, F. Wang, C. Qian, C. Xu, and S. Albanie. Can gpt-4 perform neural architecture search? arXiv preprint arXiv:2304.10970, 2023
[60] X. Zhou, X. Wu, L. Feng, Z. Lu, and K. C. Tan. Design principle transfer in neural architecture search via large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 23000–23008, 2025
[61] B. Zoph and Q. Le. Neural architecture search with reinforcement learning. In International Conference on Learning Representations, 2017
[62] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8697–8710, 2018
[63] Overly specific attributes: The LLM often fails to follow the instruction to extract general attributes and collects modules existing only in specific models (e.g., “input subtraction pooling”).
[64] Inconsistent categorization: The LLM classifies the same attribute into different categories when analyzing different reference models (e.g., “grouped convolution” appears in multiple categories).
[65] Missing attributes in specific categories: Although specific main categories exist in the manual design, no corresponding attributes exist when analyzing the reference models (e.g., no sub-categories are found for “dense connectivity for feature reuse” in Table 10). We attribute failures (1) and (2) primarily to the LLM’s capability. Specifically, (1) is ...
[66] Incomplete generation: The LLM often truncates the output, failing to generate the complete code for complex architectures
[67] Component hallucination: The model substitutes unknown modules or functions with plausible but non-existent or incorrect alternatives
[68] Shape mismatch: Tensor shape mismatches frequently occur, particularly when integrating heterogeneous modules such as CNNs and Transformers.
[69] Model downscaling failure: The initially generated model becomes excessively large, causing the subsequent model downscaling step to fail
[70] Structural verification failure: The LLM incorrectly identifies a valid model as invalid, or an invalid model as valid. Specifically, although the LLM performs well in determining whether the code has been modified, it often fails to determine whether the architecture is multi-layered. We attribute failures (1) and (2) primarily to the resource constraint...
[71] and Genesys [10]. The graph-based representation defines the module classes or network structures. For example, Genesys [10] predefines the GPTblock, a meta module implemented in PyTorch. This module can be factorized into a tree structure of sub-modules to be explored for language models. Genesys builds a module library from external sources, and the m...
[72] Attributes which improves performance: {attribute_examples_for_performance_improvements}
[73] Attributes which improves efficiency: {attribute_examples_for_efficiency_improvements} Try to find attributes not in the above list as well. Constraints: • Be comprehensive • Ensure that each attribute is concise, specific, and clearly describes the model’s key innovations. For example, “convolution” is valid, but “a visual module” is too vague. • Avoid d...
[74] Feature extraction operators: Core operations used to extract features from data. For example: • Convolution: Improvements such as kernel size design, dilated convolution (expanded receptive field), deformable convolution (spatially adaptive kernels), etc • Self-attention: The core mechanism of Transformers. Includes multi-head attention for multi-persp...

work page
[75]

For example: Batch Normalization, Layer Normalization, Group Normalization, Instance Normal- ization

Normalization: Normalization is essential for stabilizing and accelerating training. For example: Batch Normalization, Layer Normalization, Group Normalization, Instance Normal- ization

work page
[76]

For example: ReLU, Leaky ReLU, GeLU, Swish (SiLU) Block and connectivity level

Activation: Nonlinearity into the network. For example: ReLU, Leaky ReLU, GeLU, Swish (SiLU) Block and connectivity level

work page
[77]

For example: CNN stem, Patch embedding, Positional encoding

Input encoding: Methods to encode input data. For example: CNN stem, Patch embedding, Positional encoding

work page
[78]

For example: residual connections (ResNet), multi-branch structures (inception)

Residual connections and multi-branch architectures: Structures to enhance the diversity of feature extraction. For example: residual connections (ResNet), multi-branch structures (inception)

work page
[79]

For example: element-wise addition, concatenation along channels (DenseNet and Inception), multi-scale feature fusion (U-Net, FPN) 31

Feature fusion and aggregation: Methods to combine features from different network locations (layers or branches). For example: element-wise addition, concatenation along channels (DenseNet and Inception), multi-scale feature fusion (U-Net, FPN) 31

work page
[80]

For example: channel attention (SE block), spatial attention Network level

Adaptive feature recalibration: Attention mechanisms that dynamically learn which information is important. For example: channel attention (SE block), spatial attention Network level

work page

Showing first 80 references.

[1] OpenMMLab, github.com. https://github.com/open-mmlab. [Accessed 27-04-2026]

[2] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. Journal of machine learning research, 13(2), 2012.

[3] H. Cai, L. Zhu, and S. Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In International Conference on Learning Representations, 2019.

[4] H. Cai, C. Gan, T. Wang, Z. Zhang, and S. Han. Once-for-all: Train one network and specialize it for efficient deployment. In International Conference on Learning Representations, 2020.

[5] H. Cai, J. Li, M. Hu, C. Gan, and S. Han. EfficientViT: Lightweight multi-scale attention for high-resolution dense prediction. In Proceedings of the IEEE/CVF international conference on computer vision, pages 17302–17313, 2023.

[6] A. Chen, D. Dohan, and D. So. EvoPrompting: Language models for code-level neural architecture search. Advances in neural information processing systems, 36:7787–7817, 2023.

[7] M. Chen, H. Peng, J. Fu, and H. Ling. AutoFormer: Searching transformers for visual recognition. In Proceedings of the IEEE/CVF international conference on computer vision, pages 12270–12280, 2021.

[8] M. Chen, K. Wu, B. Ni, H. Peng, B. Liu, J. Fu, H. Chao, and H. Ling. Searching the search space of vision transformer. Advances in Neural Information Processing Systems, 34:8714–8726, 2021.

[9] X. Chen, R. Wang, M. Cheng, X. Tang, and C.-J. Hsieh. DrNAS: Dirichlet neural architecture search. In International Conference on Learning Representations, 2021.

[10] J. Cheng, P. Clark, and K. Richardson. Language modeling by language models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

[11] P. Chrabaszcz, I. Loshchilov, and F. Hutter. A downsampled variant of ImageNet as an alternative to the CIFAR datasets. arXiv preprint arXiv:1707.08819, 2017.

[12] X. Chu, B. Zhang, and R. Xu. FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search. In Proceedings of the IEEE/CVF International Conference on computer vision, pages 12239–12248, 2021.

[13] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation, 6(2):182–197, 2002.

[14] X. Dong and Y. Yang. One-shot neural architecture search via self-evaluated template network. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3681–3690, 2019.

[15] X. Dong and Y. Yang. Searching for a robust neural architecture in four GPU hours. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1761–1770, 2019.

[16] X. Dong and Y. Yang. NAS-Bench-201: Extending the scope of reproducible neural architecture search. In International Conference on Learning Representations (ICLR), 2020.

[17] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.

[18] S. Falkner, A. Klein, and F. Hutter. BOHB: Robust and efficient hyperparameter optimization at scale. In International conference on machine learning, pages 1437–1446. PMLR, 2018.

[19] B. Graham, A. El-Nouby, H. Touvron, P. Stock, A. Joulin, H. Jégou, and M. Douze. LeViT: a vision transformer in ConvNet’s clothing for faster inference. In Proceedings of the IEEE/CVF international conference on computer vision, pages 12259–12269, 2021.

[20] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[21] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1314–1324, 2019.

[22] S. Hu, S. Xie, H. Zheng, C. Liu, J. Shi, X. Liu, and D. Lin. DSNAS: Direct neural architecture search without parameter retraining. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12084–12092, 2020.

[23] B. Hui, J. Yang, Z. Cui, J. Yang, D. Liu, L. Zhang, T. Liu, J. Zhang, B. Yu, K. Dang, et al. Qwen2.5-Coder technical report. arXiv preprint arXiv:2409.12186, 2024.

[24] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. Master’s thesis, University of Toronto, 2009.

[25] Y. Li, G. Yuan, Y. Wen, J. Hu, G. Evangelidis, S. Tulyakov, Y. Wang, and J. Ren. EfficientFormer: Vision transformers at MobileNet speed. Advances in neural information processing systems, 35:12934–12949, 2022.

[26] Z. Li, Z. Lin, and Y. Wang. CoLLM-NAS: Collaborative large language models for efficient knowledge-guided neural architecture search. arXiv preprint arXiv:2509.26037, 2025.

[27] C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy. Progressive neural architecture search. In Proceedings of the European conference on computer vision (ECCV), pages 19–34, 2018.

[28] H. Liu, K. Simonyan, and Y. Yang. DARTS: Differentiable architecture search. In International Conference on Learning Representations, 2019.

[29] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021.

[30] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11976–11986, 2022.

[31] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European conference on computer vision (ECCV), pages 116–131, 2018.

[32] S. Mehta and M. Rastegari. MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. In International Conference on Learning Representations, 2022.

[33] K. G. Mills, D. Niu, M. Salameh, W. Qiu, F. X. Han, P. Liu, J. Zhang, W. Lu, and S. Jui. AIO-P: Expanding neural performance predictors beyond image classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 9180–9189, 2023.

[34] K. G. Mills, F. X. Han, M. Salameh, S. Lu, C. Zhou, J. He, F. Sun, and D. Niu. Building optimal neural architectures using interpretable knowledge. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5726–5735, 2024.

[35] S. Movahedi, M. Adabinejad, A. Imani, A. Keshavarz, M. Dehghani, A. Shakery, and B. N. Araabi. λ-DARTS: Mitigating performance collapse by harmonizing operation selection among cells. In The Eleventh International Conference on Learning Representations, 2023.

[36] M. U. Nasir, S. Earle, J. Togelius, S. James, and C. Cleghorn. LLMatic: neural architecture search via large language models and quality diversity optimization. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 1110–1118, 2024.

[37] H. Pham, M. Guan, B. Zoph, Q. Le, and J. Dean. Efficient neural architecture search via parameters sharing. In International conference on machine learning, pages 4095–4104. PMLR, 2018.

[38] M. H. Rahman and P. Chakraborty. LeMo-NADe: Multi-parameter neural architecture discovery with LLMs. arXiv preprint arXiv:2402.18443, 2024.

[39] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 4780–4789, 2019.

[40] M. Salameh, K. Mills, N. Hassanpour, F. Han, S. Zhang, W. Lu, S. Jui, C. Zhou, F. Sun, and D. Niu. AutoGO: Automated computation graph optimization for neural network evolution. Advances in Neural Information Processing Systems, 36:74455–74477, 2023.

[41] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4510–4520, 2018.

[42] D. So, Q. Le, and C. Liang. The evolved transformer. In International conference on machine learning, pages 5877–5886. PMLR, 2019.

[43] D. Stamoulis, R. Ding, D. Wang, D. Lymberopoulos, B. Priyantha, J. Liu, and D. Marculescu. Single-path NAS: Device-aware efficient convnet design. In Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations with Industrial Applications (ODML-CDNNRIA) in Conjunction with International Conference on Machine Learning, 2019.

[44] M. Suganuma, S. Shirakawa, and T. Nagao. A genetic programming approach to designing convolutional neural network architectures. In Proceedings of the genetic and evolutionary computation conference, pages 497–504, 2017.

[45] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.

[46] M. Tan and Q. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, pages 6105–6114. PMLR, 2019.

[47] I. O. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. Keysers, J. Uszkoreit, et al. MLP-Mixer: An all-MLP architecture for vision. Advances in neural information processing systems, 34:24261–24272, 2021.

[48] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3):229–256, 1992.

[49] B. Wu, X. Dai, P. Zhang, Y. Wang, F. Sun, Y. Wu, Y. Tian, P. Vajda, Y. Jia, and K. Keutzer. FBNet: Hardware-aware efficient convnet design via differentiable neural architecture search. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10734–10742, 2019.

[50] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1492–1500, 2017.

[51] S. Xie, H. Zheng, C. Liu, and L. Lin. SNAS: stochastic neural architecture search. In International Conference on Learning Representations, 2019.

[52] Y. Xu, L. Xie, X. Zhang, X. Chen, G.-J. Qi, Q. Tian, and H. Xiong. PC-DARTS: Partial channel connections for memory-efficient architecture search. In International Conference on Learning Representations, 2020.

[53] A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.

[54] Z. Yang, W. Zeng, S. Jin, C. Qian, P. Luo, and W. Liu. NADER: Neural architecture design via multi-agent collaboration. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 4452–4461, 2025.

[55] P. Ye, B. Li, Y. Li, T. Chen, J. Fan, and W. Ouyang. β-DARTS: Beta-decay regularization for differentiable architecture search. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10874–10883, 2022.

[56] M. Yoshimura, Z. Sun, Y. Sakuma, J. Otsuka, A. Irie, and T. Ohashi. LLM as a tool, not an agent: Code-mined tree transformations for neural architecture search. arXiv preprint arXiv:2604.16555, 2026.

[57] J. Yu, P. Jin, H. Liu, G. Bender, P.-J. Kindermans, M. Tan, T. Huang, X. Song, R. Pang, and Q. Le. BigNAS: Scaling up neural architecture search with big single-stage models. In European Conference on Computer Vision, pages 702–717. Springer, 2020.

[58] M. Zhang, S. W. Su, S. Pan, X. Chang, E. M. Abbasnejad, and R. Haffari. iDARTS: Differentiable architecture search with stochastic implicit gradients. In International Conference on Machine Learning, pages 12557–12566. PMLR, 2021.

[59] M. Zheng, X. Su, S. You, F. Wang, C. Qian, C. Xu, and S. Albanie. Can GPT-4 perform neural architecture search? arXiv preprint arXiv:2304.10970, 2023.

[60] X. Zhou, X. Wu, L. Feng, Z. Lu, and K. C. Tan. Design principle transfer in neural architecture search via large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 23000–23008, 2025.

[61] B. Zoph and Q. Le. Neural architecture search with reinforcement learning. In International Conference on Learning Representations, 2017.

[62] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8697–8710, 2018.

A Experimental Setup Details
A.1 NAS-Bench-201 Settings
A.2 Evol...

[63] Overly specific attributes: The LLM often fails to follow the instruction to extract general attributes and collects modules existing only in specific models (e.g., “input subtraction pooling”).

[64] Inconsistent categorization: The LLM classifies the same attribute into different categories when analyzing different reference models (e.g., “grouped convolution” appears in multiple categories).

[65] Missing attributes in specific categories: Although specific main categories exist in the manual design, no corresponding attributes exist when analyzing the reference models (e.g., no sub-categories are found for “dense connectivity for feature reuse” in Table 10). We attribute failures (1) and (2) primarily to the LLM’s capability. Specifically, (1) is ...
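The inconsistent-categorization failure listed above could be flagged mechanically before the attributes enter the search space. The following is a minimal sketch in Python, assuming extraction results are stored per reference model as a mapping from category to attribute list. The data layout, the function name `find_inconsistent_attributes`, and the example values are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def find_inconsistent_attributes(extractions):
    """Return attributes that were assigned to more than one category.

    `extractions` maps a reference-model name to a
    {category: [attribute, ...]} dict (a hypothetical layout).
    """
    seen = defaultdict(set)
    for categories in extractions.values():
        for category, attributes in categories.items():
            for attribute in attributes:
                # Normalize case and whitespace so trivial variants collide.
                key = " ".join(attribute.lower().split())
                seen[key].add(category)
    return {attr: sorted(cats) for attr, cats in seen.items() if len(cats) > 1}


# Hypothetical extraction results for two reference models.
example = {
    "model_a": {
        "Feature extraction operators": ["Grouped convolution", "self-attention"],
    },
    "model_b": {
        "Efficiency": ["grouped convolution"],
        "Normalization": ["Layer Normalization"],
    },
}
conflicts = find_inconsistent_attributes(example)
```

Attributes returned by such a check could then be reviewed manually, which fits the semi-automated framing; how the authors actually resolve conflicts is not stated in this excerpt.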

[66] Incomplete generation: The LLM often truncates the output, failing to generate the complete code for complex architectures.

[67] Component hallucination: The model substitutes unknown modules or functions with plausible but non-existent or incorrect alternatives.

[68] Shape mismatch: Tensor shape mismatches frequently occur, particularly when integrating heterogeneous modules such as CNNs and Transformers.

[69] Model downscaling failure: The initially generated model becomes excessively large, causing the subsequent model downscaling step to fail.

[70] Structural verification failure: The LLM incorrectly identifies a valid model as invalid, or an invalid model as valid. Specifically, although the LLM performs well in determining whether the code has been modified, it often fails to determine whether the architecture is multi-layered. We attribute failures (1) and (2) primarily to the resource constraint...
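The shape-mismatch failure listed above is the kind of error that a cheap static check can catch before any training is attempted. The sketch below is illustrative only: it uses the standard convolution output-size formula and a simple rule for handing a CNN feature map of shape (C, H, W) to a Transformer block as H*W tokens. The function names and the rule that the channel count must equal the embedding width are assumptions for this example, not the paper's verification procedure.

```python
def conv2d_output_hw(h, w, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a 2D convolution (standard floor formula)."""
    def out(n):
        return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1
    return out(h), out(w)


def check_cnn_to_transformer(feature_shape, embed_dim):
    """Check that a (C, H, W) feature map can become H*W tokens of width embed_dim.

    Returns (ok, message). Hypothetical helper for illustration.
    """
    channels, h, w = feature_shape
    if h <= 0 or w <= 0:
        return False, f"empty feature map: H={h}, W={w}"
    if channels != embed_dim:
        return False, (
            f"channel count {channels} != embed_dim {embed_dim}; "
            "a projection layer is needed"
        )
    return True, f"{h * w} tokens of width {embed_dim}"


# A 3x3 convolution with stride 2 and padding 1 on a 32x32 input.
h, w = conv2d_output_hw(32, 32, kernel=3, stride=2, padding=1)
ok_result = check_cnn_to_transformer((64, h, w), embed_dim=64)
bad_result = check_cnn_to_transformer((64, h, w), embed_dim=96)
```

A failed check could be returned to the LLM as feedback in place of a runtime traceback; whether the authors' feedback loop works this way is not stated in this excerpt.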

[71] and Genesys [10]. The graph-based representation defines the module classes or network structures. For example, Genesys [10] predefines the GPTblock, a meta module implemented in PyTorch. This module can be factorized into a tree structure of sub-modules to be explored for language models. Genesys builds a module library from external sources, and the m...

[72] Attributes which improves performance: {attribute_examples_for_performance_improvements}

[73] Attributes which improves efficiency: {attribute_examples_for_efficiency_improvements}
Try to find attributes not in the above list as well.
Constraints:
• Be comprehensive
• Ensure that each attribute is concise, specific, and clearly describes the model’s key innovations. For example, “convolution” is valid, but “a visual module” is too vague.
• Avoid d...

[74] Feature extraction operators: Core operations used to extract features from data. For example:
• Convolution: Improvements such as kernel size design, dilated convolution (expanded receptive field), deformable convolution (spatially adaptive kernels), etc.
• Self-attention: The core mechanism of Transformers. Includes multi-head attention for multi-persp...

[75] Normalization: Normalization is essential for stabilizing and accelerating training. For example: Batch Normalization, Layer Normalization, Group Normalization, Instance Normalization

[76] Activation: Introduces nonlinearity into the network. For example: ReLU, Leaky ReLU, GeLU, Swish (SiLU)

Block and connectivity level

[77] Input encoding: Methods to encode input data. For example: CNN stem, Patch embedding, Positional encoding

[78] Residual connections and multi-branch architectures: Structures to enhance the diversity of feature extraction. For example: residual connections (ResNet), multi-branch structures (Inception)

[79] Feature fusion and aggregation: Methods to combine features from different network locations (layers or branches). For example: element-wise addition, concatenation along channels (DenseNet and Inception), multi-scale feature fusion (U-Net, FPN)

[80] Adaptive feature recalibration: Attention mechanisms that dynamically learn which information is important. For example: channel attention (SE block), spatial attention

Network level