Structuring Open-Ended NAS: Semi-Automated Design Knowledge Structuring with LLMs for Efficient Neural Architecture Search
Pith reviewed 2026-05-20 07:18 UTC · model grok-4.3
The pith
LLMs can structure design knowledge from papers into templates that enable more effective open-ended neural architecture search.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that semi-automated design knowledge structuring with LLMs creates a rich and diverse search space from a high-level template populated by analyzing papers. Exploring this space with FairNAD, which uses multi-type mutation including fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation, and a fine-grained feedback loop, discovers high-performing architectures that improve accuracy by 0.84 points on CIFAR-10, 2.17 on CIFAR-100, and 2.35 on ImageNet16-120 over state-of-the-art methods.
What carries the argument
The high-level structural template of architectural attributes populated by an LLM from papers, which structures the open-ended search space for FairNAD's multi-type mutation exploration.
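The template can be made concrete with a minimal Python sketch of a three-level attribute tree. It assumes, as a passage quoted later on this page states, that the top two levels (granularity and main category) are predefined by hand and only the leaves are generated by an LLM. The class name `AttributeTree`, its methods, and the normalization rule (strip and lowercase) are hypothetical illustrations, not the authors' code. The example category names are taken from the prompt excerpts near the end of this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AttributeTree:
    """Hypothetical three-level template: granularity -> main category -> sub-attributes."""

    levels: dict[str, dict[str, set[str]]] = field(default_factory=dict)

    def add_category(self, granularity: str, category: str) -> None:
        # Top two levels are predefined by hand; their leaf sets start empty.
        self.levels.setdefault(granularity, {}).setdefault(category, set())

    def populate(self, granularity: str, category: str, attributes: list[str]) -> None:
        # Accept LLM-extracted leaves only under categories that already exist,
        # so the model cannot add new top-level structure.
        if category not in self.levels.get(granularity, {}):
            raise KeyError(f"unknown category: {granularity}/{category}")
        self.levels[granularity][category].update(a.strip().lower() for a in attributes)

    def empty_categories(self) -> list[str]:
        # Predefined categories for which no attributes were extracted.
        return [
            f"{g}/{c}"
            for g, cats in self.levels.items()
            for c, attrs in cats.items()
            if not attrs
        ]

    def size(self) -> int:
        # Total number of distinct leaf attributes available to the search.
        return sum(len(attrs) for cats in self.levels.values() for attrs in cats.values())


tree = AttributeTree()
tree.add_category("block and connectivity level", "input encoding")
tree.add_category("block and connectivity level", "feature fusion and aggregation")
tree.populate("block and connectivity level", "input encoding", ["CNN stem", "Patch embedding"])
print(tree.empty_categories())  # ['block and connectivity level/feature fusion and aggregation']
```

Rejecting leaves for unknown categories is one way to keep the hand-designed top levels authoritative. `empty_categories` surfaces the kind of gap the paper reports as missing attributes in specific categories.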
If this is right
- Architectures discovered this way outperform current best methods on image classification tasks.
- The structured space reduces the biased and low-quality design ideas that hampered previous LLM-assisted NAS.
- Multi-type mutations allow broad and efficient exploration of the large space.
- The fine-grained feedback loop helps refine the search process.
Where Pith is reading between the lines
- If the template captures design knowledge well, similar structuring could improve search in other AI domains like language models or vision transformers.
- Expanding the paper analysis to more recent or diverse sources might yield even better search spaces.
- Integrating this with hardware-aware search could lead to practical efficient models.
Load-bearing premise
The assumption that an LLM can reliably populate a high-level structural template by analyzing papers to produce a rich, diverse, and unbiased search space that actually contains superior architectures when explored by FairNAD.
What would settle it
A direct comparison in which the same FairNAD is run on a manually designed, restricted search space and on the LLM-populated one, measuring whether the structured version consistently finds better architectures.
Original abstract
Current neural architecture search (NAS) methods are often limited by their predefined, restrictive search spaces. While recent large language model (LLM)-assisted NAS methods enable open-ended search spaces, they often suffer from inefficient exploration due to biased or low-quality design ideas. To address these issues, we propose to semi-automatically structure model design knowledge to guide the search process. Our approach first defines a high-level structural template of architectural attributes. An LLM then populates this template by analyzing papers, creating a rich and diverse search space that embodies this structured design knowledge. To efficiently explore this vast space, we introduce FairNAD, using a multi-type mutation that enables broad exploration through mutation with fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation, and a fine-grained feedback loop. We demonstrate the effectiveness of FairNAD in discovering high-performing architectures that yield 0.84, 2.17, and 2.35 points improvement on CIFAR-10, CIFAR-100, and ImageNet16-120, respectively, compared to current state-of-the-art methods.
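The abstract names "fair idea sampling" but this page does not define it. One plausible reading is two-stage sampling: pick a design category uniformly, then pick an attribute uniformly within that category. Under that reading, categories for which the LLM extracted many attributes do not dominate the proposals. The sketch below is built on that assumption. The function names and the toy space are hypothetical, and a flat sampler is included only for contrast.

```python
import random
from collections import Counter


def sample_idea(space: dict[str, list[str]], rng: random.Random) -> tuple[str, str]:
    """Two-stage sampling: category uniformly, then attribute uniformly within it."""
    category = rng.choice(sorted(space))
    return category, rng.choice(space[category])


def sample_flat(space: dict[str, list[str]], rng: random.Random) -> tuple[str, str]:
    """Contrast: uniform over all (category, attribute) pairs, biased toward large categories."""
    pairs = [(c, a) for c in sorted(space) for a in space[c]]
    return rng.choice(pairs)


def category_share(sampler, space, n: int = 10_000, seed: int = 0) -> dict[str, float]:
    """Fraction of n draws that land in each category."""
    rng = random.Random(seed)
    counts = Counter(sampler(space, rng)[0] for _ in range(n))
    return {c: counts[c] / n for c in sorted(space)}


# Hypothetical space: one category with 8 attributes, one with 2.
toy_space = {
    "attention": ["multi-head", "window", "linear", "axial", "deformable", "cross", "local", "global"],
    "normalization": ["batch norm", "layer norm"],
}
print(category_share(sample_idea, toy_space))  # each category near 0.5
print(category_share(sample_flat, toy_space))  # attention near 0.8
```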
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a semi-automated method to structure open-ended neural architecture search (NAS) by first defining a high-level structural template of architectural attributes and then using an LLM to populate it through analysis of research papers, thereby generating a rich and diverse search space. It introduces FairNAD, an exploration algorithm employing multi-type mutation (fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation) together with a fine-grained feedback loop. The central empirical claim is that architectures discovered by this pipeline yield accuracy improvements of 0.84, 2.17, and 2.35 points on CIFAR-10, CIFAR-100, and ImageNet16-120, respectively, relative to current state-of-the-art NAS methods.
Significance. If the performance claims are substantiated with appropriate controls and ablations, the work would represent a meaningful step toward practical open-ended NAS by combining LLM-based knowledge structuring with fairness-aware evolutionary search. The explicit handling of mutation-type probabilities and Pareto awareness addresses known biases in prior evolutionary NAS; the semi-automated template population is a novel angle that could reduce manual design effort while retaining interpretability.
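The report credits FairNAD with Pareto awareness. The underlying idea can be sketched minimally as non-dominated filtering, with mutation parents then drawn only from the front. This sketch assumes two objectives: accuracy to maximize and parameter count to minimize. The objectives, names, and numbers are illustrative assumptions, not values from the paper.

```python
import random
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    accuracy: float  # higher is better
    params_m: float  # parameters in millions, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.params_m <= b.params_m
    better = a.accuracy > b.accuracy or a.params_m < b.params_m
    return no_worse and better


def pareto_front(population: list[Candidate]) -> list[Candidate]:
    """Keep candidates that no other candidate dominates."""
    return [c for c in population if not any(dominates(o, c) for o in population)]


def pick_parent(population: list[Candidate], rng: random.Random) -> Candidate:
    """Pareto-aware parent choice: sample uniformly from the non-dominated set."""
    return rng.choice(pareto_front(population))


pop = [Candidate("a", 90.0, 5.0), Candidate("b", 92.0, 8.0), Candidate("c", 89.0, 6.0)]
print([c.name for c in pareto_front(pop)])  # ['a', 'b']
```

Candidate "c" is dropped because "a" is both more accurate and smaller. "a" and "b" trade accuracy against size, so both stay eligible as parents.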
major comments (2)
- [Abstract and §4] Abstract and §4 (Experiments): The headline improvements (0.84/2.17/2.35 points) are presented without any information on the number of independent runs, standard deviations, statistical significance tests, or controls for LLM stochasticity. This information is load-bearing for the central performance claim and must be supplied before the gains can be considered reliable.
- [§3 and §4] §3 (Method) and §4 (Experiments): No ablation or control experiment isolates the contribution of the LLM-populated structural template from the FairNAD search components. A baseline (e.g., random sampling or standard EA) run inside the identical LLM-structured space would quantify what fraction of the reported gains is due to space quality versus the multi-type mutation and feedback mechanisms; without it the attribution remains ambiguous.
minor comments (2)
- [§2] §2 (Related Work): The positioning against other recent LLM-assisted NAS methods could be sharpened by explicitly contrasting the semi-automated template population step with fully automated or prompt-only baselines.
- [§3.2] Notation in §3.2: The definitions of “mutation type probabilities” and “sampling fairness weights” are introduced as free parameters; a short sensitivity table or default values would improve reproducibility.
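The first major comment asks for run counts, spread, and significance. The standard-library sketch below shows the kind of summary the rebuttal promises (five runs, paired t-tests). It computes per-method mean and sample standard deviation plus the paired t statistic over matched runs. It assumes runs are paired by seed. The threshold 2.776 is the two-sided 5% critical value of Student's t with 4 degrees of freedom, so it is valid only for exactly five pairs. For p-values, `scipy.stats.ttest_rel` performs the same paired test.

```python
import math
import statistics

T_CRIT_5_PAIRS = 2.776  # two-sided alpha = 0.05, df = 4; valid only for n = 5 pairs


def paired_summary(ours: list[float], baseline: list[float]) -> dict[str, float]:
    """Mean/std per method and the paired t statistic over seed-matched runs."""
    if len(ours) != len(baseline) or len(ours) < 2:
        raise ValueError("need two equal-length lists with at least two runs each")
    diffs = [a - b for a, b in zip(ours, baseline)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0:
        # Identical differences: t is undefined; report 0 or a signed infinity.
        t = 0.0 if mean_d == 0 else math.copysign(math.inf, mean_d)
    else:
        t = mean_d / (sd_d / math.sqrt(n))
    return {
        "mean_ours": statistics.mean(ours),
        "std_ours": statistics.stdev(ours),
        "mean_baseline": statistics.mean(baseline),
        "std_baseline": statistics.stdev(baseline),
        "mean_diff": mean_d,
        "t": t,
    }
```

With five seed-matched runs per method, `abs(result["t"]) > T_CRIT_5_PAIRS` corresponds to p < 0.05, under the usual assumption that the per-seed differences are roughly normal.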
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which has helped us strengthen the empirical rigor of the manuscript. We address each major comment below and have revised the manuscript to incorporate the requested information and additional controls.
Point-by-point responses
- Referee: [Abstract and §4] Abstract and §4 (Experiments): The headline improvements (0.84/2.17/2.35 points) are presented without any information on the number of independent runs, standard deviations, statistical significance tests, or controls for LLM stochasticity. This information is load-bearing for the central performance claim and must be supplied before the gains can be considered reliable.
Authors: We agree that details on run counts, variability, and statistical testing are necessary to substantiate the central claims. In the revised manuscript we will report all headline results as means over five independent runs, accompanied by standard deviations and p-values from paired t-tests against the cited baselines. To control LLM stochasticity, we used temperature 0.0 during template population and fixed random seeds throughout FairNAD; these controls will be documented explicitly in the updated §4 together with the new statistical summary. Revision: yes.
- Referee: [§3 and §4] §3 (Method) and §4 (Experiments): No ablation or control experiment isolates the contribution of the LLM-populated structural template from the FairNAD search components. A baseline (e.g., random sampling or standard EA) run inside the identical LLM-structured space would quantify what fraction of the reported gains is due to space quality versus the multi-type mutation and feedback mechanisms; without it the attribution remains ambiguous.
Authors: We concur that an ablation isolating the structured space from the search algorithm would clarify attribution. Although the current experiments compare FairNAD against prior methods that employ different search spaces, the revision will add results for both random search and a standard evolutionary algorithm executed inside the identical LLM-populated space. These new baselines will be presented alongside the existing FairNAD results to quantify the incremental benefit of the multi-type mutation and feedback mechanisms. Revision: yes.
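The second major comment and the rebuttal both call for random search and a standard evolutionary algorithm run inside the same LLM-populated space. A minimal harness for that control is sketched below under stated assumptions. The space is a dict of categorical choices, `score` stands in for training and evaluating an architecture, and both searchers receive the same evaluation budget. The truncation-selection EA is a generic stand-in, not necessarily the baseline the authors will use. The toy space and hidden target are hypothetical.

```python
import random
from typing import Callable

Space = dict[str, list[str]]
Arch = dict[str, str]


def sample_arch(space: Space, rng: random.Random) -> Arch:
    return {k: rng.choice(v) for k, v in space.items()}


def random_search(space: Space, score: Callable[[Arch], float], budget: int, rng: random.Random):
    """Evaluate `budget` independent samples; return (best_score, best_arch)."""
    best = None
    for _ in range(budget):
        arch = sample_arch(space, rng)
        s = score(arch)
        if best is None or s > best[0]:
            best = (s, arch)
    return best


def simple_ea(space: Space, score: Callable[[Arch], float], budget: int,
              rng: random.Random, pop_size: int = 8):
    """Truncation-selection EA with single-key mutation; uses exactly `budget` evaluations."""
    if budget < pop_size or pop_size < 2:
        raise ValueError("need budget >= pop_size >= 2")
    pop = [(score(a), a) for a in (sample_arch(space, rng) for _ in range(pop_size))]
    for _ in range(budget - pop_size):
        parent = max(rng.sample(pop, 2), key=lambda p: p[0])[1]  # binary tournament
        child = dict(parent)
        key = rng.choice(sorted(space))
        child[key] = rng.choice(space[key])  # resample one choice
        pop.append((score(child), child))
        pop.sort(key=lambda p: p[0], reverse=True)
        pop = pop[:pop_size]  # keep the best pop_size individuals
    return max(pop, key=lambda p: p[0])


# Toy stand-in for "train and evaluate": count matches with a hidden target.
toy_space = {"op": ["conv3", "conv5", "attn"], "norm": ["bn", "ln"], "act": ["relu", "gelu"]}
target = {"op": "attn", "norm": "ln", "act": "gelu"}


def toy_score(arch: Arch) -> float:
    return float(sum(arch[k] == v for k, v in target.items()))


print(random_search(toy_space, toy_score, 30, random.Random(0))[0])
print(simple_ea(toy_space, toy_score, 30, random.Random(0))[0])
```

Holding the space and the evaluation budget fixed while swapping only the search strategy is what makes such a comparison attribute gains to the search rather than to the space.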
Circularity Check
No circularity: results rest on external benchmark comparisons
full rationale
The paper's derivation chain consists of defining a structural template, using an LLM to populate a search space from analyzed papers, and applying the FairNAD algorithm (multi-type mutation, Pareto-aware selection, LLM-driven iteration) to explore it. Reported gains (0.84/2.17/2.35 points on CIFAR-10/100/ImageNet16-120) are obtained by direct comparison against external SOTA methods on fixed public benchmarks. No equations, parameter-fitting steps, or self-citations are shown that would make any claimed result equivalent to its own inputs by construction. The central claims therefore remain independent of the reported outcomes and do not reduce to self-definition or fitted-input renaming.
Axiom & Free-Parameter Ledger
free parameters (1)
- mutation type probabilities and sampling fairness weights
axioms (1)
- domain assumption: LLMs can extract and organize architectural design knowledge from papers into a template without introducing systematic bias or hallucinated attributes.
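The ledger lists mutation-type probabilities and sampling-fairness weights as free parameters, and the minor comment on §3.2 asks for defaults or a sensitivity table. In an evolutionary loop, such probabilities typically enter as a weighted choice among mutation operators. The operator names below follow the abstract, but the weights are hypothetical placeholders, not the paper's values.

```python
import random
from collections import Counter

# Hypothetical weights; this page does not state the paper's actual values.
MUTATION_WEIGHTS = {"fair_idea": 0.4, "pareto_aware": 0.3, "llm_iterative": 0.3}


def pick_mutation_type(weights: dict[str, float], rng: random.Random) -> str:
    """Draw one mutation type with probability proportional to its weight."""
    if any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    names = sorted(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def empirical_rates(weights: dict[str, float], n: int = 20_000, seed: int = 0) -> dict[str, float]:
    """Monte Carlo check that sampled frequencies match the normalized weights."""
    rng = random.Random(seed)
    counts = Counter(pick_mutation_type(weights, rng) for _ in range(n))
    return {name: counts[name] / n for name in sorted(weights)}


print(empirical_rates(MUTATION_WEIGHTS))
```

A sensitivity table of the kind the referee requests could be produced by sweeping these weights and rerunning the search under fixed seeds.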
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
We introduce a semi-automated model design attribute structuring method that organizes design knowledge into a hierarchical attribute tree... FairNAD, using a multi-type mutation... mutation with fair idea sampling, Pareto-aware mutation, LLM-driven iterative mutation, and a feedback loop.
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: absolute_floor_iff_bare_distinguishability (unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
The top two levels (e.g., granularity and main category) were predefined based on expert knowledge, while the sub-attributes were generated by prompting an LLM.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] OpenMMLab. https://github.com/open-mmlab. [Accessed 27-04-2026]
- [2] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. Journal of machine learning research, 13(2), 2012
- [3] H. Cai, L. Zhu, and S. Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In International Conference on Learning Representations, 2019
- [4] H. Cai, C. Gan, T. Wang, Z. Zhang, and S. Han. Once-for-all: Train one network and specialize it for efficient deployment. In International Conference on Learning Representations, 2020
- [5] H. Cai, J. Li, M. Hu, C. Gan, and S. Han. EfficientViT: Lightweight multi-scale attention for high-resolution dense prediction. In Proceedings of the IEEE/CVF international conference on computer vision, pages 17302–17313, 2023
- [6] A. Chen, D. Dohan, and D. So. EvoPrompting: Language models for code-level neural architecture search. Advances in neural information processing systems, 36:7787–7817, 2023
- [7] M. Chen, H. Peng, J. Fu, and H. Ling. AutoFormer: Searching transformers for visual recognition. In Proceedings of the IEEE/CVF international conference on computer vision, pages 12270–12280, 2021
- [8] M. Chen, K. Wu, B. Ni, H. Peng, B. Liu, J. Fu, H. Chao, and H. Ling. Searching the search space of vision transformer. Advances in Neural Information Processing Systems, 34:8714–8726, 2021
- [9] X. Chen, R. Wang, M. Cheng, X. Tang, and C.-J. Hsieh. DrNAS: Dirichlet neural architecture search. In International Conference on Learning Representations, 2021
- [10]
- [11] P. Chrabaszcz, I. Loshchilov, and F. Hutter. A downsampled variant of ImageNet as an alternative to the CIFAR datasets. arXiv preprint arXiv:1707.08819, 2017
- [12] X. Chu, B. Zhang, and R. Xu. FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search. In Proceedings of the IEEE/CVF International Conference on computer vision, pages 12239–12248, 2021
- [13] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE transactions on evolutionary computation, 6(2):182–197, 2002
- [14] X. Dong and Y. Yang. One-shot neural architecture search via self-evaluated template network. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3681–3690, 2019
- [15] X. Dong and Y. Yang. Searching for a robust neural architecture in four GPU hours. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1761–1770, 2019
- [16] X. Dong and Y. Yang. NAS-Bench-201: Extending the scope of reproducible neural architecture search. In International Conference on Learning Representations (ICLR), 2020
- [17] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021
- [18] S. Falkner, A. Klein, and F. Hutter. BOHB: Robust and efficient hyperparameter optimization at scale. In International conference on machine learning, pages 1437–1446. PMLR, 2018
- [19]
- [20] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016
- [21]
- [22] S. Hu, S. Xie, H. Zheng, C. Liu, J. Shi, X. Liu, and D. Lin. DSNAS: Direct neural architecture search without parameter retraining. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12084–12092, 2020
- [23] B. Hui, J. Yang, Z. Cui, J. Yang, D. Liu, L. Zhang, T. Liu, J. Zhang, B. Yu, K. Dang, et al. Qwen2.5-Coder technical report. arXiv preprint arXiv:2409.12186, 2024
- [24] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. Master’s thesis, University of Toronto, 2009
- [25] Y. Li, G. Yuan, Y. Wen, J. Hu, G. Evangelidis, S. Tulyakov, Y. Wang, and J. Ren. EfficientFormer: Vision transformers at MobileNet speed. Advances in neural information processing systems, 35:12934–12949, 2022
- [26] Z. Li, Z. Lin, and Y. Wang. CoLLM-NAS: Collaborative large language models for efficient knowledge-guided neural architecture search. arXiv preprint arXiv:2509.26037, 2025
- [27] C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy. Progressive neural architecture search. In Proceedings of the European conference on computer vision (ECCV), pages 19–34, 2018
- [28] H. Liu, K. Simonyan, and Y. Yang. DARTS: Differentiable architecture search. In International Conference on Learning Representations, 2019
- [29] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021
- [30] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11976–11986, 2022
- [31] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European conference on computer vision (ECCV), pages 116–131, 2018
- [32] S. Mehta and M. Rastegari. MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. In International Conference on Learning Representations, 2022
- [33] K. G. Mills, D. Niu, M. Salameh, W. Qiu, F. X. Han, P. Liu, J. Zhang, W. Lu, and S. Jui. AIO-P: Expanding neural performance predictors beyond image classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 9180–9189, 2023
- [34] K. G. Mills, F. X. Han, M. Salameh, S. Lu, C. Zhou, J. He, F. Sun, and D. Niu. Building optimal neural architectures using interpretable knowledge. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5726–5735, 2024
- [35] S. Movahedi, M. Adabinejad, A. Imani, A. Keshavarz, M. Dehghani, A. Shakery, and B. N. Araabi. λ-DARTS: Mitigating performance collapse by harmonizing operation selection among cells. In The Eleventh International Conference on Learning Representations, 2023
- [36] M. U. Nasir, S. Earle, J. Togelius, S. James, and C. Cleghorn. LLMatic: neural architecture search via large language models and quality diversity optimization. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 1110–1118, 2024
- [37] H. Pham, M. Guan, B. Zoph, Q. Le, and J. Dean. Efficient neural architecture search via parameters sharing. In International conference on machine learning, pages 4095–4104. PMLR, 2018
- [38]
- [39] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 4780–4789, 2019
- [40] M. Salameh, K. Mills, N. Hassanpour, F. Han, S. Zhang, W. Lu, S. Jui, C. Zhou, F. Sun, and D. Niu. AutoGO: Automated computation graph optimization for neural network evolution. Advances in Neural Information Processing Systems, 36:74455–74477, 2023
- [41] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4510–4520, 2018
- [42] D. So, Q. Le, and C. Liang. The evolved transformer. In International conference on machine learning, pages 5877–5886. PMLR, 2019
- [43] D. Stamoulis, R. Ding, D. Wang, D. Lymberopoulos, B. Priyantha, J. Liu, and D. Marculescu. Single-path NAS: Device-aware efficient ConvNet design. In Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations with Industrial Applications (ODML-CDNNRIA) in Conjunction with International Conference on Machine Learning, 2019
- [44] M. Suganuma, S. Shirakawa, and T. Nagao. A genetic programming approach to designing convolutional neural network architectures. In Proceedings of the genetic and evolutionary computation conference, pages 497–504, 2017
- [45] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015
- [46]
- [47] I. O. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. Keysers, J. Uszkoreit, et al. MLP-Mixer: An all-MLP architecture for vision. Advances in neural information processing systems, 34:24261–24272, 2021
- [48] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3):229–256, 1992
- [49] B. Wu, X. Dai, P. Zhang, Y. Wang, F. Sun, Y. Wu, Y. Tian, P. Vajda, Y. Jia, and K. Keutzer. FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10734–10742, 2019
- [50] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1492–1500, 2017
- [51] S. Xie, H. Zheng, C. Liu, and L. Lin. SNAS: stochastic neural architecture search. In International Conference on Learning Representations, 2019
- [52] Y. Xu, L. Xie, X. Zhang, X. Chen, G.-J. Qi, Q. Tian, and H. Xiong. PC-DARTS: Partial channel connections for memory-efficient architecture search. In International Conference on Learning Representations, 2020
- [53] A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025
- [54] Z. Yang, W. Zeng, S. Jin, C. Qian, P. Luo, and W. Liu. NADER: Neural architecture design via multi-agent collaboration. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 4452–4461, 2025
- [55] P. Ye, B. Li, Y. Li, T. Chen, J. Fan, and W. Ouyang. β-DARTS: Beta-decay regularization for differentiable architecture search. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10874–10883, 2022
- [56] M. Yoshimura, Z. Sun, Y. Sakuma, J. Otsuka, A. Irie, and T. Ohashi. LLM as a tool, not an agent: Code-mined tree transformations for neural architecture search. arXiv preprint arXiv:2604.16555, 2026
- [57] J. Yu, P. Jin, H. Liu, G. Bender, P.-J. Kindermans, M. Tan, T. Huang, X. Song, R. Pang, and Q. Le. BigNAS: Scaling up neural architecture search with big single-stage models. In European Conference on Computer Vision, pages 702–717. Springer, 2020
- [58]
- [59] M. Zheng, X. Su, S. You, F. Wang, C. Qian, C. Xu, and S. Albanie. Can GPT-4 perform neural architecture search? arXiv preprint arXiv:2304.10970, 2023
- [60] X. Zhou, X. Wu, L. Feng, Z. Lu, and K. C. Tan. Design principle transfer in neural architecture search via large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 23000–23008, 2025
- [61] B. Zoph and Q. Le. Neural architecture search with reinforcement learning. In International Conference on Learning Representations, 2017
- [62] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8697–8710, 2018
- [63] Overly specific attributes: The LLM often fails to follow the instruction to extract general attributes and instead collects modules that exist only in specific models (e.g., “input subtraction pooling”).
- [64] Inconsistent categorization: The LLM classifies the same attribute into different categories when analyzing different reference models (e.g., “grouped convolution” appears in multiple categories).
- [65] Missing attributes in specific categories: Although specific main categories exist in the manual design, no corresponding attributes are found when analyzing the reference models (e.g., no sub-categories are found for “dense connectivity for feature reuse” in Table 10). We attribute failures (1) and (2) primarily to the LLM’s capability. Specifically, (1) is ...
- [66] Incomplete generation: The LLM often truncates its output, failing to generate complete code for complex architectures.
- [67] Component hallucination: The model substitutes unknown modules or functions with plausible but non-existent or incorrect alternatives.
- [68] Shape mismatch: Tensor shape mismatches frequently occur, particularly when integrating heterogeneous modules such as CNNs and Transformers.
- [69] Model downscaling failure: The initially generated model becomes excessively large, causing the subsequent model downscaling step to fail.
- [70] Structural verification failure: The LLM incorrectly identifies a valid model as invalid, or an invalid model as valid. Specifically, although the LLM performs well in determining whether the code has been modified, it often fails to determine whether the architecture is multi-layered. We attribute failures (1) and (2) primarily to the resource constraint...
- [71] and Genesys [10]. The graph-based representation defines the module classes or network structures. For example, Genesys [10] predefines the GPTblock, a meta module implemented in PyTorch. This module can be factorized into a tree structure of sub-modules to be explored for language models. Genesys builds a module library from external sources, and the m...
- [72] Attributes which improves performance: {attribute_examples_for_performance_improvements}
- [73] Attributes which improves efficiency: {attribute_examples_for_efficiency_improvements} Try to find attributes not in the above list as well. Constraints: • Be comprehensive • Ensure that each attribute is concise, specific, and clearly describes the model’s key innovations. For example, “convolution” is valid, but “a visual module” is too vague. • Avoid d...
- [74] Feature extraction operators: Core operations used to extract features from data. For example: • Convolution: Improvements such as kernel size design, dilated convolution (expanded receptive field), deformable convolution (spatially adaptive kernels), etc. • Self-attention: The core mechanism of Transformers. Includes multi-head attention for multi-persp...
- [75] Normalization: Normalization is essential for stabilizing and accelerating training. For example: Batch Normalization, Layer Normalization, Group Normalization, Instance Normalization
- [76] Activation: Introduces nonlinearity into the network. For example: ReLU, Leaky ReLU, GELU, Swish (SiLU)
Block and connectivity level
- [77] Input encoding: Methods to encode input data. For example: CNN stem, Patch embedding, Positional encoding
- [78] Residual connections and multi-branch architectures: Structures to enhance the diversity of feature extraction. For example: residual connections (ResNet), multi-branch structures (Inception)
- [79] Feature fusion and aggregation: Methods to combine features from different network locations (layers or branches). For example: element-wise addition, concatenation along channels (DenseNet and Inception), multi-scale feature fusion (U-Net, FPN)
- [80] Adaptive feature recalibration: Attention mechanisms that dynamically learn which information is important. For example: channel attention (SE block), spatial attention
Network level
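The inconsistent-categorization failure listed above as [64] could be detected mechanically before the search starts. In that failure, the same attribute, such as “grouped convolution”, lands in different categories across reference models. The sketch below assumes the per-paper LLM output is available as (reference model, category, attribute) triples. The function names, the majority-vote resolution, and the example triples and category labels are hypothetical, not part of the paper.

```python
from collections import Counter, defaultdict


def normalize(attribute: str) -> str:
    # Lowercase and collapse whitespace so trivially different spellings merge.
    return " ".join(attribute.lower().split())


def find_multi_category(extractions: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Attributes that were assigned to more than one category across papers."""
    cats = defaultdict(set)
    for _model, category, attribute in extractions:
        cats[normalize(attribute)].add(category)
    return {a: sorted(c) for a, c in cats.items() if len(c) > 1}


def resolve_by_majority(extractions: list[tuple[str, str, str]]) -> dict[str, str]:
    """Assign each attribute to its most frequent category (ties: first seen wins)."""
    votes = defaultdict(Counter)
    for _model, category, attribute in extractions:
        votes[normalize(attribute)][category] += 1
    return {a: c.most_common(1)[0][0] for a, c in votes.items()}


# Hypothetical extraction output from three reference models.
triples = [
    ("ResNeXt", "convolution", "grouped convolution"),
    ("ShuffleNet V2", "efficient operators", "Grouped  Convolution"),
    ("MobileNetV2", "convolution", "grouped convolution"),
]
print(find_multi_category(triples))  # {'grouped convolution': ['convolution', 'efficient operators']}
print(resolve_by_majority(triples))  # {'grouped convolution': 'convolution'}
```

Majority voting is only one possible policy. Flagged attributes could instead be routed to a human reviewer, which fits the semi-automated framing.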