CoLLM-NAS: Collaborative Large Language Models for Efficient Knowledge-Guided Neural Architecture Search
Pith reviewed 2026-05-21 20:58 UTC · model grok-4.3
The pith
A pair of large language models, one steering search direction and one generating candidates, delivers state-of-the-art neural architectures at 4 to 10 times lower search cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CoLLM-NAS is a two-stage NAS framework that uses a stateful Navigator LLM to guide search direction, a stateless Generator LLM to synthesize high-quality candidates, and a Coordinator module to orchestrate inter-LLM communication and manage evaluation processes. The method efficiently guides the search by combining LLMs' inherent knowledge of structured neural architectures with progressive knowledge from iterative feedback and historical trajectory. Experimental results on ImageNet and NAS-Bench-201 show that CoLLM-NAS surpasses existing NAS methods and conventional search algorithms, achieving new state-of-the-art results while reducing search costs by a factor of 4 to 10. Furthermore, CoL
What carries the argument
The CoLLM-NAS two-stage framework consisting of a stateful Navigator LLM for directional guidance, a stateless Generator LLM for candidate synthesis, and a Coordinator that manages communication and feedback integration.
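The division of labor described above can be sketched as a small loop. This is a minimal illustration, not the authors' implementation: the class names `Navigator`, `Generator`, and `Coordinator` mirror the paper's terminology, but their methods, the toy search space (a tuple of layer widths), and the `toy_score` evaluator are all assumptions made for the example, and plain Python functions stand in for the LLM calls.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy search space: an architecture is a tuple of layer widths.
WIDTHS = (16, 32, 64, 128)
DEPTH = 4


def toy_score(arch):
    """Stand-in for supernet evaluation; NOT a real accuracy measurement."""
    return sum(arch) / (max(WIDTHS) * DEPTH)


def is_valid(arch):
    """Reject candidates that fall outside the declared search space."""
    return len(arch) == DEPTH and all(w in WIDTHS for w in arch)


@dataclass
class Navigator:
    """Stateful: keeps the search trajectory and turns it into a hint."""
    history: list = field(default_factory=list)

    def update(self, scored):
        self.history.extend(scored)

    def guidance(self):
        # A real Navigator would be an LLM prompt; here the hint is simply
        # the best architecture seen so far (None before any feedback).
        if not self.history:
            return None
        return max(self.history, key=lambda pair: pair[1])[0]


class Generator:
    """Stateless with respect to the search: it sees only the current hint."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)  # randomness source, not search state

    def propose(self, hint, n):
        out = []
        for _ in range(n):
            if hint is None:
                arch = tuple(self.rng.choice(WIDTHS) for _ in range(DEPTH))
            else:
                mutated = list(hint)
                mutated[self.rng.randrange(DEPTH)] = self.rng.choice(WIDTHS)
                arch = tuple(mutated)
            out.append(arch)
        return out


class Coordinator:
    """Routes hints and feedback between the two components."""

    def __init__(self, navigator, generator):
        self.navigator = navigator
        self.generator = generator

    def search(self, rounds=5, per_round=8):
        best = (None, float("-inf"))
        for _ in range(rounds):
            hint = self.navigator.guidance()
            proposals = self.generator.propose(hint, per_round)
            scored = [(a, toy_score(a)) for a in proposals if is_valid(a)]
            self.navigator.update(scored)  # feedback closes the loop
            for pair in scored:
                if pair[1] > best[1]:
                    best = pair
        return best


best_arch, best_score = Coordinator(Navigator(), Generator(seed=0)).search()
```

The point of the sketch is the data flow: only the Navigator accumulates history, the Generator receives nothing but the current hint, and validity filtering happens in the Coordinator before any evaluation budget is spent.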
If this is right
- Achieves new state-of-the-art results on ImageNet and NAS-Bench-201 while cutting search costs by a factor of 4 to 10.
- Consistently improves both accuracy and search efficiency when applied to existing two-stage NAS methods such as OFA, SPOS, and AutoFormer.
- Generalizes across multiple search spaces including those for MobileNet, ShuffleNet, and AutoFormer variants.
- Enables knowledge-guided search that avoids many of the invalid architectures produced by prior LLM-NAS approaches.
Where Pith is reading between the lines
- The same division of labor between one model that maintains search state and another that proposes concrete solutions might transfer to automated design problems outside neural networks, such as optimizing compiler passes or molecular structures.
- Groups with modest computing budgets could use the approach to generate competitive custom models without renting large GPU clusters for weeks.
- Explicit separation of directional guidance from candidate generation may offer a template for other multi-agent LLM systems that must balance exploration with concrete output.
Load-bearing premise
The assumption that pairing a stateful Navigator LLM with a stateless Generator LLM and feeding back evaluation results through a Coordinator will consistently produce valid, high-performing architectures without the invalidity or inefficiency problems of earlier LLM-based searches.
What would settle it
Running the method on NAS-Bench-201 and finding either that the top discovered architectures do not exceed the accuracy of the best previously reported entries, or that total search time stays within a factor of two of standard evolutionary or reinforcement-learning baselines, would count against the claim.
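The test described above reduces to two comparisons. A minimal sketch follows; the function name and argument names are hypothetical, and reading "within a factor of two" as "less than a 2x saving over the baseline" is an interpretation, not something the source spells out.

```python
def claim_is_undercut(found_acc, prior_best_acc, search_cost, baseline_cost):
    """Return True if either falsifying condition from the text holds.

    Costs may be in any unit (GPU hours, evaluation counts) as long as both
    arguments use the same one. All inputs are supplied by the caller.
    """
    no_accuracy_gain = found_acc <= prior_best_acc
    # Assumed reading: the method saves less than 2x relative to the baseline.
    no_real_speedup = baseline_cost / search_cost < 2.0
    return no_accuracy_gain or no_real_speedup
```

Either condition alone is enough to return `True`, which matches the "or" in the statement of the test.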
Original abstract
The integration of Large Language Models (LLMs) with Neural Architecture Search (NAS) has introduced new possibilities for automating the design of neural architectures. However, most existing methods face critical limitations, including architectural invalidity, computational inefficiency, and inferior performance compared to traditional NAS. In this work, we present Collaborative LLM-based NAS (CoLLM-NAS), a two-stage NAS framework with knowledge-guided search driven by two complementary LLMs. Specifically, we propose a stateful Navigator LLM to guide search direction, a stateless Generator LLM to synthesize high-quality candidates, and a Coordinator module to orchestrate inter-LLM communication and manage evaluation processes. CoLLM-NAS efficiently guides the search process by combining LLMs' inherent knowledge of structured neural architectures with progressive knowledge from iterative feedback and historical trajectory. Experimental results on ImageNet and NAS-Bench-201 show that CoLLM-NAS surpasses existing NAS methods and conventional search algorithms, achieving new state-of-the-art results while significantly reducing search costs by 4--10. Furthermore, CoLLM-NAS consistently enhances the performance and efficiency of various two-stage NAS methods (e.g., OFA, SPOS, and AutoFormer) across diverse search spaces (e.g., MobileNet, ShuffleNet, and AutoFormer), demonstrating its excellent generalization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents CoLLM-NAS, a two-stage collaborative LLM-based NAS framework consisting of a stateful Navigator LLM to guide search direction, a stateless Generator LLM to synthesize candidates, and a Coordinator module to orchestrate communication and evaluation. It claims to surpass existing NAS methods and conventional algorithms with new state-of-the-art results on ImageNet and NAS-Bench-201 while reducing search costs by a factor of 4-10, and to generalize by enhancing other two-stage NAS methods (OFA, SPOS, AutoFormer) across search spaces such as MobileNet, ShuffleNet, and AutoFormer.
Significance. If the performance and efficiency results hold under rigorous controls, the work would meaningfully advance LLM-guided NAS by addressing common issues of architectural invalidity and inefficiency through iterative knowledge feedback and inter-LLM coordination. The reported generalization across multiple search spaces and base methods is a positive aspect that could broaden practical applicability of automated architecture design.
major comments (2)
- [Abstract] Abstract: The central claim of 'significantly reducing search costs by 4--10' is load-bearing for the efficiency contribution. The description provides no information on the cost metric (e.g., whether cumulative LLM inference overhead from repeated Navigator-Generator-Coordinator cycles is included versus only architecture training/validation costs on ImageNet or NAS-Bench-201), which directly affects whether the 4-10x advantage over prior methods like SPOS or OFA can be substantiated.
- [Experimental Results] Experimental Results: The abstract asserts SOTA performance and superiority over existing NAS methods, yet supplies no details on baselines, number of runs, statistical significance, error bars, or controls for LLM output stochasticity. These omissions prevent evaluation of the reliability of the reported accuracy gains and generalization claims.
minor comments (2)
- [Abstract] Abstract: The phrasing 'reducing search costs by 4--10' is imprecise; it should read 'by a factor of 4 to 10' for clarity.
- [Abstract] Abstract: The roles of the Navigator, Generator, and Coordinator could be defined in one additional sentence to aid readers new to the collaborative setup.
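The first major comment turns on what is counted as search cost. As a rough illustration, assuming total cost is the sum of architecture-evaluation time and LLM-inference time, the speedup can be reported both ways. Every number below is a placeholder chosen for the example, not a figure from the paper, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SearchCost:
    n_evals: int                # architectures evaluated during search
    hours_per_eval: float       # GPU hours per evaluation
    llm_calls: int = 0          # LLM calls (0 for non-LLM baselines)
    hours_per_call: float = 0.0

    def eval_hours(self):
        return self.n_evals * self.hours_per_eval

    def total_hours(self):
        return self.eval_hours() + self.llm_calls * self.hours_per_call


def speedups(baseline, method):
    """Return (evaluation-only speedup, speedup including LLM overhead)."""
    return (baseline.eval_hours() / method.eval_hours(),
            baseline.total_hours() / method.total_hours())


# Placeholder inputs, chosen only to make the arithmetic easy to follow.
baseline = SearchCost(n_evals=1000, hours_per_eval=0.5)
method = SearchCost(n_evals=200, hours_per_eval=0.5,
                    llm_calls=100, hours_per_call=0.25)
eval_only, with_llm = speedups(baseline, method)
```

With these placeholder inputs the evaluation-only ratio is 5 while the ratio that includes LLM overhead is 4, which shows why the referee asks which of the two the reported 4 to 10 factor refers to.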
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and detailed comments on our manuscript. We address each major comment point by point below, agreeing that greater clarity on cost metrics and experimental reporting will strengthen the presentation. We commit to incorporating these revisions in the next version of the paper.
Point-by-point responses
- Referee: [Abstract] Abstract: The central claim of 'significantly reducing search costs by 4--10' is load-bearing for the efficiency contribution. The description provides no information on the cost metric (e.g., whether cumulative LLM inference overhead from repeated Navigator-Generator-Coordinator cycles is included versus only architecture training/validation costs on ImageNet or NAS-Bench-201), which directly affects whether the 4-10x advantage over prior methods like SPOS or OFA can be substantiated.
  Authors: We appreciate this observation on the cost metric. In CoLLM-NAS, the 4-10x reduction refers to the number of architecture evaluations (i.e., training and validation costs on ImageNet or NAS-Bench-201) enabled by the knowledge-guided iterative search, which is the standard metric in the NAS literature for comparing search efficiency against methods like SPOS and OFA. LLM inference overhead is not the primary component and is typically orders of magnitude smaller than GPU training costs, but we acknowledge the abstract does not explicitly clarify this distinction. We will revise the abstract to specify the cost metric and add a dedicated paragraph in the Experimental Results section providing a breakdown of total compute, including relative LLM overhead.
  Revision: yes
- Referee: [Experimental Results] Experimental Results: The abstract asserts SOTA performance and superiority over existing NAS methods, yet supplies no details on baselines, number of runs, statistical significance, error bars, or controls for LLM output stochasticity. These omissions prevent evaluation of the reliability of the reported accuracy gains and generalization claims.
  Authors: Thank you for emphasizing the need for rigorous experimental details. The manuscript already compares against multiple baselines (SPOS, OFA, AutoFormer, and conventional algorithms) with results on ImageNet and NAS-Bench-201, and demonstrates generalization across search spaces. To improve reliability assessment, we will expand the Experimental Results section to report the number of independent runs, include mean performance with standard deviations (error bars), discuss controls for LLM stochasticity (e.g., fixed temperature and repeated sampling), and add statistical significance tests for key comparisons. These additions will be incorporated without altering the core claims.
  Revision: yes
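The reporting promised in the second response (mean, standard deviation, and a significance test over independent runs) can be sketched with the Python standard library. The accuracy lists are invented placeholders, and Welch's t statistic is one reasonable choice of test, not necessarily the one the authors will use.

```python
import math
import statistics


def summarize(runs):
    """Mean and sample standard deviation over independent search runs."""
    return statistics.mean(runs), statistics.stdev(runs)


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom (unequal variances)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Placeholder top-1 accuracies (%) from five hypothetical seeds per method.
method_runs = [76.0, 76.5, 75.5, 76.5, 75.5]
baseline_runs = [75.0, 75.5, 74.5, 75.5, 74.5]

m_mean, m_std = summarize(method_runs)
t_stat, dof = welch_t(method_runs, baseline_runs)
```

Turning the statistic into a p-value needs a t-distribution CDF, which the standard library does not provide; `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and returns the p-value directly.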
Circularity Check
Empirical NAS framework exhibits no circular derivation
The paper presents CoLLM-NAS as an empirical two-stage search procedure combining stateful Navigator LLM, stateless Generator LLM, and Coordinator orchestration. No equations, first-principles derivations, or predictions are claimed that reduce by construction to fitted parameters or self-citations. Performance and cost results are reported from external benchmarks (ImageNet, NAS-Bench-201) and generalization tests on spaces like MobileNet. The method is self-contained against these benchmarks with no load-bearing self-citation chains or ansatz smuggling. This is the expected non-finding for an applied search algorithm rather than a closed mathematical claim.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs possess inherent knowledge of structured neural architectures that can be leveraged to guide search
invented entities (1)
- Coordinator module: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  "We propose a stateful Navigator LLM to guide search direction, a stateless Generator LLM to synthesize high-quality candidates, and a Coordinator module to orchestrate inter-LLM communication"
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  "combining LLMs’ inherent knowledge of structured neural architectures with progressive knowledge from iterative feedback and historical trajectory"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- Structuring Open-Ended NAS: Semi-Automated Design Knowledge Structuring with LLMs for Efficient Neural Architecture Search
  Authors structure architectural design knowledge with LLMs to create an open-ended NAS space and introduce FairNAD, which finds architectures improving 0.84, 2.17, and 2.35 points over SOTA on CIFAR-10, CIFAR-100, and...
- LLM as a Tool, Not an Agent: Code-Mined Tree Transformations for Neural Architecture Search
  LLMasTool improves neural architecture search by evolving code-mined hierarchical trees with diversity-guided Bayesian planning and targeted LLM assistance, reporting gains of 0.69, 1.83, and 2.68 points on CIFAR-10, ...