Towards Code-Oriented LM Embeddings for Surrogate-Assisted Neural Architecture Search
Pith reviewed 2026-05-20 21:01 UTC · model grok-4.3
The pith
Representing neural architectures as PyTorch code lets off-the-shelf language models extract competitive features for performance prediction without NAS-specific training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By turning neural architectures into PyTorch class definition text and extracting embeddings from frozen language models, the authors obtain Code-Oriented LM Embeddings (COLE) that serve as effective inputs to lightweight performance predictors. These embeddings outperform alternative text encodings, and in BANANAS searches on CIFAR-100 within NAS-Bench-201 they cut the evaluation budget needed to reach within 1 percent of the best architecture's test accuracy by 34 percent.
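To make "PyTorch class definition text" concrete, the sketch below renders a NAS-Bench-201 cell, given in that benchmark's usual `|op~src|+...` architecture-string format, as the source text of an `nn.Module` subclass. The exact template COLE uses is not described here, so the layout, the function names `parse_arch_string` and `arch_to_pytorch_source`, and the helper modules `Zero` and `ReLUConvBN` that appear in the generated text are all hypothetical. The output string is meant to be fed to a language model, not executed.

```python
# NAS-Bench-201 candidate operations on each cell edge.
NB201_OPS = {"none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"}

# Hypothetical mapping from each operation to a PyTorch module expression.
OP_TO_MODULE = {
    "none": "Zero()",
    "skip_connect": "nn.Identity()",
    "nor_conv_1x1": "ReLUConvBN(C, C, kernel_size=1)",
    "nor_conv_3x3": "ReLUConvBN(C, C, kernel_size=3)",
    "avg_pool_3x3": "nn.AvgPool2d(3, stride=1, padding=1)",
}


def parse_arch_string(arch: str) -> list[tuple[int, int, str]]:
    """Parse '|op~src|+|op~src|op~src|+...' into (src, dst, op) edges."""
    edges = []
    for dst, node_spec in enumerate(arch.split("+"), start=1):
        for token in filter(None, node_spec.split("|")):  # drop empty pieces
            op, src = token.split("~")
            if op not in NB201_OPS:
                raise ValueError(f"unknown operation: {op}")
            edges.append((int(src), dst, op))
    return edges


def arch_to_pytorch_source(arch: str, class_name: str = "Cell") -> str:
    """Render a cell as the text of a PyTorch class definition."""
    edges = parse_arch_string(arch)
    n_nodes = max(dst for _, dst, _ in edges) + 1
    lines = [f"class {class_name}(nn.Module):",
             "    def __init__(self, C):",
             "        super().__init__()"]
    for src, dst, op in edges:  # one module attribute per edge
        lines.append(f"        self.edge_{src}_{dst} = {OP_TO_MODULE[op]}")
    lines += ["", "    def forward(self, x):", "        nodes = [x]"]
    for dst in range(1, n_nodes):  # each node sums its incoming edges
        terms = " + ".join(f"self.edge_{s}_{d}(nodes[{s}])"
                           for s, d, _ in edges if d == dst)
        lines.append(f"        nodes.append({terms})")
    lines.append("        return nodes[-1]")
    return "\n".join(lines)
```

For the cell `|nor_conv_3x3~0|+|nor_conv_3x3~0|avg_pool_3x3~1|+|skip_connect~0|nor_conv_3x3~1|skip_connect~2|`, this yields a class with six edge modules and a `forward` that sums incoming edges at each non-input node. Because the result is plain text, the same pipeline applies to any search space whose architectures can be printed as code.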
What carries the argument
Code-Oriented LM Embeddings (COLE): vectors produced by feeding raw PyTorch code strings into a frozen off-the-shelf language model; a small regression head then maps these embeddings to predicted accuracy.
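The description above leaves two steps implicit: collapsing a frozen model's per-token hidden states into one vector, and fitting the head. A minimal NumPy sketch, assuming masked mean pooling and a closed-form ridge regression head. Both are assumptions: the pooling rule is not stated on this page, and the paper passage quoted in the Lean theorems section mentions a pairwise hinge loss rather than squared error. With Hugging Face `transformers`, `hidden_states` would typically be `model(**inputs).last_hidden_state` and `attention_mask` would come from the tokenizer output.

```python
import numpy as np


def masked_mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors while ignoring padding positions.

    hidden_states: (batch, seq_len, dim) final-layer states of a frozen LM.
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding.
    Returns (batch, dim) fixed-size embeddings.
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # avoid division by zero
    return summed / counts


def fit_ridge_head(X: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge regression with an unpenalized bias (a stand-in head)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Predicted accuracy for each embedding row."""
    return X @ w + b
```

Keeping the language model frozen means embeddings can be computed once per architecture and cached, so only the small head is ever trained.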
If this is right
- Surrogate models for NAS can be built at low cost using only existing language models and standard code representations.
- Any architecture expressible as code becomes immediately usable with the same embedding pipeline.
- Text encodings derived directly from code outperform other text conversions such as ONNX-to-text when the language model remains frozen.
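Comparisons like raw code versus ONNX-to-text are usually scored with a rank correlation between predicted and true accuracies, and Kendall's tau is the one this paper names. A plain-Python sketch of tau-b, which corrects for ties; the function name is illustrative, and in practice `scipy.stats.kendalltau` computes the same tau-b variant by default.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (O(n^2), handles ties)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = 0
    untied_x = untied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of x_i - x_j
        dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of y_i - y_j
        untied_x += dx != 0
        untied_y += dy != 0
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denom = math.sqrt(untied_x * untied_y)
    return (concordant - discordant) / denom if denom else float("nan")
```

Scoring each encoding's predictions against the same held-out true accuracies with this function gives directly comparable numbers, since tau depends only on the ordering of architectures and not on the scale of the predictor's output.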
Where Pith is reading between the lines
- The same code-to-embedding route could be tested on other structure-search problems where objects are naturally written as programs.
- Substituting larger or more recent language models might raise predictor accuracy without any additional NAS tuning.
- Combining COLE with search algorithms other than BANANAS could produce further reductions in total evaluations.
Load-bearing premise
Off-the-shelf language models already contain useful inductive bias for extracting performance-related features from plain PyTorch code of neural architectures, with no need for NAS-specific fine-tuning.
What would settle it
A head-to-head run of BANANAS on NAS-Bench-201 for CIFAR-100 in which the COLE-based surrogate requires the same or more evaluations than a structural path-encoding surrogate to reach within 1 percent of the highest test accuracy.
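This test, and the abstract's 34% figure, both hinge on counting evaluations until the best architecture found so far is within 1% of the best in the space, then comparing across independent seeds. A minimal sketch, assuming accuracies are in percent and "within 1%" means within 1 percentage point (a relative 1% would instead use `best_in_space * 0.99` as the threshold). The function names and the mean plus standard-error summary are illustrative choices, not the authors' protocol.

```python
import statistics


def evals_to_threshold(accuracies, best_in_space, tol=1.0):
    """Evaluations until an architecture within `tol` points of the best is found.

    `accuracies` lists test accuracies in the order the search evaluated them.
    Returns None if the threshold is never reached within the run.
    """
    threshold = best_in_space - tol
    for n, acc in enumerate(accuracies, start=1):
        if acc >= threshold:
            return n
    return None


def summarize(counts):
    """Mean and standard error of evals-to-threshold across independent seeds."""
    mean = statistics.mean(counts)
    sem = statistics.stdev(counts) / len(counts) ** 0.5 if len(counts) > 1 else 0.0
    return mean, sem


def budget_reduction(mean_new, mean_baseline):
    """Fractional decrease in evaluation budget relative to the baseline."""
    return 1.0 - mean_new / mean_baseline
```

Runs that never reach the threshold return `None` and need an explicit policy (for example, counting them at the full budget) before averaging. As a purely arithmetic illustration with hypothetical numbers, a mean of 66 evaluations against a baseline mean of 100 gives `budget_reduction(66, 100)` of about 0.34.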
Original abstract
Developing effective surrogates (performance predictors) for Neural Architecture Search (NAS) typically requires expensive fine-tuning or the engineering of complex representations. We propose a low-cost embedding strategy that leverages the inductive bias of Language Models (LMs) to eliminate these overheads. By representing architectures as PyTorch class definition text, we demonstrate that off-the-shelf LMs act as competitive feature extractors without NAS-specialized fine-tuning. The final predictor is constructed by passing the extracted Code-Oriented LM Embeddings (COLE) through a lightweight regression head. We also investigate strategies to improve embedding quality and utilization. Our experiments on the NAS-Bench-201 and einspace search spaces reveal that raw code inputs yield higher predictive performance than other text-based encodings (e.g., ONNX-to-text encodings) when using frozen LMs. We also observe COLE drives superior surrogate-assisted search using the BANANAS algorithm in NAS-Bench-201. When optimizing for CIFAR-100 performance, replacing structural path encodings with COLE for architecture representation allows for a 34% decrease in the evaluation budget required to reach within 1% of the fittest architecture in the search space (by test accuracy). As any neural architecture can be represented as code, these findings establish COLE as a versatile and efficient foundation for advancing NAS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Code-Oriented LM Embeddings (COLE) for surrogate-assisted Neural Architecture Search. Architectures are encoded as raw PyTorch class definition text and passed through off-the-shelf frozen language models to produce embeddings, which are then fed to a lightweight regression head for performance prediction. Experiments on NAS-Bench-201 and einspace show raw code inputs outperforming ONNX-to-text encodings, and integration with BANANAS yields a 34% reduction in evaluation budget to reach within 1% of the best test accuracy on CIFAR-100.
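For readers unfamiliar with BANANAS, the surrogate's role can be seen in a stripped-down loop: fit an ensemble predictor on the architectures evaluated so far, propose candidates by mutating the current best ones, score them with independent Thompson sampling over the ensemble's mean and spread, and evaluate the winner. This is a sketch in the spirit of White et al., not their implementation: the MLP ensemble is swapped for bootstrapped ridge regressors, the one-hot `encode` function is a stand-in for a path encoding or COLE vectors, and the 6-edge, 5-operation space only mimics the shape of a NAS-Bench-201 cell.

```python
import random
import numpy as np

OPS = ["none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"]
N_EDGES = 6  # a NAS-Bench-201 cell has 6 edges, each carrying one of 5 operations


def one_hot(arch):
    """Stand-in encoding; COLE or a path encoding would replace this function."""
    v = np.zeros(N_EDGES * len(OPS))
    for e, op in enumerate(arch):
        v[e * len(OPS) + OPS.index(op)] = 1.0
    return v


def mutate(arch, rng):
    """Replace the operation on one randomly chosen edge."""
    arch = list(arch)
    e = rng.randrange(N_EDGES)
    arch[e] = rng.choice([op for op in OPS if op != arch[e]])
    return tuple(arch)


def fit_ensemble(X, y, n_models, rng, alpha=1.0):
    """Bootstrapped ridge regressors (BANANAS itself uses an ensemble of MLPs)."""
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        Xb, yb = X[idx], y[idx]
        xm, ym = Xb.mean(axis=0), yb.mean()
        Xc = Xb - xm
        w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (yb - ym))
        models.append((w, ym - xm @ w))
    return models


def search(objective, encode=one_hot, budget=30, n_init=10, n_cands=50,
           n_models=5, seed=0):
    """Return {architecture: accuracy} for every architecture evaluated."""
    rng = random.Random(seed)
    np_rng = np.random.default_rng(seed)
    history = {}
    while len(history) < n_init:  # random initial designs
        arch = tuple(rng.choice(OPS) for _ in range(N_EDGES))
        if arch not in history:
            history[arch] = objective(arch)
    while len(history) < budget:
        archs = list(history)
        X = np.stack([encode(a) for a in archs])
        y = np.array([history[a] for a in archs])
        models = fit_ensemble(X, y, n_models, rng)
        top = sorted(archs, key=history.get, reverse=True)[:5]
        # Mutate the current best architectures; dict.fromkeys dedupes in order.
        cands = list(dict.fromkeys(mutate(rng.choice(top), rng) for _ in range(n_cands)))
        cands = [c for c in cands if c not in history]
        if not cands:
            continue
        Xc = np.stack([encode(c) for c in cands])
        preds = np.stack([Xc @ w + b for w, b in models])  # (n_models, n_cands)
        # Independent Thompson sampling: one Gaussian draw per candidate.
        draws = np_rng.normal(preds.mean(axis=0), preds.std(axis=0) + 1e-9)
        chosen = cands[int(np.argmax(draws))]
        history[chosen] = objective(chosen)
    return history
```

Swapping the representation, as the paper does, amounts to passing a different `encode` callable, for example one that returns the pooled LM embedding of an architecture's code. Everything else in the loop stays fixed, which is what makes a head-to-head budget comparison meaningful.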
Significance. If the results hold under stricter controls, the work would be significant for providing a low-cost, fine-tuning-free method to leverage pre-trained LMs in NAS. Representing architectures as code offers broad applicability across search spaces, and the reported efficiency gain with BANANAS could reduce the computational burden of surrogate-assisted search.
major comments (2)
- [§4.3] BANANAS results on NAS-Bench-201 CIFAR-100: the 34% evaluation-budget reduction to reach within 1% of the fittest architecture is reported without error bars, the number of independent runs, or an explicit description of the structural path-encoding baseline and the data splits used for surrogate training. These details are load-bearing for the central empirical claim.
- [§3.1] Embedding quality: the claim that raw PyTorch code yields higher predictive performance than alternative encodings with frozen LMs is not supported by any analysis showing that the embeddings encode NAS-relevant attributes such as operation choice or connectivity rather than generic token statistics. This assumption underpins the attribution of the surrogate advantage to COLE.
minor comments (2)
- [Abstract] The phrase 'strategies to improve embedding quality and utilization' appears without naming the strategies; naming them would aid reader comprehension.
- [Figures] Figure captions: several figures comparing encodings would benefit from explicit axis labels indicating whether performance is measured by validation or test accuracy.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. The comments highlight important aspects of reproducibility and attribution that we address below. We have revised the manuscript accordingly to strengthen the presentation of our results.
Point-by-point responses
- Referee: [§4.3] BANANAS results on NAS-Bench-201 CIFAR-100: the 34% evaluation-budget reduction to reach within 1% of the fittest architecture is reported without error bars, the number of independent runs, or an explicit description of the structural path-encoding baseline and the data splits used for surrogate training. These details are load-bearing for the central empirical claim.
  Authors: We agree that these experimental details are essential for supporting the central claim and ensuring reproducibility. In the revised manuscript, we now report results averaged over five independent runs with different random seeds and include error bars in the relevant figure and table in §4.3. We have also added an explicit description of the structural path encoding baseline (following the original BANANAS implementation) and clarified that the surrogate training data splits adhere to the standard NAS-Bench-201 protocol used in prior work. These revisions appear in Section 4.3 and the experimental setup. Revision: yes.
- Referee: [§3.1] Embedding quality: the claim that raw PyTorch code yields higher predictive performance than alternative encodings with frozen LMs is not supported by any analysis showing that the embeddings encode NAS-relevant attributes such as operation choice or connectivity rather than generic token statistics. This assumption underpins the attribution of the surrogate advantage to COLE.
  Authors: We thank the referee for this observation. Our primary evidence is the consistent outperformance of raw PyTorch code over ONNX-to-text encodings when using identical frozen LMs; because both representations are textual, the performance gap suggests that the advantage arises from how architecture-specific information is captured rather than from generic token-level statistics alone. To directly address the request for attribution analysis, the revised manuscript includes additional probing experiments in §3.1 that measure correlations between embedding features and NAS-relevant properties such as operation types and connectivity. These results support that COLE encodes structural attributes relevant to performance prediction. Revision: yes.
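One common form of the probing analysis discussed here is a held-out linear probe: if a ridge regression from frozen embeddings recovers an architecture property (say, the count of `nor_conv_3x3` edges) far better than chance, that property is linearly decodable from the embeddings. A NumPy sketch with a hypothetical `probe_r2` helper; the probe design is an assumption and not necessarily the analysis the revised §3.1 reports.

```python
import numpy as np


def probe_r2(embeddings, target, alpha=1e-3, train_frac=0.8, seed=0):
    """Held-out R^2 of a ridge probe predicting an architecture attribute
    (e.g. the number of 3x3 convolutions) from frozen embeddings."""
    embeddings = np.asarray(embeddings, dtype=float)
    target = np.asarray(target, dtype=float)
    order = np.random.default_rng(seed).permutation(len(target))
    cut = int(train_frac * len(target))
    tr, te = order[:cut], order[cut:]  # random train/test split
    x_mean, y_mean = embeddings[tr].mean(axis=0), target[tr].mean()
    Xc = embeddings[tr] - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]),
                        Xc.T @ (target[tr] - y_mean))
    pred = (embeddings[te] - x_mean) @ w + y_mean
    ss_res = ((target[te] - pred) ** 2).sum()
    ss_tot = ((target[te] - target[te].mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot
```

Running the same probe on shuffled targets gives a chance-level control, and probing for nuisance quantities such as code length helps separate structural content from generic token statistics.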
Circularity Check
No circularity: empirical claims rest on external LM embeddings and experimental validation
full rationale
The paper's core pipeline—representing architectures as raw PyTorch code, extracting frozen off-the-shelf LM embeddings (COLE), and training a lightweight regression head—is presented as a low-cost empirical strategy. The 34% evaluation-budget reduction on NAS-Bench-201 CIFAR-100 with BANANAS is reported as an experimental outcome, not derived by construction from fitted parameters or self-referential definitions. No equations, self-citation chains, or ansatz smuggling reduce the surrogate ranking advantage to quantities internal to the paper. The derivation chain is self-contained against external benchmarks (pre-trained LMs and standard NAS-Bench-201 splits).
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Language models possess inductive biases that make them effective feature extractors for neural architecture code without domain-specific fine-tuning.
invented entities (1)
- Code-Oriented LM Embeddings (COLE): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "By representing architectures as PyTorch class definition text, we demonstrate that off-the-shelf LMs act as competitive feature extractors without NAS-specialized fine-tuning."
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "We adopt the Pairwise Hinge Loss to optimize for Kendall's Tau rank correlation"
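The second passage names a pairwise hinge loss used to target Kendall's tau. One standard formulation, assumed here because the margin and pair-sampling scheme are not given: for every pair with y_i > y_j, penalize max(0, m - (s_i - s_j)) and average over those pairs. The sketch computes that loss with its subgradient and uses it to fit a hypothetical linear head on fixed embeddings; the function names and hyperparameters are illustrative.

```python
import numpy as np


def pairwise_hinge_loss_and_grad(scores, targets, margin=0.1):
    """Mean hinge loss over ordered pairs (y_i > y_j) and its subgradient w.r.t. scores."""
    scores = np.asarray(scores, dtype=float)
    targets = np.asarray(targets, dtype=float)
    s_diff = scores[:, None] - scores[None, :]            # s_diff[i, j] = s_i - s_j
    better = (targets[:, None] - targets[None, :]) > 0    # i should rank above j
    n_pairs = better.sum()
    if n_pairs == 0:
        return 0.0, np.zeros_like(scores)
    violation = np.where(better, margin - s_diff, 0.0)
    active = better & (violation > 0)
    loss = np.maximum(violation, 0.0).sum() / n_pairs
    # Each active pair contributes -1 to d/ds_i and +1 to d/ds_j.
    grad = (active.sum(axis=0) - active.sum(axis=1)) / n_pairs
    return float(loss), grad


def train_linear_head(X, y, margin=0.1, lr=0.5, steps=200):
    """Subgradient descent on a linear scorer s = X @ w."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        _, g_scores = pairwise_hinge_loss_and_grad(X @ w, y, margin)
        w -= lr * (X.T @ g_scores)  # chain rule through scores = X @ w
    return w
```

Because the loss depends only on score differences within pairs, it rewards correct orderings rather than calibrated accuracy values, which is why it pairs naturally with a rank metric such as Kendall's tau.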
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Mohamed S Abdelfattah, Abhinav Mehrotra, Łukasz Dudziak, and Nicholas Donald Lane. 2021. Zero-Cost Proxies for Lightweight NAS. In International Conference on Learning Representations.
- [2] Yash Akhauri and Mohamed S Abdelfattah. 2024. Encodings for Prediction-based Neural Architecture Search. In International Conference on Machine Learning. PMLR, 740–759.
- [3] Xuanyi Dong and Yi Yang. 2020. NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search. In International Conference on Learning Representations.
- [4] Linus Ericsson, Miguel Espinosa Minano, Chenhongyi Yang, Antreas Antoniou, Amos J Storkey, Shay Cohen, Steven McDonagh, and Elliot J Crowley. 2024. einspace: Searching for neural architectures from fundamental operations. Advances in Neural Information Processing Systems 37 (2024), 1919–1953.
- [5] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. 2024. The Llama 3 herd of models. In Neural Information Processing Systems. Curran Associates.
- [6] Yuxuan Hu, Jihao Liu, Ke Wang, Jinliang Zheng, Weikang Shi, Manyuan Zhang, Qi Dou, Rui Liu, Aojun Zhou, and Hongsheng Li. 2025. LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 9419–9432.
- [7] Ganesh Jawahar, Muhammad Abdul-Mageed, Laks Lakshmanan, and Dujian Ding. 2024. LLM performance predictors are good initializers for architecture search. In Findings of the Association for Computational Linguistics: ACL 2024. 10540–10560.
- [8] Hanxiao Liu, Karen Simonyan, and Yiming Yang. 2019. DARTS: Differentiable Architecture Search. In International Conference on Learning Representations.
- [9] Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, and Tie-Yan Liu. 2020. Semi-supervised neural architecture search. Advances in Neural Information Processing Systems 33 (2020), 10547–10557.
- [10] Jindi Lv, Yuhao Zhou, Yuxin Tian, Qing Ye, Wentao Feng, and Jiancheng Lv
- [11]
- [12] Joe Mellor, Jack Turner, Amos Storkey, and Elliot J Crowley. 2021. Neural architecture search without training. In International Conference on Machine Learning. PMLR, 7588–7598.
- [13] Clint Morris, Michael Jurado, and Jason Zutty. 2024. LLM guided evolution - the automation of models advancing models. In Proceedings of the Genetic and Evolutionary Computation Conference. 377–384.
- [14] Xuefei Ning, Yin Zheng, Tianchen Zhao, Yu Wang, and Huazhong Yang. 2020. A generic graph-based neural architecture encoding scheme for predictor-based NAS. In European Conference on Computer Vision. Springer, 189–204.
- [15]
- [16] Shiwen Qin, Gabriela Kadlecová, Martin Pilát, Shay B Cohen, Roman Neruda, Elliot J. Crowley, Jovita Lukasik, and Linus Ericsson. 2025. Transferrable Surrogates in Expressive Neural Architecture Search Spaces. In Proceedings of the Fourth International Conference on Automated Machine Learning (Proceedings of Machine Learning Research, Vol. 293), Leman Ako...
- [17] Simon Schrodi, Danny Stoll, Binxin Ru, Rhea Sukthanker, Thomas Brox, and Frank Hutter. 2023. Construction of hierarchical neural architecture search spaces based on context-free grammars. Advances in Neural Information Processing Systems 36 (2023), 23172–23223.
- [18] Hidenori Tanaka, Daniel Kunin, Daniel L Yamins, and Surya Ganguli. 2020. Pruning neural networks without any data by iteratively conserving synaptic flow. Advances in Neural Information Processing Systems 33 (2020), 6377–6389.
- [19] Eric Tang, Bangding Yang, and Xingyou Song. 2025. Understanding LLM Embeddings for Regression. Transactions on Machine Learning Research (2025).
- [20] Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, et al. 2025. Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference. In Proceedings of the 63rd Annual Meeting of the...
- [21] Colin White, Willie Neiswanger, and Yash Savani. 2021. BANANAS: Bayesian optimization with neural architectures for neural architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 10293–10301.
- [22] Colin White, Arber Zela, Robin Ru, Yang Liu, and Frank Hutter. 2021. How Powerful are Performance Predictors in Neural Architecture Search? Advances in Neural Information Processing Systems 34 (2021).
- [23] Tomomasa Yamasaki, Zhehui Wang, Tao Luo, Niangjun Chen, and Bo Wang. 2025. RBFleX-NAS: Training-Free Neural Architecture Search Using Radial Basis Function Kernel and Hyperparameter Detection. IEEE Transactions on Neural Networks and Learning Systems (2025).
- [25] Shen Yan, Kaiqiang Song, Fei Liu, and Mi Zhang. 2021. CATE: Computation-aware neural architecture encoding with transformers. In International Conference on Machine Learning. PMLR, 11670–11681.
- [26] Shen Yan, Colin White, Yash Savani, and Frank Hutter. 2021. NAS-Bench-x11 and the power of learning curves. Advances in Neural Information Processing Systems 34 (2021), 22534–22549.
- [27] Shen Yan, Yu Zheng, Wei Ao, Xiao Zeng, and Mi Zhang. 2020. Does unsupervised architecture representation learning help neural architecture search? Advances in Neural Information Processing Systems 33 (2020), 12486–12498.
- [28] Chris Ying, Aaron Klein, Eric Christiansen, Esteban Real, Kevin Murphy, and Frank Hutter. 2019. NAS-Bench-101: Towards reproducible neural architecture search. In International Conference on Machine Learning. PMLR, 7105–7114.
- [29] Arber Zela, Julien Niklas Siems, Lucas Zimmer, Jovita Lukasik, Margret Keuper, and Frank Hutter. 2022. Surrogate NAS Benchmarks: Going Beyond the Limited Search Spaces of Tabular NAS Benchmarks. In International Conference on Learning Representations.
- [30] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. 2018. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8697–8710.
Pranav Somu, Advay Balakrishnan, Stepan Kravtsov, Aaron McDaniel, and Jason Zutty
A Code Verbosity: Context Add-ons
As discussed ...