arxiv: 2605.15649 · v1 · pith:FOMW2GLCnew · submitted 2026-05-15 · 💻 cs.LG · cs.NE

Towards Code-Oriented LM Embeddings for Surrogate-Assisted Neural Architecture Search

Pith reviewed 2026-05-20 21:01 UTC · model grok-4.3

classification cs.LG · cs.NE
keywords neural architecture search · language model embeddings · surrogate performance predictors · code representations · NAS-Bench-201 · BANANAS algorithm · frozen language models

The pith

Representing neural architectures as PyTorch code lets off-the-shelf language models extract competitive features for performance prediction without NAS-specific training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a neural architecture can be written as a short PyTorch class definition and passed through a frozen language model to produce embeddings that serve as inputs to a simple regression head for predicting accuracy. This approach avoids the usual costs of fine-tuning models or hand-crafting structural encodings for surrogate-assisted neural architecture search. On the NAS-Bench-201 and einspace search spaces, raw code inputs give higher predictive performance than other text-based encodings such as ONNX-to-text. When plugged into the BANANAS search algorithm on NAS-Bench-201 for CIFAR-100, replacing structural path encodings with these embeddings cuts by 34 percent the evaluation budget needed to come within 1 percent of the fittest architecture in the search space (by test accuracy). The central idea is that the inductive bias already present in general-purpose language models is sufficient once the architecture is expressed as ordinary code. If this holds, surrogate construction becomes far cheaper and more portable across different search spaces.

Core claim

By turning neural architectures into PyTorch class definition text and extracting embeddings from frozen language models, the authors obtain Code-Oriented LM Embeddings (COLE) that serve as effective inputs to lightweight performance predictors. COLE outperforms alternative text encodings and, in BANANAS searches on NAS-Bench-201 for CIFAR-100, cuts by 34 percent the evaluation budget needed to reach within 1 percent of the fittest architecture, relative to structural path encodings.

What carries the argument

Code-Oriented LM Embeddings (COLE): vectors produced by feeding raw PyTorch code strings into a frozen off-the-shelf language model. A small regression head, trained separately, then maps these embeddings to predicted accuracy.
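
The shape of this pipeline (code text in, fixed-length vector out of a frozen encoder, small trained head on top) can be sketched as follows. This is only an illustration under loud assumptions: `frozen_embed` is a hashed bag-of-tokens stand-in, not a language model, and the linear head, its hyperparameters, the code snippets, and the accuracy values are all made up for the example. None of it is the authors' implementation.

```python
import hashlib
import math
import re


def frozen_embed(code, dim=32):
    """Stand-in for a frozen LM encoder (ASSUMPTION: not a real language model).

    Maps code text to a fixed-length, L2-normalised vector by hashing tokens
    into buckets. It has no trainable state, which is the only property it
    shares with a frozen LM.
    """
    vec = [0.0] * dim
    for tok in re.findall(r"[A-Za-z_]\w*|\d+", code):
        idx = int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fit_head(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit a linear regression head (weights and bias) by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        # Only the head is updated; the encoder above is never touched.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(head, x):
    w, b = head
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy corpus: code snippets paired with MADE-UP accuracies (illustrative only).
corpus = [
    ("self.op = nn.Conv2d(C, C, kernel_size=3, padding=1)", 0.71),
    ("self.op = nn.Conv2d(C, C, kernel_size=1)", 0.68),
    ("self.op = nn.AvgPool2d(kernel_size=3, stride=1, padding=1)", 0.55),
    ("self.op = nn.Identity()", 0.52),
]
X = [frozen_embed(code) for code, _ in corpus]
y = [acc for _, acc in corpus]
head = fit_head(X, y)
preds = [predict(head, x) for x in X]
mse = sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)
```

In a real setting the stand-in would be replaced by pooled hidden states from an actual pre-trained model, while the division of labour (frozen encoder, trainable head) stays the same.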

If this is right

  • Surrogate models for NAS can be built at low cost using only existing language models and standard code representations.
  • Any architecture expressible as code becomes immediately usable with the same embedding pipeline.
  • Text encodings derived directly from code outperform other text conversions such as ONNX-to-text when the language model remains frozen.
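
To make "expressible as code" concrete, the sketch below turns a NAS-Bench-201 cell string (the default string specification mentioned in Figure 1) into PyTorch class definition text. The template, the names `render_cell` and `OP_TO_LAYER`, and the reduction of each `nor_conv` operation to a bare `nn.Conv2d` are assumptions made for illustration; the paper's own Helper and Inline templates (Figures 2 and 3) are not reproduced here and may look quite different.

```python
# Hypothetical op-to-layer mapping; convolution ops are simplified to a bare Conv2d.
OP_TO_LAYER = {
    "nor_conv_3x3": "nn.Conv2d(C, C, kernel_size=3, padding=1)",
    "nor_conv_1x1": "nn.Conv2d(C, C, kernel_size=1)",
    "avg_pool_3x3": "nn.AvgPool2d(kernel_size=3, stride=1, padding=1)",
    "skip_connect": "nn.Identity()",
}


def parse_arch(arch):
    """Return [(op, src, dst), ...] from a NAS-Bench-201 cell string."""
    edges = []
    for dst, node in enumerate(arch.split("+"), start=1):
        for item in node.strip("|").split("|"):
            op, src = item.split("~")
            edges.append((op, int(src), dst))
    return edges


def render_cell(arch, class_name="Cell"):
    """Render the cell as PyTorch class definition text ('none' edges are dropped)."""
    all_edges = parse_arch(arch)
    num_nodes = max(dst for _, _, dst in all_edges)
    edges = [e for e in all_edges if e[0] != "none"]
    lines = [
        "import torch.nn as nn",
        "",
        f"class {class_name}(nn.Module):",
        "    def __init__(self, C):",
        "        super().__init__()",
    ]
    for op, src, dst in edges:
        lines.append(f"        self.edge_{src}_{dst} = {OP_TO_LAYER[op]}")
    lines += ["", "    def forward(self, x):", "        n0 = x"]
    for dst in range(1, num_nodes + 1):
        terms = [f"self.edge_{s}_{d}(n{s})" for _, s, d in edges if d == dst]
        # A node with no live incoming edge contributes zeros.
        lines.append(f"        n{dst} = " + (" + ".join(terms) if terms else "0 * x"))
    lines.append(f"        return n{num_nodes}")
    return "\n".join(lines)


ARCH = "|nor_conv_3x3~0|+|none~0|avg_pool_3x3~1|+|skip_connect~0|nor_conv_1x1~1|skip_connect~2|"
code_text = render_cell(ARCH)
print(code_text)
```

The output is a plain string, so it can be handed to any text encoder without a NAS-specific tokenizer or graph format.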

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same code-to-embedding route could be tested on other structure-search problems where objects are naturally written as programs.
  • Substituting larger or more recent language models might raise predictor accuracy without any additional NAS tuning.
  • Combining COLE with search algorithms other than BANANAS could produce further reductions in total evaluations.

Load-bearing premise

Off-the-shelf language models already contain useful inductive bias for extracting performance-related features from plain PyTorch code of neural architectures, with no need for NAS-specific fine-tuning.

What would settle it

A head-to-head run of BANANAS on NAS-Bench-201 for CIFAR-100 in which the COLE-based surrogate requires the same or more evaluations than a structural path-encoding surrogate to reach within 1 percent of the highest test accuracy.
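
The quantity being compared in such a run can be computed with a few lines. The sketch below assumes that "within 1 percent" means within one absolute percentage point of the best test accuracy in the search space; the function names and the example trajectory are hypothetical and are not taken from the paper.

```python
def evals_to_within(trajectory, best_acc, tol=1.0):
    """Number of evaluations (1-based) until the best accuracy seen so far
    is within `tol` percentage points of `best_acc`; None if never reached."""
    best_so_far = float("-inf")
    for i, acc in enumerate(trajectory, start=1):
        best_so_far = max(best_so_far, acc)
        if best_so_far >= best_acc - tol:
            return i
    return None


def budget_reduction(baseline_evals, candidate_evals):
    """Percent decrease in evaluations of the candidate relative to the baseline."""
    return 100.0 * (baseline_evals - candidate_evals) / baseline_evals


# Hypothetical test accuracies (in percent) of successively evaluated architectures.
example_trajectory = [60.0, 70.5, 72.9, 73.2]
example_best = 73.5
needed = evals_to_within(example_trajectory, example_best)
```

Under this definition, the claim would be undermined if `budget_reduction` came out at zero or negative when the path-encoding surrogate is the baseline and the COLE surrogate is the candidate.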

Figures

Figures reproduced from arXiv: 2605.15649 by Aaron McDaniel, Advay Balakrishnan, Jason Zutty, Pranav Somu, Stepan Kravtsov.

Figure 1: Default NAS-Bench-201 string specification repre…
Figure 2: A sample code representation under the Helper…
Figure 3: The Inline mode code representation. The logic…
Figure 4: NAS trajectory on CIFAR-10 (top), CIFAR-100 (mid…
Figure 5: An excerpt of the Backbone add-on optionally…
Figure 6: The Comment add-on optionally used in COLE. A…
Figure 8: The PyTorch code representation for the einspace…
Figure 9: t-SNE visualizations comparing structural encodings with COLE on NAS-Bench-201. The left column shows the full…
Figure 10: t-SNE visualizations comparing ONNX-to-text encodings with COLE on a 7,400-architecture subset of NAS-Bench-201.
Figure 11: t-SNE visualizations comparing derivation tree strings and COLE on a 2,837-architecture corpus. The layout follows…
Original abstract

Developing effective surrogates (performance predictors) for Neural Architecture Search (NAS) typically requires expensive fine-tuning or the engineering of complex representations. We propose a low-cost embedding strategy that leverages the inductive bias of Language Models (LMs) to eliminate these overheads. By representing architectures as PyTorch class definition text, we demonstrate that off-the-shelf LMs act as competitive feature extractors without NAS-specialized fine-tuning. The final predictor is constructed by passing the extracted Code-Oriented LM Embeddings (COLE) through a lightweight regression head. We also investigate strategies to improve embedding quality and utilization. Our experiments on the NAS-Bench-201 and einspace search spaces reveal that raw code inputs yield higher predictive performance than other text-based encodings (e.g., ONNX-to-text encodings) when using frozen LMs. We also observe COLE drives superior surrogate-assisted search using the BANANAS algorithm in NAS-Bench-201. When optimizing for CIFAR-100 performance, replacing structural path encodings with COLE for architecture representation allows for a 34% decrease in the evaluation budget required to reach within 1% of the fittest architecture in the search space (by test accuracy). As any neural architecture can be represented as code, these findings establish COLE as a versatile and efficient foundation for advancing NAS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Code-Oriented LM Embeddings (COLE) for surrogate-assisted Neural Architecture Search. Architectures are encoded as raw PyTorch class definition text and passed through off-the-shelf frozen language models to produce embeddings, which are then fed to a lightweight regression head for performance prediction. Experiments on NAS-Bench-201 and einspace show raw code inputs outperforming ONNX-to-text encodings, and integration with BANANAS yields a 34% reduction in evaluation budget to reach within 1% of the best test accuracy on CIFAR-100.

Significance. If the results hold under stricter controls, the work would be significant for providing a low-cost, fine-tuning-free method to leverage pre-trained LMs in NAS. Representing architectures as code offers broad applicability across search spaces, and the reported efficiency gain with BANANAS could reduce the computational burden of surrogate-assisted search.

major comments (2)
  1. [§4.3] §4.3 (BANANAS results on NAS-Bench-201 CIFAR-100): the 34% evaluation budget reduction to reach within 1% of the fittest architecture is reported without error bars, number of independent runs, or explicit description of the structural path encoding baseline and data splits used for surrogate training. This detail is load-bearing for the central empirical claim.
  2. [§3.1] §3.1 (Embedding quality): the claim that raw PyTorch code yields higher predictive performance than alternative encodings with frozen LMs is not supported by any analysis showing that the embeddings encode NAS-relevant attributes such as operation choice or connectivity rather than generic token statistics. This assumption underpins attribution of the surrogate advantage to COLE.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'strategies to improve embedding quality and utilization' is mentioned without naming the strategies, which would aid reader comprehension.
  2. [Figures] Figure captions: several figures comparing encodings would benefit from explicit axis labels indicating whether performance is measured by validation or test accuracy.
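
The first major comment asks for an explicit description of the structural path encoding baseline. As background, the general idea of a path encoding is a binary vector with one slot per possible input-to-output path, set to 1 when the architecture contains that path. The sketch below applies that idea to a NAS-Bench-201 cell string. The route list, the treatment of `none` as cutting a path, and the slot ordering are simplifying assumptions; this is not the BANANAS implementation and not necessarily the baseline used in the paper.

```python
from itertools import product

# Operations that can appear on a live path ('none' is assumed to cut the path).
OPS = ["skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"]

# Every route from input node 0 to output node 3 in the 4-node cell.
ROUTES = [
    [(0, 3)],
    [(0, 1), (1, 3)],
    [(0, 2), (2, 3)],
    [(0, 1), (1, 2), (2, 3)],
]

# One slot per (route, operation sequence along that route).
SLOTS = [
    (r, seq)
    for r, route in enumerate(ROUTES)
    for seq in product(OPS, repeat=len(route))
]
SLOT_INDEX = {slot: i for i, slot in enumerate(SLOTS)}


def path_encode(arch):
    """Binary vector marking which (route, op sequence) paths the cell contains."""
    edge_op = {}
    for dst, node in enumerate(arch.split("+"), start=1):
        for item in node.strip("|").split("|"):
            op, src = item.split("~")
            edge_op[(int(src), dst)] = op
    vec = [0] * len(SLOTS)
    for r, route in enumerate(ROUTES):
        seq = tuple(edge_op[e] for e in route)
        if "none" not in seq:
            vec[SLOT_INDEX[(r, seq)]] = 1
    return vec


ARCH = "|nor_conv_3x3~0|+|none~0|avg_pool_3x3~1|+|skip_connect~0|nor_conv_1x1~1|skip_connect~2|"
vec = path_encode(ARCH)
```

Unlike a code-text representation, an encoding of this kind has to be redesigned whenever the search space changes its node count or operation set, which is the engineering cost the paper aims to avoid.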

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments highlight important aspects of reproducibility and attribution that we address below. We have revised the manuscript accordingly to strengthen the presentation of our results.

Point-by-point responses
  1. Referee: [§4.3] §4.3 (BANANAS results on NAS-Bench-201 CIFAR-100): the 34% evaluation budget reduction to reach within 1% of the fittest architecture is reported without error bars, number of independent runs, or explicit description of the structural path encoding baseline and data splits used for surrogate training. This detail is load-bearing for the central empirical claim.

    Authors: We agree that these experimental details are essential for supporting the central claim and ensuring reproducibility. In the revised manuscript, we now report results averaged over five independent runs with different random seeds and include error bars in the relevant figure and table in §4.3. We have also added an explicit description of the structural path encoding baseline (following the original BANANAS implementation) and clarified that the surrogate training data splits adhere to the standard NAS-Bench-201 protocol used in prior work. These revisions appear in Section 4.3 and the experimental setup. revision: yes

  2. Referee: [§3.1] §3.1 (Embedding quality): the claim that raw PyTorch code yields higher predictive performance than alternative encodings with frozen LMs is not supported by any analysis showing that the embeddings encode NAS-relevant attributes such as operation choice or connectivity rather than generic token statistics. This assumption underpins attribution of the surrogate advantage to COLE.

    Authors: We thank the referee for this observation. Our primary evidence is the consistent outperformance of raw PyTorch code over ONNX-to-text encodings when using identical frozen LMs; because both representations are textual, the performance gap suggests that the advantage arises from how architecture-specific information is captured rather than from generic token-level statistics alone. To directly address the request for attribution analysis, the revised manuscript includes additional probing experiments in §3.1 that measure correlations between embedding features and NAS-relevant properties such as operation types and connectivity. These results support the claim that COLE encodes structural attributes relevant to performance prediction. revision: yes
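
The first response promises results averaged over five seeds with error bars. A minimal sketch of that kind of summary is given below, using the sample standard deviation as the error bar. The per-seed counts are hypothetical placeholders, not numbers from the paper, and the choice of standard deviation over, say, standard error is an assumption.

```python
import statistics


def summarize_runs(evals_per_seed):
    """Mean and sample standard deviation of evaluations-to-threshold across seeds."""
    mean = statistics.mean(evals_per_seed)
    std = statistics.stdev(evals_per_seed) if len(evals_per_seed) > 1 else 0.0
    return mean, std


# HYPOTHETICAL evaluation counts for five seeds per encoding (not paper results).
path_encoding_runs = [120, 100, 110, 130, 90]
cole_runs = [80, 90, 70, 85, 75]

path_mean, path_std = summarize_runs(path_encoding_runs)
cole_mean, cole_std = summarize_runs(cole_runs)
print(f"path encoding: {path_mean:.1f} +/- {path_std:.1f} evaluations")
print(f"COLE:          {cole_mean:.1f} +/- {cole_std:.1f} evaluations")
```

Reporting the spread alongside the mean makes it possible to judge whether a gap between two encodings exceeds seed-to-seed variation.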

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external LM embeddings and experimental validation

Full rationale

The paper's core pipeline—representing architectures as raw PyTorch code, extracting frozen off-the-shelf LM embeddings (COLE), and training a lightweight regression head—is presented as a low-cost empirical strategy. The 34% evaluation-budget reduction on NAS-Bench-201 CIFAR-100 with BANANAS is reported as an experimental outcome, not derived by construction from fitted parameters or self-referential definitions. No equations, self-citation chains, or ansatz smuggling reduce the surrogate ranking advantage to quantities internal to the paper. The derivation chain is self-contained against external benchmarks (pre-trained LMs and standard NAS-Bench-201 splits).

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the assumption that general language models already encode useful inductive biases for code that transfer to architecture performance prediction; COLE itself is introduced as a derived representation rather than a new physical entity.

axioms (1)
  • domain assumption: Language models possess inductive biases that make them effective feature extractors for neural architecture code without domain-specific fine-tuning.
    Invoked when claiming off-the-shelf LMs act as competitive extractors for PyTorch class definitions.
invented entities (1)
  • Code-Oriented LM Embeddings (COLE) (no independent evidence)
    purpose: To serve as input features for a lightweight regression head that predicts neural architecture performance.
    New term coined for the embeddings produced by passing PyTorch code text through frozen LMs.


