Towards Code-Oriented LM Embeddings for Surrogate-Assisted Neural Architecture Search

Aaron McDaniel; Advay Balakrishnan; Jason Zutty; Pranav Somu; Stepan Kravtsov

arxiv: 2605.15649 · v1 · pith:FOMW2GLCnew · submitted 2026-05-15 · 💻 cs.LG · cs.NE

Pith reviewed 2026-05-20 21:01 UTC · model grok-4.3

keywords: neural architecture search · language model embeddings · surrogate performance predictors · code representations · NAS-Bench-201 · BANANAS algorithm · frozen language models

The pith

Representing neural architectures as PyTorch code lets off-the-shelf language models extract competitive features for performance prediction without NAS-specific training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that a neural architecture can be written as a short PyTorch class definition and passed through a frozen language model to produce embeddings that serve as inputs to a lightweight regression head for predicting accuracy. This approach avoids the usual costs of fine-tuning models or hand-crafting structural encodings for surrogate-assisted neural architecture search. On the NAS-Bench-201 and einspace search spaces, the resulting predictors outperform other text-based encodings such as ONNX-to-text when the language model is kept frozen. When plugged into the BANANAS search algorithm on NAS-Bench-201 for CIFAR-100, they reach within 1 percent of the best architecture in the search space (by test accuracy) with a 34 percent smaller evaluation budget than structural path encodings. The central idea is that the inductive bias already present in general-purpose language models is sufficient once the architecture is expressed as ordinary code. If this holds, surrogate construction becomes far cheaper and more portable across different search spaces.

Core claim

By turning neural architectures into PyTorch class definition text and extracting embeddings from frozen language models, the authors obtain Code-Oriented LM Embeddings (COLE) that serve as effective inputs to lightweight performance predictors. COLE outperforms alternative text encodings and, relative to structural path encodings, cuts by 34 percent the evaluation budget that BANANAS needs to reach within 1 percent of the best NAS-Bench-201 architecture on CIFAR-100.

What carries the argument

Code-Oriented LM Embeddings (COLE): vectors produced by feeding raw PyTorch code strings into a frozen off-the-shelf language model. A small regression head, trained separately, then maps these embeddings to predicted accuracy.
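
The three stages described above (code text, frozen embedding, regression head) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class layout emitted by `render_cell`, the helper names `Zero` and `ReLUConvBN` inside the generated text, and the function names are all hypothetical, and `embed` is a hashed bag-of-tokens stand-in used only so the sketch runs without downloading a model. In practice `embed` would be replaced by a frozen language model's pooled hidden state.

```python
import hashlib
import math

# The five NAS-Bench-201 edge operations, mapped to illustrative PyTorch
# expressions (the right-hand sides are assumptions, not the paper's text).
OPS = {
    "none": "Zero()",
    "skip_connect": "nn.Identity()",
    "nor_conv_1x1": "ReLUConvBN(C, C, kernel_size=1)",
    "nor_conv_3x3": "ReLUConvBN(C, C, kernel_size=3)",
    "avg_pool_3x3": "nn.AvgPool2d(3, stride=1, padding=1)",
}
# The six directed edges (source, target) of the 4-node NAS-Bench-201 cell.
EDGES = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]


def render_cell(ops):
    """Render a cell as PyTorch class definition text (hypothetical layout)."""
    lines = [
        "class Cell(nn.Module):",
        "    def __init__(self, C):",
        "        super().__init__()",
    ]
    for (src, dst), op in zip(EDGES, ops):
        lines.append(f"        self.edge_{src}_{dst} = {OPS[op]}")
    lines.append("    def forward(self, x0):")
    for dst in (1, 2, 3):
        terms = [f"self.edge_{s}_{d}(x{s})" for (s, d) in EDGES if d == dst]
        lines.append(f"        x{dst} = " + " + ".join(terms))
    lines.append("        return x3")
    return "\n".join(lines)


def embed(code, dim=64):
    """Stand-in for a frozen LM: hashed bag-of-tokens, L2-normalised."""
    vec = [0.0] * dim
    for token in code.split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def linear_head(embedding, weights, bias=0.0):
    """Lightweight regression head: a single linear layer."""
    return bias + sum(w * e for w, e in zip(weights, embedding))


arch = ["nor_conv_3x3", "skip_connect", "nor_conv_1x1",
        "none", "avg_pool_3x3", "nor_conv_3x3"]
code = render_cell(arch)
features = embed(code)
score = linear_head(features, weights=[0.0] * 64)  # untrained head
```

Only the head's weights would be fitted to observed accuracies; the embedder stays fixed, which is what keeps the surrogate cheap to build.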

If this is right

Surrogate models for NAS can be built at low cost using only existing language models and standard code representations.
Any architecture expressible as code becomes immediately usable with the same embedding pipeline.
Text encodings derived directly from code outperform other text conversions such as ONNX-to-text when the language model remains frozen.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same code-to-embedding route could be tested on other structure-search problems where objects are naturally written as programs.
Substituting larger or more recent language models might raise predictor accuracy without any additional NAS tuning.
Combining COLE with search algorithms other than BANANAS could produce further reductions in total evaluations.

Load-bearing premise

Off-the-shelf language models already contain useful inductive bias for extracting performance-related features from plain PyTorch code of neural architectures, with no need for NAS-specific fine-tuning.

What would settle it

A head-to-head run of BANANAS on NAS-Bench-201 for CIFAR-100 in which the COLE-based surrogate requires the same or more evaluations than a structural path-encoding surrogate to reach within 1 percent of the highest test accuracy.
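
The quantity this test turns on can be computed as sketched below: the number of evaluations a search needs before it first reaches within a tolerance of the target accuracy, and the relative budget reduction between two surrogates. The function names are hypothetical, the tolerance is read here as absolute percentage points (an assumption; the paper's exact definition is not given on this page), and the trajectories are toy numbers, not results from the paper.

```python
def evals_to_reach(trajectory, target, tolerance=1.0):
    """Return the 1-based evaluation count at which accuracy first
    reaches target - tolerance, or None if it never does."""
    for count, accuracy in enumerate(trajectory, start=1):
        if accuracy >= target - tolerance:
            return count
    return None


def budget_reduction(baseline_evals, candidate_evals):
    """Percentage of the baseline budget saved by the candidate."""
    return 100.0 * (baseline_evals - candidate_evals) / baseline_evals


# Toy accuracy trajectories (illustrative only).
target = 70.0
path_encoding_run = [60.0, 63.0, 65.0, 66.5, 67.5, 68.2, 68.7, 69.3]
cole_run = [61.0, 65.0, 67.0, 68.4, 68.9, 69.2, 69.5, 69.6]

baseline = evals_to_reach(path_encoding_run, target)
candidate = evals_to_reach(cole_run, target)
reduction = budget_reduction(baseline, candidate)
```

Under this reading, a reduction of zero or below in a well-controlled comparison would be the outcome that counts against the paper's claim.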

Figures

Figures reproduced from arXiv: 2605.15649 by Aaron McDaniel, Advay Balakrishnan, Jason Zutty, Pranav Somu, Stepan Kravtsov.

Figure 2: A sample code representation under the Helper ...

Figure 3: The Inline mode code representation. The logic ...

Figure 4: NAS trajectory on CIFAR-10 (top), CIFAR-100 (mid ...

Figure 5: An excerpt of the Backbone add-on optionally ...

Figure 6: The Comment add-on optionally used in COLE. A ...

Figure 8: The PyTorch code representation for the einspace ...

Figure 9: t-SNE visualizations comparing structural encodings with COLE on NAS-Bench-201. The left column shows the full ...

Figure 10: t-SNE visualizations comparing ONNX-to-text encodings with COLE on a 7,400-architecture subset of NAS-Bench-201. ...

Figure 11: t-SNE visualizations comparing derivation tree strings and COLE on a 2,837-architecture corpus. The layout follows ...

Original abstract

Developing effective surrogates (performance predictors) for Neural Architecture Search (NAS) typically requires expensive fine-tuning or the engineering of complex representations. We propose a low-cost embedding strategy that leverages the inductive bias of Language Models (LMs) to eliminate these overheads. By representing architectures as PyTorch class definition text, we demonstrate that off-the-shelf LMs act as competitive feature extractors without NAS-specialized fine-tuning. The final predictor is constructed by passing the extracted Code-Oriented LM Embeddings (COLE) through a lightweight regression head. We also investigate strategies to improve embedding quality and utilization. Our experiments on the NAS-Bench-201 and einspace search spaces reveal that raw code inputs yield higher predictive performance than other text-based encodings (e.g., ONNX-to-text encodings) when using frozen LMs. We also observe COLE drives superior surrogate-assisted search using the BANANAS algorithm in NAS-Bench-201. When optimizing for CIFAR-100 performance, replacing structural path encodings with COLE for architecture representation allows for a 34% decrease in the evaluation budget required to reach within 1% of the fittest architecture in the search space (by test accuracy). As any neural architecture can be represented as code, these findings establish COLE as a versatile and efficient foundation for advancing NAS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Raw PyTorch code fed to frozen off-the-shelf LMs produces usable embeddings for NAS surrogates and yields a reported 34% evaluation budget cut on NAS-Bench-201, but the source of the gain is not yet pinned down.

The main thing to know is that this paper treats neural architectures as literal PyTorch class definitions, runs the text through a frozen general-purpose language model, and uses the resulting embeddings as input to a simple regression head for performance prediction. On NAS-Bench-201 they replace structural path encodings with these COLE embeddings inside BANANAS and report a 34% reduction in the number of evaluations needed to reach within 1% of the best CIFAR-100 accuracy. They also show raw code text beats ONNX-to-text conversions on the same frozen models and test the approach on the einspace search space as well.

The appeal is the low overhead: no LM fine-tuning and no custom architecture encoders required. That keeps the method cheap and portable to any code-representable network. The comparison to prior text-based encodings is direct and the surrogate-assisted search result is concrete, so the practical angle is clear.

The soft spot is exactly where the stress-test note points. We still lack evidence that the LM embeddings are picking up operation choices, channel sizes, or connectivity rather than generic code syntax or token patterns. Without ablations that isolate the information the embeddings actually carry, it is hard to know whether the surrogate advantage will hold when the search space or task changes. The abstract also omits error bars, run counts, and exact training splits for the regression head, which leaves the 34% figure preliminary.

This paper is aimed at people already working on surrogate models for NAS or on lightweight ways to borrow pre-trained models for structured prediction tasks. A reader who needs a cheap baseline to beat or extend would find it worth trying. The idea is simple enough and the empirical claim specific enough that it deserves a serious referee even if the experiments need tightening on controls and mechanistic checks. I would send it out for peer review.

Referee Report

2 major / 2 minor

Summary. The paper proposes Code-Oriented LM Embeddings (COLE) for surrogate-assisted Neural Architecture Search. Architectures are encoded as raw PyTorch class definition text and passed through off-the-shelf frozen language models to produce embeddings, which are then fed to a lightweight regression head for performance prediction. Experiments on NAS-Bench-201 and einspace show raw code inputs outperforming ONNX-to-text encodings, and integration with BANANAS yields a 34% reduction in evaluation budget to reach within 1% of the best test accuracy on CIFAR-100.

Significance. If the results hold under stricter controls, the work would be significant for providing a low-cost, fine-tuning-free method to leverage pre-trained LMs in NAS. Representing architectures as code offers broad applicability across search spaces, and the reported efficiency gain with BANANAS could reduce the computational burden of surrogate-assisted search.

major comments (2)

§4.3 (BANANAS results on NAS-Bench-201 CIFAR-100): the 34% evaluation budget reduction to reach within 1% of the fittest architecture is reported without error bars, number of independent runs, or explicit description of the structural path encoding baseline and data splits used for surrogate training. This detail is load-bearing for the central empirical claim.
§3.1 (Embedding quality): the claim that raw PyTorch code yields higher predictive performance than alternative encodings with frozen LMs is not supported by any analysis showing that the embeddings encode NAS-relevant attributes such as operation choice or connectivity rather than generic token statistics. This assumption underpins attribution of the surrogate advantage to COLE.
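
The error bars asked for in the first major comment amount to repeating the search with several seeds and reporting the spread. A minimal sketch, assuming each run is summarised by its evaluations-to-threshold count; the function name and the five counts are toy values for illustration, not data from the paper.

```python
import math
import statistics


def summarize_runs(evals_per_run):
    """Mean, sample standard deviation, and standard error across runs."""
    mean = statistics.mean(evals_per_run)
    std = statistics.stdev(evals_per_run)  # sample std, needs >= 2 runs
    sem = std / math.sqrt(len(evals_per_run))
    return mean, std, sem


# Toy evaluations-to-threshold counts from five seeds (illustrative only).
mean, std, sem = summarize_runs([40, 44, 38, 42, 46])
```

Reporting the same summary for the path-encoding baseline lets a reader judge whether the gap between the two surrogates exceeds run-to-run variation.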

minor comments (2)

Abstract: the phrase 'strategies to improve embedding quality and utilization' is mentioned without naming the strategies; naming them would aid reader comprehension.
Figure captions: several figures comparing encodings would benefit from explicit axis labels indicating whether performance is measured by validation or test accuracy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments highlight important aspects of reproducibility and attribution that we address below. We have revised the manuscript accordingly to strengthen the presentation of our results.

Referee: §4.3 (BANANAS results on NAS-Bench-201 CIFAR-100): the 34% evaluation budget reduction to reach within 1% of the fittest architecture is reported without error bars, number of independent runs, or explicit description of the structural path encoding baseline and data splits used for surrogate training. This detail is load-bearing for the central empirical claim.

Authors: We agree that these experimental details are essential for supporting the central claim and ensuring reproducibility. In the revised manuscript, we now report results averaged over five independent runs with different random seeds and include error bars in the relevant figure and table in §4.3. We have also added an explicit description of the structural path encoding baseline (following the original BANANAS implementation) and clarified that the surrogate training data splits adhere to the standard NAS-Bench-201 protocol used in prior work. These revisions appear in Section 4.3 and the experimental setup.

revision: yes

Referee: §3.1 (Embedding quality): the claim that raw PyTorch code yields higher predictive performance than alternative encodings with frozen LMs is not supported by any analysis showing that the embeddings encode NAS-relevant attributes such as operation choice or connectivity rather than generic token statistics. This assumption underpins attribution of the surrogate advantage to COLE.

Authors: We thank the referee for this observation. Our primary evidence is the consistent outperformance of raw PyTorch code over ONNX-to-text encodings when using identical frozen LMs; because both representations are textual, the performance gap suggests that the advantage arises from how architecture-specific information is captured rather than generic token-level statistics alone. To directly address the request for attribution analysis, the revised manuscript includes additional probing experiments in §3.1 that measure correlations between embedding features and NAS-relevant properties such as operation types and connectivity. These results support that COLE encodes structural attributes relevant to performance prediction.

revision: yes
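
A probing check of the kind discussed in this exchange could look like the sketch below: correlate each embedding dimension with a known architectural property and see whether any dimension tracks it. This is an assumed, simplified stand-in for whatever analysis a revision might contain; the function names, the toy embeddings, and the choice of property (number of 3x3 convolutions per cell) are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_probe_dimension(embeddings, prop):
    """Index and |r| of the embedding dimension most correlated with prop."""
    dims = range(len(embeddings[0]))
    scores = [abs(pearson([e[d] for e in embeddings], prop)) for d in dims]
    best = max(dims, key=scores.__getitem__)
    return best, scores[best]


# Toy data: four architectures, three embedding dimensions (illustrative).
toy_embeddings = [
    [0.1, 1.0, 0.5],
    [0.4, 1.0, 0.2],
    [0.7, 1.0, 0.9],
    [1.0, 1.0, 0.1],
]
conv3x3_count = [0, 1, 2, 3]  # hypothetical property per architecture
best_dim, strength = best_probe_dimension(toy_embeddings, conv3x3_count)
```

A single-dimension correlation is a weak probe; a fuller analysis would fit a linear probe on held-out architectures and compare it against a control that shuffles the property labels.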

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external LM embeddings and experimental validation

The paper's core pipeline—representing architectures as raw PyTorch code, extracting frozen off-the-shelf LM embeddings (COLE), and training a lightweight regression head—is presented as a low-cost empirical strategy. The 34% evaluation-budget reduction on NAS-Bench-201 CIFAR-100 with BANANAS is reported as an experimental outcome, not derived by construction from fitted parameters or self-referential definitions. No equations, self-citation chains, or ansatz smuggling reduce the surrogate ranking advantage to quantities internal to the paper. The derivation chain is self-contained against external benchmarks (pre-trained LMs and standard NAS-Bench-201 splits).

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the assumption that general language models already encode useful inductive biases for code that transfer to architecture performance prediction; COLE itself is introduced as a derived representation rather than a new physical entity.

axioms (1)

Domain assumption: Language models possess inductive biases that make them effective feature extractors for neural architecture code without domain-specific fine-tuning.
Invoked when claiming off-the-shelf LMs act as competitive extractors for PyTorch class definitions.

invented entities (1)

Code-Oriented LM Embeddings (COLE) · no independent evidence
Purpose: To serve as input features for a lightweight regression head that predicts neural architecture performance.
New term coined for the embeddings produced by passing PyTorch code text through frozen LMs.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "By representing architectures as PyTorch class definition text, we demonstrate that off-the-shelf LMs act as competitive feature extractors without NAS-specialized fine-tuning."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "We adopt the Pairwise Hinge Loss to optimize for Kendall's Tau rank correlation"
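
The last quoted passage pairs a ranking loss with a rank-correlation metric. A minimal sketch of both follows, assuming the common margin-based form of the pairwise hinge loss and the tau-a variant of Kendall's Tau. The paper's exact formulation, margin value, and tie handling are not given on this page, so those details are assumptions, as are the function names and toy values.

```python
from itertools import combinations


def pairwise_hinge_loss(scores, targets, margin=0.1):
    """Mean of max(0, margin - (s_better - s_worse)) over all pairs whose
    targets differ; pairs with tied targets are skipped."""
    losses = []
    for i, j in combinations(range(len(scores)), 2):
        if targets[i] == targets[j]:
            continue
        better, worse = (i, j) if targets[i] > targets[j] else (j, i)
        losses.append(max(0.0, margin - (scores[better] - scores[worse])))
    return sum(losses) / len(losses) if losses else 0.0


def kendall_tau(scores, targets):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores)), 2):
        agreement = (scores[i] - scores[j]) * (targets[i] - targets[j])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    n_pairs = len(scores) * (len(scores) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy accuracies and two sets of predictor scores (illustrative only).
accuracies = [70.0, 65.0, 60.0]
well_ranked = [0.9, 0.5, 0.1]
mis_ranked = [0.1, 0.5, 0.9]

loss_good = pairwise_hinge_loss(well_ranked, accuracies)
loss_bad = pairwise_hinge_loss(mis_ranked, accuracies)
tau_good = kendall_tau(well_ranked, accuracies)
tau_bad = kendall_tau(mis_ranked, accuracies)
```

The loss depends only on the order of predictions within each pair, which is why it suits a surrogate whose job is to rank candidate architectures rather than to predict exact accuracies.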

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

