arxiv: 2601.08517 · v2 · submitted 2026-01-13 · 💻 cs.CV

Closed-Loop LLM Discovery of Non-Standard Channel Priors in Vision Models

Pith reviewed 2026-05-16 14:30 UTC · model grok-4.3

classification 💻 cs.CV
keywords neural architecture search · large language models · channel configuration · CIFAR-100 · abstract syntax tree · performance feedback · vision models

The pith

Closed-loop LLM refines channel configurations to outperform AST-generated vision models on CIFAR-100

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether large language models can assist neural architecture search by reasoning over code-level channel specifications in deep networks. It treats the task as conditional code generation in which the LLM iteratively refines architectures using performance feedback from proxy evaluations. To supply training examples, the authors first create a population of valid, shape-consistent networks through abstract syntax tree mutations that are not themselves optimized for accuracy. Experiments demonstrate that the resulting closed-loop process yields architectures that improve on the initial population under identical proxy conditions on CIFAR-100. The generated models also exhibit recurring patterns such as non-standard channel widths and late-stage expansion, suggesting that language models can surface design heuristics that complement traditional search methods.

Core claim

We formulate channel-configuration search as conditional code generation in which an LLM refines architectural specifications using performance feedback. A corpus of valid architectures is first produced through abstract syntax tree mutations to overcome data scarcity. The closed-loop system then produces networks that outperform the initial AST-generated population on CIFAR-100 under the same proxy-evaluation protocol. Analysis of the outputs shows that the models reflect domain-specific patterns including non-standard channel widths and late-stage expansion.

What carries the argument

Closed-loop LLM refinement of architectural code structures using iterative performance feedback from proxy evaluations
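
The control flow of such a loop can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: `closed_loop_search`, `toy_proxy_score`, and `make_toy_proposer` are hypothetical names, the proposer is a random perturbation standing in for the fine-tuned LLM, and the scoring function is a made-up placeholder for a short proxy training run.

```python
import random
from typing import Callable, List, Tuple

Config = Tuple[int, ...]       # channel widths per stage, e.g. (32, 64, 128)
Scored = Tuple[Config, float]  # (configuration, proxy score)


def closed_loop_search(
    seeds: List[Config],
    propose: Callable[[List[Scored]], Config],
    evaluate: Callable[[Config], float],
    rounds: int,
) -> List[Scored]:
    """Score the seed population, then repeatedly ask `propose` for a new
    configuration given the scored history and record its proxy score."""
    history = [(cfg, evaluate(cfg)) for cfg in seeds]
    for _ in range(rounds):
        candidate = propose(history)
        history.append((candidate, evaluate(candidate)))
    return history


# --- Hypothetical stand-ins, NOT the paper's components ---
def toy_proxy_score(cfg: Config) -> float:
    """Placeholder for a short proxy training run (arbitrary toy formula)."""
    return sum(w * (i + 1) for i, w in enumerate(cfg)) / 1000.0


def make_toy_proposer(seed: int) -> Callable[[List[Scored]], Config]:
    """Placeholder for the LLM: perturb one width of the best config so far."""
    rng = random.Random(seed)

    def propose(history: List[Scored]) -> Config:
        best_cfg, _ = max(history, key=lambda item: item[1])
        widths = list(best_cfg)
        i = rng.randrange(len(widths))
        widths[i] = max(8, widths[i] + rng.choice([-12, 12, 20]))
        return tuple(widths)

    return propose


seeds = [(32, 64, 128), (16, 32, 64)]
history = closed_loop_search(seeds, make_toy_proposer(0), toy_proxy_score, rounds=10)
best_cfg, best_score = max(history, key=lambda item: item[1])
print(best_cfg, round(best_score, 3))
```

Because the history always contains the seed population, the best score can never fall below the best seed. The interesting question, which this sketch cannot answer, is how much a real proposer improves on it.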

Load-bearing premise

Short proxy evaluations on CIFAR-100 supply reliable enough signals to steer the LLM toward architectures that generalize beyond the proxy setting.

What would settle it

If the top architectures discovered by the closed-loop LLM are retrained from scratch on ImageNet and show no accuracy improvement over standard baselines such as ResNet or EfficientNet under matched computational budgets, the claim of useful discovery would be falsified.

Figures

Figures reproduced from arXiv: 2601.08517 by Dmitry Ignatov, Radu Timofte, Tolgay Atinc Uzun.

Figure 1: The general overview of the experimental cycle. Neural network codes are retrieved from the LEMUR database and go through mutations in the abstract syntax tree bootstrapper to generate valid models and exemplify the problem specification to the LLM. An iterative inference-training cycle according to structured prompts, including an accuracy metric signal, self-improves the channel predictions made by the LLM.
Figure 2: The AST-based bootstrapping pipeline. It illustrates the process of AST parsing (extracting layer definitions and offsets), dependency analysis using the TorchFX symbolic graph, the constraint-aware editing and repairing phase, and the verification protocol using dummy tensor checks to generate valid seed models. Whenever a random mutation is chosen and applied, the network must be corrected. The invalid…
Figure 3: CIFAR-100 validation accuracy (one training epoch) of LLM-generated image classification models across LLM fine-tuning epochs. Left: mean accuracy of all valid models; fluctuations reflect the exploration of diverse configurations. Right: rolling average of validation-set classification accuracy (window size = 3) for image classification models generated by an LLM across fine-tuning epochs.
Figure 4: Search dynamics. Left: generation success rate (valid/total); low generation success stability here indicates the LLM maintains metric-focused improvement during exploration rather than memorizing the specifics of the evaluation constraints. Right: best-so-far trajectory (high-water mark), where step-function jumps highlight discrete breakthroughs in the generative process for the obtained max accuracies.
Figure 5: Efficiency and flow analysis. Left: accuracy vs. parameter count. The red dashed line indicates the Pareto frontier of efficient models. The LLM discovers models that maximize accuracy for a given parameter budget. The color and size of the points indicate the epoch in which the models were generated. Right: architectural flow (parallel coordinates) for all models, colored by accuracy. High-performing model…
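
The summary quantities named in the captions of Figures 3 to 5 (a rolling average with window size 3, a best-so-far trajectory, and a Pareto frontier over parameter count and accuracy) are simple to recompute from per-model results. The sketch below shows one way to do so. The function names are illustrative, and the trailing-window convention for the rolling average is an assumption, since the captions do not say whether the window is trailing or centered.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def rolling_mean(values: Sequence[float], window: int = 3) -> List[float]:
    """Trailing rolling mean (assumed convention); the first window-1 points
    average over however many values are available so far."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def best_so_far(values: Sequence[float]) -> List[float]:
    """High-water mark: running maximum of the sequence."""
    return list(accumulate(values, max))


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Points are (parameter_count, accuracy). A point is kept if no other
    point has fewer-or-equal parameters and strictly higher accuracy."""
    front = []
    best_acc = float("-inf")
    # Sort by parameters ascending; among ties, highest accuracy first.
    for params, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        if acc > best_acc:
            front.append((params, acc))
            best_acc = acc
    return front


# Made-up numbers for illustration only, not results from the paper.
accs = [0.1, 0.3, 0.2, 0.5]
print(rolling_mean(accs), best_so_far(accs))
print(pareto_front([(1.0, 0.2), (2.0, 0.1), (3.0, 0.4), (3.0, 0.3)]))
```

The sort-then-scan form of `pareto_front` runs in O(n log n), which is more than enough for a few hundred generated models.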
Original abstract

Channel-configuration search, the optimization of layer specifications such as channel widths in deep neural networks, presents a combinatorial challenge constrained by tensor-shape compatibility and computational budgets. We investigate whether large language models (LLMs) can support neural architecture search (NAS) by reasoning over architectural code structures in ways that complement traditional search heuristics. We apply an LLM-driven NAS framework to channel-configuration search, formulating the task as conditional code generation in which the LLM refines architectural specifications using performance feedback. To address data scarcity, we generate a corpus of valid, shape-consistent architectures through abstract syntax tree (AST) mutations. Although these mutated networks are not necessarily optimized for performance, they provide structural examples that help the LLM learn executable architectural patterns and relate channel configurations to model performance. Experimental results on CIFAR-100 show that the closed-loop LLM improves upon the initial AST-generated architecture population under the same proxy-evaluation protocol. Our analysis further shows that the generated architectures reflect domain-specific design patterns, including non-standard channel widths and late-stage expansion, highlighting the potential of language-driven design for code-level NAS. The code and prompts are publicly available at https://github.com/ABrain-One/NN-GPT, and the generated deep neural networks are published at https://github.com/ABrain-One/NN-Dataset under model names with the prefix ast-dimension-.
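
To make the idea of a shape-consistent AST mutation concrete, here is a small sketch using only Python's standard `ast` module (Python 3.9+ for `ast.unparse`). It is not the paper's bootstrapper. It assumes a purely sequential chain of `nn.Conv2d` calls with literal positional arguments, and it substitutes a static channel-chain check for the TorchFX dependency analysis and dummy-tensor verification the paper describes. `SOURCE`, `mutate_width`, and `is_shape_consistent` are hypothetical names.

```python
import ast

# Toy model source; it is only parsed, never executed, so torch is not needed.
SOURCE = """
layers = [
    nn.Conv2d(3, 32, 3),
    nn.Conv2d(32, 64, 3),
    nn.Conv2d(64, 128, 3),
]
"""


def conv_calls(tree: ast.AST) -> list:
    """Return the Conv2d call nodes in source order."""
    calls = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "Conv2d"
    ]
    return sorted(calls, key=lambda n: (n.lineno, n.col_offset))


def mutate_width(source: str, layer_index: int, new_width: int) -> str:
    """Change one layer's out_channels and repair the next layer's in_channels."""
    tree = ast.parse(source)
    calls = conv_calls(tree)
    calls[layer_index].args[1] = ast.Constant(value=new_width)
    if layer_index + 1 < len(calls):
        calls[layer_index + 1].args[0] = ast.Constant(value=new_width)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def is_shape_consistent(source: str) -> bool:
    """Static stand-in for a dummy-tensor check: each layer's in_channels
    must equal the previous layer's out_channels."""
    pairs = [(c.args[0].value, c.args[1].value) for c in conv_calls(ast.parse(source))]
    return all(prev[1] == nxt[0] for prev, nxt in zip(pairs, pairs[1:]))


mutated = mutate_width(SOURCE, layer_index=1, new_width=48)
print(mutated)
print(is_shape_consistent(mutated))
```

A mutation that edits only one side of a connection, for example changing `64` to `48` in the second layer alone, fails the check. That failure is the case a repair step has to handle.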

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a closed-loop LLM framework for neural architecture search (NAS) targeting channel-configuration optimization in vision models. An initial population of shape-consistent architectures is generated via abstract syntax tree (AST) mutations to supply structural examples; the LLM then iteratively refines channel widths using performance feedback from proxy evaluations. Experiments on CIFAR-100 report that the LLM-refined architectures outperform the initial AST-generated population under the same proxy protocol, and the resulting models exhibit non-standard design patterns such as late-stage channel expansion. Code and generated models are released publicly.

Significance. If the proxy signals prove reliable, the work would illustrate how LLM reasoning over executable code can complement heuristic NAS methods in combinatorially constrained spaces, potentially surfacing novel channel priors that standard search overlooks. The public release of code, prompts, and the full set of generated networks (prefixed ast-dimension-) is a clear strength that supports reproducibility and further analysis.

major comments (2)
  1. [§4] §4 (Experimental results): The central claim that closed-loop LLM refinement improves proxy scores over the initial AST population is presented without any description of the proxy protocol (training epochs, optimizer settings, number of seeds, or statistical tests), baseline strength, or run-to-run variance. This leaves the empirical improvement difficult to interpret or reproduce.
  2. [§3.2] §3.2 (Closed-loop refinement): No ablation or measurement is supplied on the rank correlation between the short proxy evaluations used for LLM feedback and full-training accuracy on CIFAR-100. Without this, it is impossible to confirm that the LLM is being steered toward genuine architectural improvements rather than proxy-specific artifacts.
minor comments (2)
  1. [Abstract] The abstract and §4 refer to 'non-standard channel widths' without a precise definition or quantitative comparison to standard ResNet-style progressions; a table listing the most frequent deviations would improve clarity.
  2. Figure captions and the GitHub links could explicitly state the exact model names (ast-dimension-*) used in the reported experiments to facilitate direct inspection of the published networks.
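
The rank-correlation measurement requested in major comment 2 needs nothing beyond per-model proxy scores and full-training accuracies. A dependency-free sketch of Spearman's coefficient follows, computed as the Pearson correlation of average ranks so that ties are handled. The helper names and the example numbers are illustrative, not taken from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up scores for four hypothetical models.
proxy_acc = [0.30, 0.35, 0.28, 0.40]
full_acc = [0.70, 0.72, 0.69, 0.71]
print(spearman(proxy_acc, full_acc))
```

A coefficient near 1 would indicate that the proxy ordering survives full training. A value near 0 would support the referee's concern about proxy-specific artifacts.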

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We will revise the manuscript to address the concerns regarding experimental details and proxy validation.

Point-by-point responses
  1. Referee: [§4] §4 (Experimental results): The central claim that closed-loop LLM refinement improves proxy scores over the initial AST population is presented without any description of the proxy protocol (training epochs, optimizer settings, number of seeds, or statistical tests), baseline strength, or run-to-run variance. This leaves the empirical improvement difficult to interpret or reproduce.

    Authors: We agree that the proxy protocol was not described in sufficient detail. In the revised manuscript, we will add a subsection in §4 detailing the proxy evaluation: architectures are trained on CIFAR-100 for 20 epochs using the SGD optimizer with learning rate 0.1, momentum 0.9, weight decay 5e-4, and batch size 256. Results are averaged over 3 independent runs with different random seeds, and we report mean proxy accuracy along with standard deviation. The baseline is the mean performance of the initial population, and we will include a statistical comparison using a t-test to confirm the significance of the improvement. revision: yes

  2. Referee: [§3.2] §3.2 (Closed-loop refinement): No ablation or measurement is supplied on the rank correlation between the short proxy evaluations used for LLM feedback and full-training accuracy on CIFAR-100. Without this, it is impossible to confirm that the LLM is being steered toward genuine architectural improvements rather than proxy-specific artifacts.

    Authors: We partially concur with the need for such validation. The core contribution is demonstrating improvement within the proxy setting, which is consistent for both populations. However, to strengthen the claim, we will perform and report in the revision a rank correlation analysis on a held-out set of architectures. We will train 15 randomly selected models from the search to full accuracy (200 epochs) and compute the Spearman correlation between their proxy scores and full accuracies. This will be added to §3.2. We note that full training for the entire loop is infeasible due to computational cost, but this targeted ablation will help address the concern. revision: partial
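
The statistical comparison promised in response 1 could take the following shape. This sketch computes sample means, standard deviations, and Welch's unequal-variance t statistic with its Welch-Satterthwaite degrees of freedom, using only the standard library. The choice of Welch's variant is an assumption (the rebuttal says only "a t-test"), the function name is illustrative, and the numbers are made up. Turning the statistic into a p-value needs a Student-t CDF, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    # Squared standard errors of the two sample means.
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# Made-up proxy accuracies for two hypothetical populations.
refined = [0.41, 0.44, 0.43]
initial = [0.36, 0.39, 0.35]
print(f"refined: {mean(refined):.3f} +/- {stdev(refined):.3f}")
print(f"initial: {mean(initial):.3f} +/- {stdev(initial):.3f}")
print(welch_t(refined, initial))
```

With only 3 seeds per group the degrees of freedom are very small, so even a large t statistic should be read cautiously.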

Circularity Check

0 steps flagged

No circularity: purely experimental NAS framework with no derivations or self-referential reductions

full rationale

The paper describes an empirical pipeline for LLM-driven channel-configuration search: an initial population is created via AST mutations to produce valid architectures, then refined in closed-loop fashion by feeding proxy-evaluation scores back to the LLM for code-level edits. All claims are experimental comparisons of proxy accuracy on CIFAR-100 between the initial and refined populations. No equations, parameter fittings, uniqueness theorems, or ansatzes appear in the text; the reported improvement is measured directly under the stated protocol rather than derived from prior results by construction. Public code and datasets further allow external reproduction, confirming the work is self-contained and contains no load-bearing self-citation chains or definitional loops.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that LLM reasoning over code plus proxy feedback can discover useful non-standard priors; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: AST mutations produce sufficiently diverse yet valid architectural examples for LLM training
    Invoked to solve data scarcity for the LLM

pith-pipeline@v0.9.0 · 5547 in / 1062 out tokens · 65482 ms · 2026-05-16T14:30:53.180598+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs

    cs.LG 2026-05 unverdicted novelty 7.0

    Fine-tuned 7B LLMs generating unified diffs for neural architecture refinement achieve 66-75% valid rates and 64-66% mean first-epoch accuracy, outperforming full-generation baselines by large margins while cutting ou...
