Closed-Loop LLM Discovery of Non-Standard Channel Priors in Vision Models
Pith reviewed 2026-05-16 14:30 UTC · model grok-4.3
The pith
Closed-loop LLM refines channel configurations to outperform AST-generated vision models on CIFAR-100
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We formulate channel-configuration search as conditional code generation in which an LLM refines architectural specifications using performance feedback. A corpus of valid architectures is first produced through abstract syntax tree mutations to overcome data scarcity. The closed-loop system then produces networks that outperform the initial AST-generated population on CIFAR-100 under the same proxy-evaluation protocol. Analysis of the outputs shows that the models reflect domain-specific patterns including non-standard channel widths and late-stage expansion.
What carries the argument
Closed-loop LLM refinement of architectural code structures using iterative performance feedback from proxy evaluations
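The closed-loop pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' NN-GPT implementation. `propose` stands in for the LLM call (here a hypothetical stub that perturbs one width of the best configuration so far), and `toy_proxy` is a made-up scoring function standing in for short CIFAR-100 training runs.

```python
import random
from typing import Callable, List, Tuple

Config = Tuple[int, ...]            # channel widths, one per stage
History = List[Tuple[Config, float]]


def closed_loop_search(
    initial: List[Config],
    evaluate: Callable[[Config], float],
    propose: Callable[[History], Config],
    iterations: int,
) -> Tuple[Config, float, History]:
    """Score the seed population, then repeatedly ask the proposer for a new
    config given the full (config, score) history and feed its score back."""
    history: History = [(cfg, evaluate(cfg)) for cfg in initial]
    for _ in range(iterations):
        candidate = propose(history)  # stand-in for the LLM refinement step
        history.append((candidate, evaluate(candidate)))
    best_cfg, best_score = max(history, key=lambda item: item[1])
    return best_cfg, best_score, history


def make_stub_proposer(seed: int = 0) -> Callable[[History], Config]:
    """Hypothetical stand-in for the LLM: perturb one width of the current best."""
    rng = random.Random(seed)

    def propose(history: History) -> Config:
        best, _ = max(history, key=lambda item: item[1])
        widths = list(best)
        i = rng.randrange(len(widths))
        widths[i] = max(8, widths[i] + rng.choice([-16, -8, 8, 16]))
        return tuple(widths)

    return propose


def toy_proxy(cfg: Config) -> float:
    """Hypothetical proxy score: rewards wider late stages, penalizes total size."""
    return sum((i + 1) * w for i, w in enumerate(cfg)) / 100 - sum(cfg) ** 2 / 1e5
```

In a real system, `propose` would serialize the history into a prompt and parse the model's code edit back into a configuration; keeping it as a plain callable makes the feedback loop itself easy to test in isolation.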
Load-bearing premise
Short proxy evaluations on CIFAR-100 supply reliable enough signals to steer the LLM toward architectures that generalize beyond the proxy setting.
What would settle it
If the top architectures discovered by the closed-loop LLM are retrained from scratch on ImageNet and show no accuracy improvement over standard baselines such as ResNet or EfficientNet under matched computational budgets, the claim of useful discovery would be falsified.
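Checking the "matched computational budgets" condition requires a cost model. The sketch below counts parameters and multiply-accumulates (MACs) for a plain chain of dense k x k convolutions. It is an assumption-laden simplification: it ignores batch normalization, pooling, the classifier head, and the depthwise convolutions that EfficientNet relies on, and the 5% tolerance in the hypothetical `budget_matched` helper is an arbitrary illustrative choice.

```python
import math
from typing import Sequence, Tuple


def conv_cost(c_in: int, c_out: int, k: int, h_out: int, w_out: int,
              bias: bool = True) -> Tuple[int, int]:
    """Parameters and MACs of one dense k x k convolution."""
    weights = k * k * c_in * c_out
    params = weights + (c_out if bias else 0)
    macs = weights * h_out * w_out  # each weight is used once per output position
    return params, macs


def chain_cost(widths: Sequence[int], out_sizes: Sequence[int],
               in_channels: int = 3, k: int = 3) -> Tuple[int, int]:
    """Total cost of a plain conv chain; out_sizes[i] is layer i's square output size."""
    total_params = total_macs = 0
    c_in = in_channels
    for c_out, size in zip(widths, out_sizes):
        p, m = conv_cost(c_in, c_out, k, size, size)
        total_params += p
        total_macs += m
        c_in = c_out  # next layer consumes this layer's output channels
    return total_params, total_macs


def budget_matched(cost_a: int, cost_b: int, rel_tol: float = 0.05) -> bool:
    """True if two costs differ by at most rel_tol relative to the larger one."""
    return math.isclose(cost_a, cost_b, rel_tol=rel_tol)
```

For example, a first layer mapping 3 to 32 channels at a 32 x 32 output costs 9 * 3 * 32 = 864 weights plus 32 biases, and 864 * 1024 MACs.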
Original abstract
Channel-configuration search, the optimization of layer specifications such as channel widths in deep neural networks, presents a combinatorial challenge constrained by tensor-shape compatibility and computational budgets. We investigate whether large language models (LLMs) can support neural architecture search (NAS) by reasoning over architectural code structures in ways that complement traditional search heuristics. We apply an LLM-driven NAS framework to channel-configuration search, formulating the task as conditional code generation in which the LLM refines architectural specifications using performance feedback. To address data scarcity, we generate a corpus of valid, shape-consistent architectures through abstract syntax tree (AST) mutations. Although these mutated networks are not necessarily optimized for performance, they provide structural examples that help the LLM learn executable architectural patterns and relate channel configurations to model performance. Experimental results on CIFAR-100 show that the closed-loop LLM improves upon the initial AST-generated architecture population under the same proxy-evaluation protocol. Our analysis further shows that the generated architectures reflect domain-specific design patterns, including non-standard channel widths and late-stage expansion, highlighting the potential of language-driven design for code-level NAS. The code and prompts are publicly available at https://github.com/ABrain-One/NN-GPT, and the generated deep neural networks are published at https://github.com/ABrain-One/NN-Dataset under model names with the prefix ast-dimension-.
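The abstract does not spell out the mutation operators, so the following is a hypothetical sketch of a shape-consistent channel-width mutation using Python's standard `ast` module. It assumes a purely sequential stack of `nn.Conv2d` layers written with positional `in_channels, out_channels` arguments; networks with residual branches, grouped convolutions, or linear heads would need additional bookkeeping. The width menu in `random_mutation` is illustrative.

```python
import ast
import random

# Hypothetical model source: a plain chain of convolutions. Only the source
# text is manipulated, so PyTorch is not needed to run this sketch.
SOURCE = """
layers = [
    nn.Conv2d(3, 32, 3, padding=1),
    nn.Conv2d(32, 64, 3, padding=1),
    nn.Conv2d(64, 128, 3, padding=1),
]
"""


def conv_calls(tree):
    """Return the nn.Conv2d(...) call nodes in source order."""
    calls = [
        node for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "Conv2d"
    ]
    return sorted(calls, key=lambda n: (n.lineno, n.col_offset))


def mutate_width(source, layer_idx, new_width):
    """Set out_channels of one conv and the in_channels of the next conv."""
    tree = ast.parse(source)
    convs = conv_calls(tree)
    convs[layer_idx].args[1] = ast.Constant(new_width)  # out_channels
    if layer_idx + 1 < len(convs):
        # Propagate to the consumer so tensor shapes still line up.
        convs[layer_idx + 1].args[0] = ast.Constant(new_width)
    return ast.unparse(ast.fix_missing_locations(tree))


def is_shape_consistent(source):
    """Each conv's in_channels must equal the previous conv's out_channels."""
    convs = conv_calls(ast.parse(source))
    return all(prev.args[1].value == nxt.args[0].value
               for prev, nxt in zip(convs, convs[1:]))


def random_mutation(source, rng, widths=(16, 24, 48, 80, 96, 160)):
    """Pick a random layer and a random candidate width (hypothetical menu)."""
    n_layers = len(conv_calls(ast.parse(source)))
    return mutate_width(source, rng.randrange(n_layers), rng.choice(widths))
```

Because each new `out_channels` value is copied into the consumer's `in_channels`, every mutant stays executable, which is the property the AST-generated corpus is meant to guarantee. Performance is not considered at all, matching the abstract's note that mutated networks are not necessarily optimized.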
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a closed-loop LLM framework for neural architecture search (NAS) targeting channel-configuration optimization in vision models. An initial population of shape-consistent architectures is generated via abstract syntax tree (AST) mutations to supply structural examples; the LLM then iteratively refines channel widths using performance feedback from proxy evaluations. Experiments on CIFAR-100 report that the LLM-refined architectures outperform the initial AST-generated population under the same proxy protocol, and the resulting models exhibit non-standard design patterns such as late-stage channel expansion. Code and generated models are released publicly.
Significance. If the proxy signals prove reliable, the work would illustrate how LLM reasoning over executable code can complement heuristic NAS methods in combinatorially constrained spaces, potentially surfacing novel channel priors that standard search overlooks. The public release of code, prompts, and the full set of generated networks (prefixed ast-dimension-) is a clear strength that supports reproducibility and further analysis.
major comments (2)
- [§4] §4 (Experimental results): The central claim that closed-loop LLM refinement improves proxy scores over the initial AST population is presented without any description of the proxy protocol (training epochs, optimizer settings, number of seeds, or statistical tests), baseline strength, or run-to-run variance. This leaves the empirical improvement difficult to interpret or reproduce.
- [§3.2] §3.2 (Closed-loop refinement): No ablation or measurement is supplied on the rank correlation between the short proxy evaluations used for LLM feedback and full-training accuracy on CIFAR-100. Without this, it is impossible to confirm that the LLM is being steered toward genuine architectural improvements rather than proxy-specific artifacts.
minor comments (2)
- [Abstract] The abstract and §4 refer to 'non-standard channel widths' without a precise definition or quantitative comparison to standard ResNet-style progressions; a table listing the most frequent deviations would improve clarity.
- Figure captions and the GitHub links could explicitly state the exact model names (ast-dimension-*) used in the reported experiments to facilitate direct inspection of the published networks.
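One way to make "non-standard channel widths" and "late-stage expansion" measurable, as the minor comment requests, is sketched below. The reference set (powers of two, as in ResNet's 64-128-256-512 stage widths) and both metrics are assumptions chosen for illustration, not definitions taken from the paper.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def log2_deviation(width: int) -> float:
    """Distance in octaves from the nearest power of two (0 for 64, 128, ...)."""
    x = math.log2(width)
    return abs(x - round(x))


def is_nonstandard(width: int, tol: float = 1e-9) -> bool:
    """Assumed definition: a width is non-standard if it is not a power of two."""
    return log2_deviation(width) > tol


def late_stage_expansion(widths: Sequence[int]) -> float:
    """Ratio of the last stage width to the previous one (2.0 for plain doubling)."""
    return widths[-1] / widths[-2]


def deviation_table(configs: Iterable[Sequence[int]], top: int = 5) -> List[Tuple[int, int]]:
    """Most frequent non-power-of-two widths across a set of configurations."""
    counts = Counter(w for cfg in configs for w in cfg if is_nonstandard(w))
    return counts.most_common(top)
```

Applied to the published `ast-dimension-*` models, `deviation_table` would yield exactly the kind of frequency table the comment asks for, and `late_stage_expansion` values well above 2.0 would quantify the reported late-stage expansion.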
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We will revise the manuscript to address the concerns regarding experimental details and proxy validation.
Point-by-point responses
Referee: [§4] §4 (Experimental results): The central claim that closed-loop LLM refinement improves proxy scores over the initial AST population is presented without any description of the proxy protocol (training epochs, optimizer settings, number of seeds, or statistical tests), baseline strength, or run-to-run variance. This leaves the empirical improvement difficult to interpret or reproduce.
Authors: We agree that the proxy protocol was not described in sufficient detail. In the revised manuscript, we will add a subsection to §4 detailing the proxy evaluation: architectures are trained on CIFAR-100 for 20 epochs with SGD (learning rate 0.1, momentum 0.9, weight decay 5e-4) and a batch size of 256. Results are averaged over 3 independent runs with different random seeds, and we report the mean proxy accuracy together with its standard deviation. The baseline is the mean performance of the initial population, and we will include a t-test to assess whether the improvement is statistically significant. revision: yes
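The protocol in this response reduces to two computations: a per-architecture mean and standard deviation over the three seeds, and a two-sample test between populations. A minimal standard-library sketch follows. It uses Welch's unequal-variance t statistic, which is one reasonable choice since the response does not say which t-test variant is intended, and it assumes each architecture's seed-averaged accuracy counts as one observation.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarize(seed_accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of one architecture's proxy accuracy over seeds."""
    return statistics.mean(seed_accuracies), statistics.stdev(seed_accuracies)


def welch_t(sample_a: Sequence[float], sample_b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic for mean(a) - mean(b) and the Welch-Satterthwaite degrees of freedom.

    Assumes each sample has at least two values and nonzero variance.
    """
    na, nb = len(sample_a), len(sample_b)
    sq_se_a = statistics.variance(sample_a) / na  # squared standard error of mean(a)
    sq_se_b = statistics.variance(sample_b) / nb
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(sq_se_a + sq_se_b)
    df = (sq_se_a + sq_se_b) ** 2 / (sq_se_a ** 2 / (na - 1) + sq_se_b ** 2 / (nb - 1))
    return t, df
```

For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same Welch statistic together with a two-sided p-value.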
Referee: [§3.2] §3.2 (Closed-loop refinement): No ablation or measurement is supplied on the rank correlation between the short proxy evaluations used for LLM feedback and full-training accuracy on CIFAR-100. Without this, it is impossible to confirm that the LLM is being steered toward genuine architectural improvements rather than proxy-specific artifacts.
Authors: We partially concur with the need for such validation. The core contribution is demonstrating improvement within the proxy setting, which is applied identically to both populations. However, to strengthen the claim, the revision will add to §3.2 a rank-correlation analysis on a subset of architectures: we will train 15 models randomly selected from the search for the full 200-epoch schedule and compute the Spearman correlation between their proxy scores and final accuracies. Running full training for the entire loop is computationally infeasible, but this targeted ablation should help address the concern. revision: partial
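The proposed rank-correlation check can be stated precisely. Spearman's rho is the Pearson correlation of the two rank vectors, with tied values assigned the average of the ranks they occupy. The standard-library sketch below assumes paired lists of proxy scores and full-training accuracies for the same models.

```python
import math
import statistics
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    if norm == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / norm


def spearman(proxy_scores: Sequence[float], full_accuracies: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(proxy_scores) != len(full_accuracies):
        raise ValueError("inputs must have equal length")
    return pearson(average_ranks(proxy_scores), average_ranks(full_accuracies))
```

`scipy.stats.spearmanr` returns the same coefficient plus a p-value. With only 15 models the estimate carries wide uncertainty, so a moderately high rho should be read cautiously.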
Circularity Check
No circularity: purely experimental NAS framework with no derivations or self-referential reductions
full rationale
The paper describes an empirical pipeline for LLM-driven channel-configuration search: an initial population is created via AST mutations to produce valid architectures, then refined in closed-loop fashion by feeding proxy-evaluation scores back to the LLM for code-level edits. All claims are experimental comparisons of proxy accuracy on CIFAR-100 between the initial and refined populations. No equations, parameter fittings, uniqueness theorems, or ansatzes appear in the text; the reported improvement is measured directly under the stated protocol rather than derived from prior results by construction. The public code and datasets allow external reproduction, and the argument contains no load-bearing self-citation chains or definitional loops.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: AST mutations produce sufficiently diverse yet valid architectural examples for LLM training.
Forward citations
Cited by 1 Pith paper
Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs
Fine-tuned 7B LLMs generating unified diffs for neural architecture refinement achieve 66-75% valid rates and 64-66% mean first-epoch accuracy, outperforming full-generation baselines by large margins while cutting ou...