Genetic Network Architecture Search
Pith reviewed 2026-05-25 01:35 UTC · model grok-4.3
The pith
A genetic algorithm integrated with SGD evolves convolution cell sub-graphs by sharing weights to maximize validation accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our approach uses a genetic algorithm integrated with standard Stochastic Gradient Descent which allows the sharing of weights across all architecture solutions; the method uses GA to design a sub-graph of Convolution cell which maximizes the accuracy on the validation-set.
What carries the argument
Genetic algorithm that evolves sub-graphs of convolution cells while reusing weights trained jointly via SGD to compute fitness scores.
If this is right
- Many candidate cells can be evaluated inside a single SGD training run instead of requiring independent trainings.
- The GA can iterate over architecture populations using validation accuracy as the fitness signal.
- The resulting cells reach 96 percent accuracy on CIFAR-10 and 80.1 percent on CIFAR-100.
- The search focuses on sub-graphs inside convolution cells rather than full network topologies.
Where Pith is reading between the lines
- The same weight-sharing GA loop could be applied to search spaces beyond convolution cells, such as recurrent or attention blocks.
- If the fitness estimates remain stable, the method lowers the compute barrier for running architecture search on modest hardware.
- One could measure whether cells found on CIFAR transfer to other image datasets without further evolution.
Load-bearing premise
Weight sharing across different genetically generated architectures produces fitness estimates reliable enough for the GA to select cells that are genuinely better rather than artifacts of the sharing scheme.
What would settle it
Run the same genetic search twice, once with weight sharing and once by training every candidate architecture from scratch on its own, then check whether the final selected cells match.
Figures
read the original abstract
We propose a method for learning the neural network architecture that based on Genetic Algorithm (GA). Our approach uses a genetic algorithm integrated with standard Stochastic Gradient Descent(SGD) which allows the sharing of weights across all architecture solutions. The method uses GA to design a sub-graph of Convolution cell which maximizes the accuracy on the validation-set. Through experiments, we demonstrate this methods performance on both CIFAR10 and CIFAR100 dataset with an accuracy of 96% and 80.1%. The code and result of this work available in GitHub:https://github.com/haihabi/GeneticNAS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a genetic algorithm (GA) integrated with SGD for neural architecture search, enabling weight sharing across candidate solutions. The GA evolves sub-graphs of convolution cells to maximize validation-set accuracy. Experiments on CIFAR-10 and CIFAR-100 are reported to achieve 96% and 80.1% accuracy, with code linked on GitHub.
Significance. If the shared-weight fitness estimates prove reliable for guiding GA selection, the method could offer a lightweight evolutionary NAS alternative that avoids per-candidate retraining. The reported accuracies would then represent a meaningful data point in the evolutionary NAS literature, but the current presentation provides no evidence that the central assumption holds.
major comments (2)
- [Abstract and Results] Abstract and Results section: accuracies of 96% (CIFAR-10) and 80.1% (CIFAR-100) are stated without any baseline comparisons, standard deviations, number of independent runs, or training protocol details, rendering the central empirical claim unevaluable.
- [Method] Method section on GA+SGD integration: the assumption that validation accuracy under a single set of shared weights yields reliable fitness rankings for genetically varied convolution-cell subgraphs is not supported by any ablation, ranking-correlation study, or comparison to independently trained cells; this directly undermines the validity of the GA selection step.
minor comments (2)
- The GitHub link is given but the manuscript contains no hyperparameter table, cell encoding details, or population-size/mutation-rate values, hindering reproducibility.
- [Method] Notation for the convolution-cell sub-graph representation and how crossover/mutation operate on it is introduced without an accompanying diagram or pseudocode.
Simulated Author's Rebuttal
We thank the referee for the detailed feedback. We address each major comment below.
read point-by-point responses
-
Referee: [Abstract and Results] Abstract and Results section: accuracies of 96% (CIFAR-10) and 80.1% (CIFAR-100) are stated without any baseline comparisons, standard deviations, number of independent runs, or training protocol details, rendering the central empirical claim unevaluable.
Authors: We agree that the reported accuracies require additional context to be fully evaluable. In the revised manuscript we will add comparisons against standard baselines (e.g., ResNet, VGG, and prior NAS methods), report standard deviations across multiple independent runs, state the number of runs performed, and provide complete training-protocol details including optimizer, learning-rate schedule, batch size, epochs, and data augmentation. revision: yes
-
Referee: [Method] Method section on GA+SGD integration: the assumption that validation accuracy under a single set of shared weights yields reliable fitness rankings for genetically varied convolution-cell subgraphs is not supported by any ablation, ranking-correlation study, or comparison to independently trained cells; this directly undermines the validity of the GA selection step.
Authors: The manuscript treats shared-weight validation accuracy as a practical proxy for fitness, consistent with other one-shot NAS approaches. The current version contains no explicit ablation or ranking-correlation analysis. We will revise the method section to state this assumption explicitly, discuss its potential limitations, and note that the final accuracies provide only indirect evidence of ranking quality. A dedicated correlation study would require new experiments and is therefore outside the scope of a minor revision. revision: partial
Circularity Check
No circularity: empirical GA+SGD method with externally verifiable claims
full rationale
The paper describes an empirical architecture search procedure that integrates a genetic algorithm with SGD-based weight sharing to evolve convolution-cell subgraphs, reporting validation accuracies on CIFAR-10/100. No equations, fitted parameters renamed as predictions, self-definitional relations, or load-bearing self-citations appear in the abstract or described method. The central performance claims are directly testable on public datasets without reducing to the method's own inputs by construction. This is the normal non-circular case for a search heuristic paper.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Neural Architecture Search with Reinforcement Learning
Barret Zoph and Quoc V Le, “Neural architec- ture search with reinforcement learning,”arXiv preprint arXiv:1611.01578, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[2]
Learning transferable architec- tures for scalable image recognition,
BarretZoph, VijayVasudevan, JonathonShlens, and Quoc V Le, “Learning transferable architec- tures for scalable image recognition,” inProceed- ings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8697–8710
work page 2018
-
[3]
Efficient architecture search by network transformation,
Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang, “Efficient architecture search by network transformation,” AAAI, 2018
work page 2018
-
[4]
Progressive neural architecture search,
Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Mur- phy, “Progressive neural architecture search,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 19–34
work page 2018
-
[5]
DARTS: Differentiable Architecture Search
Hanxiao Liu, Karen Simonyan, and Yiming Yang, “Darts: Differentiable architecture search,” arXiv preprint arXiv:1806.09055, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[6]
Efficient Neural Architecture Search via Parameter Sharing
Hieu Pham, Melody Y Guan, Barret Zoph, Quoc V Le, and Jeff Dean, “Efficient neural ar- chitecture search via parameter sharing,”arXiv preprint arXiv:1802.03268, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[7]
L. Xie and A. Yuille, “Genetic cnn,” in 2017 IEEE International Conference on Computer Vision (ICCV), Oct 2017, pp. 1388–1397
work page 2017
-
[8]
Hierarchical Representations for Efficient Architecture Search
Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, and Koray Kavukcuoglu, “Hierarchical representations for efficient architecture search,” arXiv preprint arXiv:1711.00436, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[9]
Regularized Evolution for Image Classifier Architecture Search
Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le, “Regularized evolution for im- ageclassifierarchitecturesearch,” arXiv preprint arXiv:1802.01548, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[10]
Learn- ing multiple layers of features from tiny images,
Alex Krizhevsky and Geoffrey Hinton, “Learn- ing multiple layers of features from tiny images,” Tech. Rep., Citeseer, 2009
work page 2009
-
[11]
Deep residual learning for image recognition,
KaimingHe, XiangyuZhang, ShaoqingRen, and Jian Sun, “Deep residual learning for image recognition,” inThe IEEE Conference on Com- puter Vision and Pattern Recognition (CVPR), June 2016
work page 2016
-
[12]
Improved Regularization of Convolutional Neural Networks with Cutout
Terrance DeVries and Graham W Taylor, “Im- proved regularization of convolutional neu- ral networks with cutout,” arXiv preprint arXiv:1708.04552, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[13]
Aggregated residual transformations for deep neural net- works,
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He, “Aggregated residual transformations for deep neural net- works,” inComputer Vision and Pattern Recog- nition (CVPR), 2017 IEEE Conference on . IEEE, 2017, pp. 5987–5995
work page 2017
-
[14]
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam, “Mobilenets: Efficient convolutional neural net- works for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[15]
Inception-v4, inception-resnet and the impact of residual con- nections on learning.,
Christian Szegedy, Sergey Ioffe, Vincent Van- houcke, and Alexander A Alemi, “Inception-v4, inception-resnet and the impact of residual con- nections on learning.,” in AAAI, 2017, vol. 4, p. 12
work page 2017
-
[16]
MobileNetV2: Inverted Residuals and Linear Bottlenecks
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen, “Mobilenetv2: Inverted residuals and linear bot- tlenecks,” arXiv preprint arXiv:1801.04381 , 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[17]
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe and Christian Szegedy, “Batch nor- malization: Accelerating deep network train- ing by reducing internal covariate shift,”arXiv preprint arXiv:1502.03167, 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[18]
Squeeze- and-excitation networks,
Jie Hu, Li Shen, and Gang Sun, “Squeeze- and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pat- tern Recognition, 2018, pp. 7132–7141
work page 2018
-
[19]
Edmund K Burke, Graham Kendall, et al., Search methodologies, Springer, 2005
work page 2005
-
[20]
Dropout: a simple way to prevent neural networks from overfitting,
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,”The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014
work page 1929
-
[21]
On the importance of initialization and momentum in deep learning,
Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton, “On the importance of initialization and momentum in deep learning,” inInternational conference on machine learning, 2013, pp. 1139–1147
work page 2013
-
[22]
Xavier Gastaldi, “Shake-shake regularization,” arXiv preprint arXiv:1705.07485, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[23]
FractalNet: Ultra-Deep Neural Networks without Residuals
Gustav Larsson, Michael Maire, and Gregory Shakhnarovich, “Fractalnet: Ultra-deep neu- ral networks without residuals,”arXiv preprint arXiv:1605.07648, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[24]
Min Lin, Qiang Chen, and Shuicheng Yan, “Network in network,” arXiv preprint arXiv:1312.4400, 2013
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[25]
Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu, “Deeply- supervised nets,” in Artificial Intelligence and Statistics, 2015, pp. 562–570
work page 2015
-
[26]
Rupesh Kumar Srivastava, Klaus Greff, and Jür- gen Schmidhuber, “Highway networks,” arXiv preprint arXiv:1505.00387, 2015
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[27]
DenseNet: Implementing Efficient ConvNet Descriptor Pyramids
Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, and Kurt Keutzer, “Densenet: Implementing ef- ficient convnet descriptor pyramids,” arXiv preprint arXiv:1404.1869, 2014. 7 Appendix 7.1 Code and Implantation The code and implantation of GeneticNAS which used for running the models is available online in the flowing github repo...
work page internal anchor Pith review Pith/arXiv arXiv 2014
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.