
arxiv: 1907.02871 · v1 · pith:YBFJEZ5Fnew · submitted 2019-07-05 · 💻 cs.NE · cs.GT

Genetic Network Architecture Search

Pith reviewed 2026-05-25 01:35 UTC · model grok-4.3

classification 💻 cs.NE cs.GT
keywords genetic algorithm · neural architecture search · convolutional cells · weight sharing · CIFAR-10 · CIFAR-100 · stochastic gradient descent

The pith

A genetic algorithm integrated with SGD evolves convolution cell sub-graphs by sharing weights to maximize validation accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a neural architecture search technique that runs a genetic algorithm on top of standard SGD training. Weight sharing across all candidate solutions lets the GA evaluate many architectures without separate full trainings for each. The search targets sub-graphs inside convolution cells and selects those that perform best on a held-out validation set. Experiments report 96 percent accuracy on CIFAR-10 and 80.1 percent on CIFAR-100. A reader cares because the method reduces the usual cost of architecture search while still producing competitive image-classification models.

Core claim

Our approach uses a genetic algorithm integrated with standard Stochastic Gradient Descent which allows the sharing of weights across all architecture solutions; the method uses GA to design a sub-graph of Convolution cell which maximizes the accuracy on the validation-set.

What carries the argument

Genetic algorithm that evolves sub-graphs of convolution cells while reusing weights trained jointly via SGD to compute fitness scores.

If this is right

  • Many candidate cells can be evaluated inside a single SGD training run instead of requiring independent trainings.
  • The GA can iterate over architecture populations using validation accuracy as the fitness signal.
  • The resulting cells reach 96 percent accuracy on CIFAR-10 and 80.1 percent on CIFAR-100.
  • The search focuses on sub-graphs inside convolution cells rather than full network topologies.
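The single-run evaluation described in the bullets above can be sketched as follows. This is a toy illustration under loud assumptions, not the paper's implementation: the "shared weights" are a small table indexed by (gene position, choice) rather than a neural network, `train_step` stands in for an SGD update, `fitness` stands in for validation accuracy, and every name and constant is hypothetical. The crossover, mutation, and replacement steps are simplified placeholders.

```python
import random

NUM_GENES = 6     # hypothetical: genes per cell encoding
NUM_CHOICES = 4   # hypothetical: options per gene
POP_SIZE = 8      # hypothetical population size


def train_step(weights, arch, target, lr=0.1):
    """Stand-in for one SGD step: only the weights used by `arch` move."""
    for pos, choice in enumerate(arch):
        weights[pos][choice] += lr * (target[pos][choice] - weights[pos][choice])


def fitness(weights, arch):
    """Stand-in for validation accuracy, read from the shared weights."""
    return sum(weights[pos][choice] for pos, choice in enumerate(arch))


def search(generations=50, steps_per_generation=4, seed=0):
    rng = random.Random(seed)
    # Toy "data": the value each shared weight would converge to.
    target = [[rng.random() for _ in range(NUM_CHOICES)] for _ in range(NUM_GENES)]
    # One weight table shared by every candidate architecture.
    weights = [[0.0] * NUM_CHOICES for _ in range(NUM_GENES)]
    population = [
        [rng.randrange(NUM_CHOICES) for _ in range(NUM_GENES)]
        for _ in range(POP_SIZE)
    ]
    for _ in range(generations):
        # Train the shared weights on architectures drawn from the population.
        for _ in range(steps_per_generation):
            train_step(weights, rng.choice(population), target)
        # Produce one offspring (placeholder uniform crossover and mutation).
        p1, p2 = rng.sample(population, 2)
        child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
        if rng.random() < 0.3:
            child[rng.randrange(NUM_GENES)] = rng.randrange(NUM_CHOICES)
        # Score everyone with the shared weights; no per-candidate retraining.
        scores = [fitness(weights, arch) for arch in population]
        worst = min(range(POP_SIZE), key=lambda i: scores[i])
        if fitness(weights, child) > scores[worst]:
            population[worst] = child  # population size stays constant
    best = max(population, key=lambda arch: fitness(weights, arch))
    return best, weights, population
```

The point of the sketch is structural: every call to `fitness` reads the same `weights` table that `train_step` updates, so adding a candidate costs one evaluation rather than one training run.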

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same weight-sharing GA loop could be applied to search spaces beyond convolution cells, such as recurrent or attention blocks.
  • If the fitness estimates remain stable, the method lowers the compute barrier for running architecture search on modest hardware.
  • One could measure whether cells found on CIFAR transfer to other image datasets without further evolution.

Load-bearing premise

Weight sharing across different genetically generated architectures produces fitness estimates reliable enough for the GA to select cells that are genuinely better rather than artifacts of the sharing scheme.

What would settle it

Run the same genetic search twice, once with weight sharing and once by training every candidate architecture from scratch on its own, then check whether the final selected cells match.
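One way to quantify such a comparison, beyond checking whether the final cells match, is a rank correlation between the accuracy each candidate gets under shared weights and the accuracy it gets when trained from scratch. The sketch below computes Kendall's tau (the tau-a variant, which ignores tie corrections) in plain Python; the function name and the choice of tau-a are assumptions made for illustration, and no measurements from the paper are used.

```python
from itertools import combinations


def kendall_tau(shared_scores, scratch_scores):
    """Kendall tau-a between two equal-length lists of scores.

    Returns +1.0 when both lists rank the candidates identically,
    -1.0 when the rankings are exactly reversed.
    """
    n = len(shared_scores)
    if n != len(scratch_scores) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (shared_scores[i] - shared_scores[j]) * (
            scratch_scores[i] - scratch_scores[j]
        )
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs count toward neither total (tau-a).
    return (concordant - discordant) / (n * (n - 1) // 2)
```

A value near 1 would suggest the shared-weight fitness orders candidates much as independent training does; a value near 0 would suggest the GA is selecting on noise.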

Figures

Figures reproduced from arXiv: 1907.02871 by Gil Rafalovich, Hai Victor Habi.

Figure 1
Figure 1: Image after data augmentation with CutOut.
Excerpt from Section 4 (Method): The main idea of GeneticNAS is to combine ENAS with an evolutionary search. GeneticNAS also uses the ENAS search space, which represents a network cell as a DAG in which each node represents an operation and each edge represents a tensor. The key property of ENAS, weight sharing across all child networks, is applied in GeneticNAS as well.
Figure 3
Figure 3: The block structure with two inputs, two operations, and a merge operator.
Figure 4
Figure 4: An example DAG with 5 blocks.
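The block and DAG structure named in the Figure 3 and Figure 4 captions can be sketched as a small data structure. This is an illustrative encoding only: the operation list, the merge list, the number of cell inputs, and the rule that a block may only read from cell inputs or earlier blocks (which keeps the graph acyclic) are all assumptions, and none of the names come from the paper.

```python
from dataclasses import dataclass
from typing import List

OPS = ["conv3x3", "conv5x5", "avg_pool", "max_pool", "identity"]  # hypothetical
MERGES = ["add", "concat"]                                         # hypothetical
NUM_CELL_INPUTS = 2                                                # assumption


@dataclass(frozen=True)
class Block:
    """Two inputs, one operation per input, and a merge operator."""
    input_a: int  # index of a cell input or of an earlier block's output
    input_b: int
    op_a: int     # index into OPS
    op_b: int
    merge: int    # index into MERGES


def is_valid_cell(blocks: List[Block]) -> bool:
    """Check index ranges; block k may only read nodes created before it."""
    for idx, block in enumerate(blocks):
        available = NUM_CELL_INPUTS + idx
        if not (0 <= block.input_a < available and 0 <= block.input_b < available):
            return False
        if not (0 <= block.op_a < len(OPS) and 0 <= block.op_b < len(OPS)):
            return False
        if not 0 <= block.merge < len(MERGES):
            return False
    return True


def to_genes(blocks: List[Block]) -> List[int]:
    """Flatten a cell into an integer gene vector, five genes per block."""
    genes: List[int] = []
    for b in blocks:
        genes.extend([b.input_a, b.input_b, b.op_a, b.op_b, b.merge])
    return genes
```

Under this encoding a five-block cell becomes a vector of 25 integers, which is the kind of representation that integer-valued crossover and mutation operators can act on.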
Figure 5
Figure 5: Left: the network structure with three cell types. Right: the cell structure with a residual connection and a Squeeze-and-Excitation block.
Excerpt from Section 4.2 (Architecture Search): The architecture search is built on a GA integrated into stochastic gradient descent (SGD) that utilizes …
Figure 6
Figure 6: Uniform crossover method.
Excerpt from the paper body (Block Cross-over): Block cross-over exploits the fact that a cell type is built from Nb blocks. It begins by generating a random binary status vector of size Nb, which defines the cross-over blocks. The first offspring is produced as follows: if the status equals one, the first parent swaps its block with the second parent; otherwise the block is taken from the first …
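The block cross-over rule quoted in the Figure 6 text can be sketched as below. This is a minimal sketch rather than the authors' code: the function name, the representation of a block as an opaque Python value, and in particular the rule for the second offspring (taking the complementary choice at every position) are assumptions, because the excerpt is cut off before it describes the second offspring.

```python
import random


def block_crossover(parent1, parent2, rng=random):
    """Swap whole blocks between two parents using a binary status vector.

    `parent1` and `parent2` are equal-length lists of blocks (Nb entries).
    Returns both offspring and the status vector that produced them.
    """
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same number of blocks")
    # One random bit per block, as in the quoted description.
    status = [rng.randint(0, 1) for _ in parent1]
    # First offspring: status 1 takes the second parent's block,
    # status 0 keeps the first parent's block.
    child1 = [b2 if s == 1 else b1 for b1, b2, s in zip(parent1, parent2, status)]
    # Second offspring: ASSUMPTION, the complementary choice at each position.
    child2 = [b1 if s == 1 else b2 for b1, b2, s in zip(parent1, parent2, status)]
    return child1, child2, status
```

Swapping at block granularity keeps each block's inputs, operations, and merge operator together, in contrast to a gene-level uniform crossover, which can mix them.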
Figure 7
Figure 7: Block crossover method.
Excerpt from Section 4.2.3 (Mutation): Mutation modifies the individual's representation randomly. For each generation a random value, zero or one, is generated, where the probability of one is pm. This random value indicates which gene is to be mutated. If a gene is selected for mutation, +1 or −1 is added to its representation with equal probability.
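The mutation rule quoted in the Figure 7 text can be sketched as follows. Two points are assumptions: the sketch reads the rule as one independent draw per gene with probability pm, and it wraps out-of-range values with a modulo, which the excerpt does not specify. The names `mutate` and `num_choices` are hypothetical.

```python
import random


def mutate(genes, num_choices, pm, rng=random):
    """Return a mutated copy of an integer gene vector.

    genes        list of ints, the individual's representation
    num_choices  list of ints, number of legal values for each gene
    pm           probability that any given gene is mutated
    """
    mutated = []
    for gene, n in zip(genes, num_choices):
        if rng.random() < pm:
            step = rng.choice((-1, 1))  # +1 or -1 with equal probability
            gene = (gene + step) % n    # ASSUMPTION: wrap around at the bounds
        mutated.append(gene)
    return mutated
```

Wrapping is only one option; clipping to the valid range or redrawing the step would also keep genes legal, and the choice changes how often boundary values are visited.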
Figure 8
Figure 8: Offspring mutation example.
Excerpt from Section 4.2.4 (Replacement): A key property of a GA is its population: in each iteration, a set of new-generation individuals is inserted into the population and a set of individuals from the current population is removed, which keeps the population size at Np. Several replacement methods for GAs are known, as shown in [19]. We have chosen a s…
Figure 10
Figure 10: Final search results for different mutation probabilities.
Figure 12
Figure 12: Number of updates to the population during training on the CIFAR10 dataset.
Figure 11
Figure 11: The population mean accuracy during the search on the CIFAR10 dataset.
Figure 13
Figure 13: Accuracy of the population on a subset of the validation set versus accuracy of the best model on the entire validation set, on the CIFAR100 dataset.
Figure 14
Figure 14: Reduce Cell, Normal Cell, and Input Cell search results on the CIFAR10 dataset.
Figure 15
Figure 15: Reduce Cell, Normal Cell, and Input Cell search results on the CIFAR100 dataset.
Original abstract

We propose a method for learning the neural network architecture that based on Genetic Algorithm (GA). Our approach uses a genetic algorithm integrated with standard Stochastic Gradient Descent (SGD) which allows the sharing of weights across all architecture solutions. The method uses GA to design a sub-graph of Convolution cell which maximizes the accuracy on the validation-set. Through experiments, we demonstrate this methods performance on both CIFAR10 and CIFAR100 dataset with an accuracy of 96% and 80.1%. The code and result of this work available in GitHub: https://github.com/haihabi/GeneticNAS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a genetic algorithm (GA) integrated with SGD for neural architecture search, enabling weight sharing across candidate solutions. The GA evolves sub-graphs of convolution cells to maximize validation-set accuracy. Experiments on CIFAR-10 and CIFAR-100 are reported to achieve 96% and 80.1% accuracy, with code linked on GitHub.

Significance. If the shared-weight fitness estimates prove reliable for guiding GA selection, the method could offer a lightweight evolutionary NAS alternative that avoids per-candidate retraining. The reported accuracies would then represent a meaningful data point in the evolutionary NAS literature, but the current presentation provides no evidence that the central assumption holds.

major comments (2)
  1. [Abstract and Results] Abstract and Results section: accuracies of 96% (CIFAR-10) and 80.1% (CIFAR-100) are stated without any baseline comparisons, standard deviations, number of independent runs, or training protocol details, rendering the central empirical claim unevaluable.
  2. [Method] Method section on GA+SGD integration: the assumption that validation accuracy under a single set of shared weights yields reliable fitness rankings for genetically varied convolution-cell subgraphs is not supported by any ablation, ranking-correlation study, or comparison to independently trained cells; this directly undermines the validity of the GA selection step.
minor comments (2)
  1. The GitHub link is given but the manuscript contains no hyperparameter table, cell encoding details, or population-size/mutation-rate values, hindering reproducibility.
  2. [Method] Notation for the convolution-cell sub-graph representation and how crossover/mutation operate on it is introduced without an accompanying diagram or pseudocode.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract and Results] Abstract and Results section: accuracies of 96% (CIFAR-10) and 80.1% (CIFAR-100) are stated without any baseline comparisons, standard deviations, number of independent runs, or training protocol details, rendering the central empirical claim unevaluable.

    Authors: We agree that the reported accuracies require additional context to be fully evaluable. In the revised manuscript we will add comparisons against standard baselines (e.g., ResNet, VGG, and prior NAS methods), report standard deviations across multiple independent runs, state the number of runs performed, and provide complete training-protocol details including optimizer, learning-rate schedule, batch size, epochs, and data augmentation. revision: yes

  2. Referee: [Method] Method section on GA+SGD integration: the assumption that validation accuracy under a single set of shared weights yields reliable fitness rankings for genetically varied convolution-cell subgraphs is not supported by any ablation, ranking-correlation study, or comparison to independently trained cells; this directly undermines the validity of the GA selection step.

    Authors: The manuscript treats shared-weight validation accuracy as a practical proxy for fitness, consistent with other one-shot NAS approaches. The current version contains no explicit ablation or ranking-correlation analysis. We will revise the method section to state this assumption explicitly, discuss its potential limitations, and note that the final accuracies provide only indirect evidence of ranking quality. A dedicated correlation study would require new experiments and is therefore outside the scope of a minor revision. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical GA+SGD method with externally verifiable claims

full rationale

The paper describes an empirical architecture search procedure that integrates a genetic algorithm with SGD-based weight sharing to evolve convolution-cell subgraphs, reporting validation accuracies on CIFAR-10/100. No equations, fitted parameters renamed as predictions, self-definitional relations, or load-bearing self-citations appear in the abstract or described method. The central performance claims are directly testable on public datasets without reducing to the method's own inputs by construction. This is the normal non-circular case for a search heuristic paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract contains no derivations, free parameters, axioms, or invented entities; all content is empirical description.

pith-pipeline@v0.9.0 · 5612 in / 960 out tokens · 23190 ms · 2026-05-25T01:35:38.859576+00:00 · methodology

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 14 internal anchors

  1. [1]

    Neural Architecture Search with Reinforcement Learning

    Barret Zoph and Quoc V Le, “Neural architecture search with reinforcement learning,” arXiv preprint arXiv:1611.01578, 2016

  2. [2]

    Learning transferable architectures for scalable image recognition

    Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le, “Learning transferable architectures for scalable image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8697–8710

  3. [3]

    Efficient architecture search by network transformation

    Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang, “Efficient architecture search by network transformation,” AAAI, 2018

  4. [4]

    Progressive neural architecture search

    Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Murphy, “Progressive neural architecture search,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 19–34

  5. [5]

    DARTS: Differentiable Architecture Search

    Hanxiao Liu, Karen Simonyan, and Yiming Yang, “Darts: Differentiable architecture search,” arXiv preprint arXiv:1806.09055, 2018

  6. [6]

    Efficient Neural Architecture Search via Parameter Sharing

    Hieu Pham, Melody Y Guan, Barret Zoph, Quoc V Le, and Jeff Dean, “Efficient neural architecture search via parameter sharing,” arXiv preprint arXiv:1802.03268, 2018

  7. [7]

    Genetic CNN

    L. Xie and A. Yuille, “Genetic cnn,” in 2017 IEEE International Conference on Computer Vision (ICCV), Oct 2017, pp. 1388–1397

  8. [8]

    Hierarchical Representations for Efficient Architecture Search

    Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, and Koray Kavukcuoglu, “Hierarchical representations for efficient architecture search,” arXiv preprint arXiv:1711.00436, 2017

  9. [9]

    Regularized Evolution for Image Classifier Architecture Search

    Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le, “Regularized evolution for image classifier architecture search,” arXiv preprint arXiv:1802.01548, 2018

  10. [10]

    Learning multiple layers of features from tiny images

    Alex Krizhevsky and Geoffrey Hinton, “Learning multiple layers of features from tiny images,” Tech. Rep., Citeseer, 2009

  11. [11]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016

  12. [12]

    Improved Regularization of Convolutional Neural Networks with Cutout

    Terrance DeVries and Graham W Taylor, “Improved regularization of convolutional neural networks with cutout,” arXiv preprint arXiv:1708.04552, 2017

  13. [13]

    Aggregated residual transformations for deep neural networks

    Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He, “Aggregated residual transformations for deep neural networks,” in Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017, pp. 5987–5995

  14. [14]

    MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

    Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017

  15. [15]

    Inception-v4, inception-resnet and the impact of residual connections on learning

    Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in AAAI, 2017, vol. 4, p. 12

  16. [16]

    MobileNetV2: Inverted Residuals and Linear Bottlenecks

    Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” arXiv preprint arXiv:1801.04381, 2018

  17. [17]

    Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

    Sergey Ioffe and Christian Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015

  18. [18]

    Squeeze-and-excitation networks

    Jie Hu, Li Shen, and Gang Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141

  19. [19]

    Edmund K Burke, Graham Kendall, et al., Search methodologies, Springer, 2005

  20. [20]

    Dropout: a simple way to prevent neural networks from overfitting

    Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014

  21. [21]

    On the importance of initialization and momentum in deep learning

    Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton, “On the importance of initialization and momentum in deep learning,” in International conference on machine learning, 2013, pp. 1139–1147

  22. [22]

    Shake-Shake regularization

    Xavier Gastaldi, “Shake-shake regularization,” arXiv preprint arXiv:1705.07485, 2017

  23. [23]

    FractalNet: Ultra-Deep Neural Networks without Residuals

    Gustav Larsson, Michael Maire, and Gregory Shakhnarovich, “Fractalnet: Ultra-deep neural networks without residuals,” arXiv preprint arXiv:1605.07648, 2016

  24. [24]

    Network In Network

    Min Lin, Qiang Chen, and Shuicheng Yan, “Network in network,” arXiv preprint arXiv:1312.4400, 2013

  25. [25]

    Deeply-supervised nets

    Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu, “Deeply-supervised nets,” in Artificial Intelligence and Statistics, 2015, pp. 562–570

  26. [26]

    Highway Networks

    Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber, “Highway networks,” arXiv preprint arXiv:1505.00387, 2015

  27. [27]

    DenseNet: Implementing Efficient ConvNet Descriptor Pyramids

    Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, and Kurt Keutzer, “Densenet: Implementing efficient convnet descriptor pyramids,” arXiv preprint arXiv:1404.1869, 2014

    Excerpt from Appendix 7.1 (Code and Implementation): The code and implementation of GeneticNAS used for running the models are available online in the following GitHub repo...