Genetic Network Architecture Search

Gil Rafalovich; Hai Victor Habi

arxiv: 1907.02871 · v1 · pith:YBFJEZ5Fnew · submitted 2019-07-05 · 💻 cs.NE · cs.GT

Genetic Network Architecture Search

Hai Victor Habi , Gil Rafalovich This is my paper

Pith reviewed 2026-05-25 01:35 UTC · model grok-4.3

classification 💻 cs.NE cs.GT

keywords genetic algorithmneural architecture searchconvolutional cellsweight sharingCIFAR-10CIFAR-100stochastic gradient descent

0 comments

The pith

A genetic algorithm integrated with SGD evolves convolution cell sub-graphs by sharing weights to maximize validation accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a neural architecture search technique that runs a genetic algorithm on top of standard SGD training. Weight sharing across all candidate solutions lets the GA evaluate many architectures without separate full trainings for each. The search targets sub-graphs inside convolution cells and selects those that perform best on a held-out validation set. Experiments report 96 percent accuracy on CIFAR-10 and 80.1 percent on CIFAR-100. A reader cares because the method reduces the usual cost of architecture search while still producing competitive image-classification models.

Core claim

Our approach uses a genetic algorithm integrated with standard Stochastic Gradient Descent which allows the sharing of weights across all architecture solutions; the method uses GA to design a sub-graph of Convolution cell which maximizes the accuracy on the validation-set.

What carries the argument

Genetic algorithm that evolves sub-graphs of convolution cells while reusing weights trained jointly via SGD to compute fitness scores.

If this is right

Many candidate cells can be evaluated inside a single SGD training run instead of requiring independent trainings.
The GA can iterate over architecture populations using validation accuracy as the fitness signal.
The resulting cells reach 96 percent accuracy on CIFAR-10 and 80.1 percent on CIFAR-100.
The search focuses on sub-graphs inside convolution cells rather than full network topologies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same weight-sharing GA loop could be applied to search spaces beyond convolution cells, such as recurrent or attention blocks.
If the fitness estimates remain stable, the method lowers the compute barrier for running architecture search on modest hardware.
One could measure whether cells found on CIFAR transfer to other image datasets without further evolution.

Load-bearing premise

Weight sharing across different genetically generated architectures produces fitness estimates reliable enough for the GA to select cells that are genuinely better rather than artifacts of the sharing scheme.

What would settle it

Run the same genetic search twice, once with weight sharing and once by training every candidate architecture from scratch on its own, then check whether the final selected cells match.

Figures

Figures reproduced from arXiv: 1907.02871 by Gil Rafalovich, Hai Victor Habi.

**Figure 1.** Figure 1: Image after data augmentation with CutOut 4 Method The main idea of GeneticNAS is the combination of ENAS with an evolution search. Additionally we have used the ENAS search space that represents a network cell as DAG where each node represents an operation and the edges represent a tensors. The key property of ENAS - weight sharing across all child networks is applied in GeneticNAS as well [PITH_FULL_IMA… view at source ↗

**Figure 3.** Figure 3: The block structure with two inputs, two operation and merge operator [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: An example DAG with 5 block [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Left Figure present the network structure with three cell types, Right Figure present the cell structure with residual connection and Squeeze and Excitation block 4.2 Architecture Search The architecture search is built on a GA integrated into stochastic gradient descent (SGD) that utilizes [PITH_FULL_IMAGE:figures/full_fig_p004_5.png] view at source ↗

**Figure 6.** Figure 6: Uniform crossover method Block Cross-over Block cross-over utilizes the fact that a cell type is built with Nb block. We begin by defining the cross-over blocks by generating a random binary status vector with size Nb. The process for producing the first offspring is as follows: If the status equals one, the first parent swaps its block with the second parent, otherwise the block is taken from the first … view at source ↗

**Figure 7.** Figure 7: Block crossover method 4.2.3 Mutation Mutation modifies the individual representation randomly. For each generation a random value, zero or one, is generated, where the probability for one is pm. This random value indicates which gene is to be mutated. If a gene is selected to be mutated then we randomly add +1 or −1 to its representation with equal probability [PITH_FULL_IMAGE:figures/full_fig_p005_7.png] view at source ↗

**Figure 8.** Figure 8: Offspring mutation example 4.2.4 Replacement One of the key properties of GA is a population, where in each iteration a set of a new generation of individuals is inserted in to the population and a current set of the individuals, (within the current population) is removed. This ensures the same population size of Np. There are several known replacement methods for GA as shown in [19]. We have chosen a s… view at source ↗

**Figure 10.** Figure 10: Final search result over different mutation probability [PITH_FULL_IMAGE:figures/full_fig_p007_10.png] view at source ↗

**Figure 12.** Figure 12: Number of update to the population during training on CIFAR10 dataset [PITH_FULL_IMAGE:figures/full_fig_p008_12.png] view at source ↗

**Figure 11.** Figure 11: The population mean accuracy during the search on CIFAR10 dataset [PITH_FULL_IMAGE:figures/full_fig_p008_11.png] view at source ↗

**Figure 13.** Figure 13: Comparing the accuracy of the population on a subset of the validation-set vs the accuracy of the best model on the entire validation-set, performed on CIFAR100 dataset [PITH_FULL_IMAGE:figures/full_fig_p009_13.png] view at source ↗

**Figure 14.** Figure 14: Reduce Cell, Normal Cell and Input Cell Search Result on CIFAR10 dataset [PITH_FULL_IMAGE:figures/full_fig_p012_14.png] view at source ↗

**Figure 15.** Figure 15: Reduce Cell, Normal Cell and Input Cell Search Result on CIFAR100 dataset [PITH_FULL_IMAGE:figures/full_fig_p012_15.png] view at source ↗

read the original abstract

We propose a method for learning the neural network architecture that based on Genetic Algorithm (GA). Our approach uses a genetic algorithm integrated with standard Stochastic Gradient Descent(SGD) which allows the sharing of weights across all architecture solutions. The method uses GA to design a sub-graph of Convolution cell which maximizes the accuracy on the validation-set. Through experiments, we demonstrate this methods performance on both CIFAR10 and CIFAR100 dataset with an accuracy of 96% and 80.1%. The code and result of this work available in GitHub:https://github.com/haihabi/GeneticNAS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

A basic GA-plus-weight-sharing NAS implementation from 2019 that adds little beyond prior work and supplies almost no experimental support for its claims.

read the letter

This paper runs a genetic algorithm over convolution-cell subgraphs inside a supernet whose weights are updated by ordinary SGD, so every candidate shares parameters with the rest of the population. It reports 96% on CIFAR-10 and 80.1% on CIFAR-100 and links to a GitHub repo. That is the whole contribution in a nutshell. By the time this preprint appeared the same combination of evolutionary search, weight sharing, and cell-based search spaces had already been published, so the work is an application rather than a new framework. The code release is the one concrete positive; anyone who wants to inspect the exact GA operators or cell encoding can do so without asking the authors. Everything else is thin. The abstract gives raw accuracies with no baselines, no standard deviations, no ablation on the sharing scheme, and no check that the shared-weight rankings actually predict performance when the selected cells are trained from scratch. The stress-test worry about path-specific biases and under-trained edges in the supernet is therefore left unexamined. A reader who already knows the one-shot NAS literature will see nothing that changes their view of the method class. Someone looking for a ready example to run on CIFAR might find the repo useful, but the paper itself does not reward close study. I would not send it to peer review; the empirical claims need the missing controls and comparisons before they are worth a referee's time.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a genetic algorithm (GA) integrated with SGD for neural architecture search, enabling weight sharing across candidate solutions. The GA evolves sub-graphs of convolution cells to maximize validation-set accuracy. Experiments on CIFAR-10 and CIFAR-100 are reported to achieve 96% and 80.1% accuracy, with code linked on GitHub.

Significance. If the shared-weight fitness estimates prove reliable for guiding GA selection, the method could offer a lightweight evolutionary NAS alternative that avoids per-candidate retraining. The reported accuracies would then represent a meaningful data point in the evolutionary NAS literature, but the current presentation provides no evidence that the central assumption holds.

major comments (2)

[Abstract and Results] Abstract and Results section: accuracies of 96% (CIFAR-10) and 80.1% (CIFAR-100) are stated without any baseline comparisons, standard deviations, number of independent runs, or training protocol details, rendering the central empirical claim unevaluable.
[Method] Method section on GA+SGD integration: the assumption that validation accuracy under a single set of shared weights yields reliable fitness rankings for genetically varied convolution-cell subgraphs is not supported by any ablation, ranking-correlation study, or comparison to independently trained cells; this directly undermines the validity of the GA selection step.

minor comments (2)

The GitHub link is given but the manuscript contains no hyperparameter table, cell encoding details, or population-size/mutation-rate values, hindering reproducibility.
[Method] Notation for the convolution-cell sub-graph representation and how crossover/mutation operate on it is introduced without an accompanying diagram or pseudocode.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback. We address each major comment below.

read point-by-point responses

Referee: [Abstract and Results] Abstract and Results section: accuracies of 96% (CIFAR-10) and 80.1% (CIFAR-100) are stated without any baseline comparisons, standard deviations, number of independent runs, or training protocol details, rendering the central empirical claim unevaluable.

Authors: We agree that the reported accuracies require additional context to be fully evaluable. In the revised manuscript we will add comparisons against standard baselines (e.g., ResNet, VGG, and prior NAS methods), report standard deviations across multiple independent runs, state the number of runs performed, and provide complete training-protocol details including optimizer, learning-rate schedule, batch size, epochs, and data augmentation. revision: yes
Referee: [Method] Method section on GA+SGD integration: the assumption that validation accuracy under a single set of shared weights yields reliable fitness rankings for genetically varied convolution-cell subgraphs is not supported by any ablation, ranking-correlation study, or comparison to independently trained cells; this directly undermines the validity of the GA selection step.

Authors: The manuscript treats shared-weight validation accuracy as a practical proxy for fitness, consistent with other one-shot NAS approaches. The current version contains no explicit ablation or ranking-correlation analysis. We will revise the method section to state this assumption explicitly, discuss its potential limitations, and note that the final accuracies provide only indirect evidence of ranking quality. A dedicated correlation study would require new experiments and is therefore outside the scope of a minor revision. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical GA+SGD method with externally verifiable claims

full rationale

The paper describes an empirical architecture search procedure that integrates a genetic algorithm with SGD-based weight sharing to evolve convolution-cell subgraphs, reporting validation accuracies on CIFAR-10/100. No equations, fitted parameters renamed as predictions, self-definitional relations, or load-bearing self-citations appear in the abstract or described method. The central performance claims are directly testable on public datasets without reducing to the method's own inputs by construction. This is the normal non-circular case for a search heuristic paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract contains no derivations, free parameters, axioms, or invented entities; all content is empirical description.

pith-pipeline@v0.9.0 · 5612 in / 960 out tokens · 23190 ms · 2026-05-25T01:35:38.859576+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 14 internal anchors

[1]

Neural Architecture Search with Reinforcement Learning

Barret Zoph and Quoc V Le, “Neural architec- ture search with reinforcement learning,”arXiv preprint arXiv:1611.01578, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[2]

Learning transferable architec- tures for scalable image recognition,

BarretZoph, VijayVasudevan, JonathonShlens, and Quoc V Le, “Learning transferable architec- tures for scalable image recognition,” inProceed- ings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8697–8710

work page 2018
[3]

Eﬃcient architecture search by network transformation,

Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang, “Eﬃcient architecture search by network transformation,” AAAI, 2018

work page 2018
[4]

Progressive neural architecture search,

Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Mur- phy, “Progressive neural architecture search,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 19–34

work page 2018
[5]

DARTS: Differentiable Architecture Search

Hanxiao Liu, Karen Simonyan, and Yiming Yang, “Darts: Diﬀerentiable architecture search,” arXiv preprint arXiv:1806.09055, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[6]

Efficient Neural Architecture Search via Parameter Sharing

Hieu Pham, Melody Y Guan, Barret Zoph, Quoc V Le, and Jeﬀ Dean, “Eﬃcient neural ar- chitecture search via parameter sharing,”arXiv preprint arXiv:1802.03268, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[7]

Genetic cnn,

L. Xie and A. Yuille, “Genetic cnn,” in 2017 IEEE International Conference on Computer Vision (ICCV), Oct 2017, pp. 1388–1397

work page 2017
[8]

Hierarchical Representations for Efficient Architecture Search

Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, and Koray Kavukcuoglu, “Hierarchical representations for eﬃcient architecture search,” arXiv preprint arXiv:1711.00436, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[9]

Regularized Evolution for Image Classifier Architecture Search

Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le, “Regularized evolution for im- ageclassiﬁerarchitecturesearch,” arXiv preprint arXiv:1802.01548, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[10]

Learn- ing multiple layers of features from tiny images,

Alex Krizhevsky and Geoﬀrey Hinton, “Learn- ing multiple layers of features from tiny images,” Tech. Rep., Citeseer, 2009

work page 2009
[11]

Deep residual learning for image recognition,

KaimingHe, XiangyuZhang, ShaoqingRen, and Jian Sun, “Deep residual learning for image recognition,” inThe IEEE Conference on Com- puter Vision and Pattern Recognition (CVPR), June 2016

work page 2016
[12]

Improved Regularization of Convolutional Neural Networks with Cutout

Terrance DeVries and Graham W Taylor, “Im- proved regularization of convolutional neu- ral networks with cutout,” arXiv preprint arXiv:1708.04552, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[13]

Aggregated residual transformations for deep neural net- works,

Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He, “Aggregated residual transformations for deep neural net- works,” inComputer Vision and Pattern Recog- nition (CVPR), 2017 IEEE Conference on . IEEE, 2017, pp. 5987–5995

work page 2017
[14]

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam, “Mobilenets: Eﬃcient convolutional neural net- works for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[15]

Inception-v4, inception-resnet and the impact of residual con- nections on learning.,

Christian Szegedy, Sergey Ioﬀe, Vincent Van- houcke, and Alexander A Alemi, “Inception-v4, inception-resnet and the impact of residual con- nections on learning.,” in AAAI, 2017, vol. 4, p. 12

work page 2017
[16]

MobileNetV2: Inverted Residuals and Linear Bottlenecks

Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen, “Mobilenetv2: Inverted residuals and linear bot- tlenecks,” arXiv preprint arXiv:1801.04381 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[17]

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioﬀe and Christian Szegedy, “Batch nor- malization: Accelerating deep network train- ing by reducing internal covariate shift,”arXiv preprint arXiv:1502.03167, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[18]

Squeeze- and-excitation networks,

Jie Hu, Li Shen, and Gang Sun, “Squeeze- and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pat- tern Recognition, 2018, pp. 7132–7141

work page 2018
[19]

Edmund K Burke, Graham Kendall, et al., Search methodologies, Springer, 2005

work page 2005
[20]

Dropout: a simple way to prevent neural networks from overﬁtting,

Nitish Srivastava, Geoﬀrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, “Dropout: a simple way to prevent neural networks from overﬁtting,”The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014

work page 1929
[21]

On the importance of initialization and momentum in deep learning,

Ilya Sutskever, James Martens, George Dahl, and Geoﬀrey Hinton, “On the importance of initialization and momentum in deep learning,” inInternational conference on machine learning, 2013, pp. 1139–1147

work page 2013
[22]

Shake-Shake regularization

Xavier Gastaldi, “Shake-shake regularization,” arXiv preprint arXiv:1705.07485, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[23]

FractalNet: Ultra-Deep Neural Networks without Residuals

Gustav Larsson, Michael Maire, and Gregory Shakhnarovich, “Fractalnet: Ultra-deep neu- ral networks without residuals,”arXiv preprint arXiv:1605.07648, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[24]

Network In Network

Min Lin, Qiang Chen, and Shuicheng Yan, “Network in network,” arXiv preprint arXiv:1312.4400, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[25]

Deeply- supervised nets,

Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu, “Deeply- supervised nets,” in Artiﬁcial Intelligence and Statistics, 2015, pp. 562–570

work page 2015
[26]

Highway Networks

Rupesh Kumar Srivastava, Klaus Greﬀ, and Jür- gen Schmidhuber, “Highway networks,” arXiv preprint arXiv:1505.00387, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015
[27]

DenseNet: Implementing Efficient ConvNet Descriptor Pyramids

Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, and Kurt Keutzer, “Densenet: Implementing ef- ﬁcient convnet descriptor pyramids,” arXiv preprint arXiv:1404.1869, 2014. 7 Appendix 7.1 Code and Implantation The code and implantation of GeneticNAS which used for running the models is available online in the ﬂowing github repo...

work page internal anchor Pith review Pith/arXiv arXiv 2014

[1] [1]

Neural Architecture Search with Reinforcement Learning

Barret Zoph and Quoc V Le, “Neural architec- ture search with reinforcement learning,”arXiv preprint arXiv:1611.01578, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[2] [2]

Learning transferable architec- tures for scalable image recognition,

BarretZoph, VijayVasudevan, JonathonShlens, and Quoc V Le, “Learning transferable architec- tures for scalable image recognition,” inProceed- ings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8697–8710

work page 2018

[3] [3]

Eﬃcient architecture search by network transformation,

Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang, “Eﬃcient architecture search by network transformation,” AAAI, 2018

work page 2018

[4] [4]

Progressive neural architecture search,

Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, and Kevin Mur- phy, “Progressive neural architecture search,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 19–34

work page 2018

[5] [5]

DARTS: Differentiable Architecture Search

Hanxiao Liu, Karen Simonyan, and Yiming Yang, “Darts: Diﬀerentiable architecture search,” arXiv preprint arXiv:1806.09055, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[6] [6]

Efficient Neural Architecture Search via Parameter Sharing

Hieu Pham, Melody Y Guan, Barret Zoph, Quoc V Le, and Jeﬀ Dean, “Eﬃcient neural ar- chitecture search via parameter sharing,”arXiv preprint arXiv:1802.03268, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[7] [7]

Genetic cnn,

L. Xie and A. Yuille, “Genetic cnn,” in 2017 IEEE International Conference on Computer Vision (ICCV), Oct 2017, pp. 1388–1397

work page 2017

[8] [8]

Hierarchical Representations for Efficient Architecture Search

Hanxiao Liu, Karen Simonyan, Oriol Vinyals, Chrisantha Fernando, and Koray Kavukcuoglu, “Hierarchical representations for eﬃcient architecture search,” arXiv preprint arXiv:1711.00436, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[9] [9]

Regularized Evolution for Image Classifier Architecture Search

Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le, “Regularized evolution for im- ageclassiﬁerarchitecturesearch,” arXiv preprint arXiv:1802.01548, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[10] [10]

Learn- ing multiple layers of features from tiny images,

Alex Krizhevsky and Geoﬀrey Hinton, “Learn- ing multiple layers of features from tiny images,” Tech. Rep., Citeseer, 2009

work page 2009

[11] [11]

Deep residual learning for image recognition,

KaimingHe, XiangyuZhang, ShaoqingRen, and Jian Sun, “Deep residual learning for image recognition,” inThe IEEE Conference on Com- puter Vision and Pattern Recognition (CVPR), June 2016

work page 2016

[12] [12]

Improved Regularization of Convolutional Neural Networks with Cutout

Terrance DeVries and Graham W Taylor, “Im- proved regularization of convolutional neu- ral networks with cutout,” arXiv preprint arXiv:1708.04552, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[13] [13]

Aggregated residual transformations for deep neural net- works,

Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He, “Aggregated residual transformations for deep neural net- works,” inComputer Vision and Pattern Recog- nition (CVPR), 2017 IEEE Conference on . IEEE, 2017, pp. 5987–5995

work page 2017

[14] [14]

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam, “Mobilenets: Eﬃcient convolutional neural net- works for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[15] [15]

Inception-v4, inception-resnet and the impact of residual con- nections on learning.,

Christian Szegedy, Sergey Ioﬀe, Vincent Van- houcke, and Alexander A Alemi, “Inception-v4, inception-resnet and the impact of residual con- nections on learning.,” in AAAI, 2017, vol. 4, p. 12

work page 2017

[16] [16]

MobileNetV2: Inverted Residuals and Linear Bottlenecks

Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen, “Mobilenetv2: Inverted residuals and linear bot- tlenecks,” arXiv preprint arXiv:1801.04381 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[17] [17]

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioﬀe and Christian Szegedy, “Batch nor- malization: Accelerating deep network train- ing by reducing internal covariate shift,”arXiv preprint arXiv:1502.03167, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[18] [18]

Squeeze- and-excitation networks,

Jie Hu, Li Shen, and Gang Sun, “Squeeze- and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pat- tern Recognition, 2018, pp. 7132–7141

work page 2018

[19] [19]

Edmund K Burke, Graham Kendall, et al., Search methodologies, Springer, 2005

work page 2005

[20] [20]

Dropout: a simple way to prevent neural networks from overﬁtting,

Nitish Srivastava, Geoﬀrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, “Dropout: a simple way to prevent neural networks from overﬁtting,”The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014

work page 1929

[21] [21]

On the importance of initialization and momentum in deep learning,

Ilya Sutskever, James Martens, George Dahl, and Geoﬀrey Hinton, “On the importance of initialization and momentum in deep learning,” inInternational conference on machine learning, 2013, pp. 1139–1147

work page 2013

[22] [22]

Shake-Shake regularization

Xavier Gastaldi, “Shake-shake regularization,” arXiv preprint arXiv:1705.07485, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[23] [23]

FractalNet: Ultra-Deep Neural Networks without Residuals

Gustav Larsson, Michael Maire, and Gregory Shakhnarovich, “Fractalnet: Ultra-deep neu- ral networks without residuals,”arXiv preprint arXiv:1605.07648, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[24] [24]

Network In Network

Min Lin, Qiang Chen, and Shuicheng Yan, “Network in network,” arXiv preprint arXiv:1312.4400, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[25] [25]

Deeply- supervised nets,

Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu, “Deeply- supervised nets,” in Artiﬁcial Intelligence and Statistics, 2015, pp. 562–570

work page 2015

[26] [26]

Highway Networks

Rupesh Kumar Srivastava, Klaus Greﬀ, and Jür- gen Schmidhuber, “Highway networks,” arXiv preprint arXiv:1505.00387, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[27] [27]

DenseNet: Implementing Efficient ConvNet Descriptor Pyramids

Forrest Iandola, Matt Moskewicz, Sergey Karayev, Ross Girshick, Trevor Darrell, and Kurt Keutzer, “Densenet: Implementing ef- ﬁcient convnet descriptor pyramids,” arXiv preprint arXiv:1404.1869, 2014. 7 Appendix 7.1 Code and Implantation The code and implantation of GeneticNAS which used for running the models is available online in the ﬂowing github repo...

work page internal anchor Pith review Pith/arXiv arXiv 2014