Neural Network Architecture Search with Differentiable Cartesian Genetic Programming for Regression

Dario Izzo; Marcus M\"artens

arxiv: 1907.01939 · v1 · pith:GC7E6QIHnew · submitted 2019-07-03 · 💻 cs.NE

Neural Network Architecture Search with Differentiable Cartesian Genetic Programming for Regression

Marcus M\"artens , Dario Izzo This is my paper

Pith reviewed 2026-05-25 09:35 UTC · model grok-4.3

classification 💻 cs.NE

keywords neural architecture searchcartesian genetic programmingdifferentiable CGPmemetic algorithmregressionnetwork pruningskip connectionsactivation function adaptation

0 comments

The pith

Differentiable Cartesian Genetic Programming evolves neural network topologies that use fewer parameters and reach lower error on regression tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper encodes neural networks using a differentiable variant of Cartesian Genetic Programming called dCGPANN. A memetic algorithm then lets gradient descent tune the parameters while evolutionary operators on the genes rewire links, prune connections, adapt activation functions, and add skip connections. On chosen regression tasks this process improves the initial feed-forward topology, producing networks that occupy less parameter space and achieve lower error within the same training time.

Core claim

Encoding neural networks in dCGPANN permits a memetic search in which local gradient-based learning optimizes weights while evolutionary operators reshape the architecture, yielding evolved networks that require less space for parameters and reach significantly lower error on regression tasks than the starting feed-forward topology.

What carries the argument

The dCGPANN encoding, which represents network topology as genes that support both gradient descent on parameters and evolutionary modification of structure.

If this is right

Pruning and rewiring links reduces the number of parameters needed while maintaining or improving accuracy.
Adapting activation functions and inserting skip connections can be learned as part of the same evolutionary process.
The combined gradient-plus-evolution loop runs in the same wall-clock time as standard training yet yields lower final error.
The approach starts from a simple feed-forward network and discovers improvements without manual architecture design.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same encoding might allow architecture search on tasks beyond regression if the gradient signal remains informative.
Because the evolutionary step acts on a compact gene representation, the method could scale to larger networks than exhaustive search methods.
If the evolved topologies generalize across datasets, the search could be performed once and reused rather than repeated for each new problem.

Load-bearing premise

Evolutionary changes to the dCGPANN genes produce architectures that meaningfully improve training speed and final error beyond the initial feed-forward topology.

What would settle it

A controlled comparison in which randomly rewired and pruned networks of the same size reach error levels statistically indistinguishable from or better than the evolved dCGPANN networks on the same regression tasks within the same training budget.

Figures

Figures reproduced from arXiv: 1907.01939 by Dario Izzo, Marcus M\"artens.

**Figure 2.** Figure 2: Differences between the i-th node in a CGP expression and in a dCGPANN expression. We define xR as the vector of real numbers: xR = θ = [b0,w0,0,w0,1, ...,w0,a0 ,b1,w1,0,w1,1, ...,w1,a1 , ...] The two vectors xI and xR form the chromosome of our dCGPANN and suffice for the evaluation of the terminal values Oi ,i = 1..m. Note that we also have introduced the symbol θ to indicate xR as the network parameter… view at source ↗

**Figure 4.** Figure 4: Each block shows one of the four operations of the [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Template network for a problem with 10 input fea [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 7.** Figure 7: Numerical values for training and test error of dCGPANNs [PITH_FULL_IMAGE:figures/full_fig_p005_7.png] view at source ↗

**Figure 6.** Figure 6: Final evolved topology after 10 iterations of LSMF [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 7.** Figure 7: Learning curves for selected regression problems, starting with a 4 layer feed-forward neural network (dark blue) [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗

**Figure 8.** Figure 8: Evolution of dCGPANN topology by running the LSMF-algorithm on the problem 344_mv. Leftmost: initial topology [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗

read the original abstract

The ability to design complex neural network architectures which enable effective training by stochastic gradient descent has been the key for many achievements in the field of deep learning. However, developing such architectures remains a challenging and resourceintensive process full of trial-and-error iterations. All in all, the relation between the network topology and its ability to model the data remains poorly understood. We propose to encode neural networks with a differentiable variant of Cartesian Genetic Programming (dCGPANN) and present a memetic algorithm for architecture design: local searches with gradient descent learn the network parameters while evolutionary operators act on the dCGPANN genes shaping the network architecture towards faster learning. Studying a particular instance of such a learning scheme, we are able to improve the starting feed forward topology by learning how to rewire and prune links, adapt activation functions and introduce skip connections for chosen regression tasks. The evolved network architectures require less space for network parameters and reach, given the same amount of time, a significantly lower error on average.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

dCGPANN pairs a differentiable CGP encoding with memetic evolution for regression nets, but the time-equivalence claim needs explicit controls on total gradient steps.

read the letter

The main takeaway is that this work encodes neural nets via a differentiable Cartesian Genetic Programming variant (dCGPANN) and runs a memetic loop: evolutionary operators reshape the genes for topology while local gradient descent tunes the weights. They apply it to regression by evolving rewiring, pruning, activation changes, and skip connections from a starting feed-forward net, claiming the results are more compact and reach lower error in the same time budget. That hybrid setup is the concrete new element here; prior CGP work on nets existed, but the differentiable encoding plus this specific memetic combination for regression is not just a restatement. The approach has the practical merit of letting evolution handle structure while keeping parameter learning cheap via gradients, which could matter for finding smaller models without full black-box search. The experiments appear focused on regression tasks, which fits the method. The soft spot is the central comparison. The abstract ties the error drop to the same amount of time, yet the memetic process runs multiple local GD searches interleaved with evolutionary steps. Without a clear statement that the baseline feed-forward net receives an equivalent total number of parameter updates or wall-clock budget, the reported gain could trace to extra optimization effort rather than the architectural changes. If the full paper includes ablations on individual operators and reports total gradient counts, that would address it directly; otherwise the claim stays hard to interpret. This is aimed at people already working in neuroevolution or hybrid NAS for regression and similar tasks. A reader looking for concrete encodings that mix evolution and gradients could extract usable ideas even if the results need tighter verification. It is worth sending to peer review so referees can check the budget controls and dataset details rather than desk-rejecting on the abstract alone.

Referee Report

1 major / 1 minor

Summary. The paper proposes encoding neural networks via a differentiable Cartesian Genetic Programming representation (dCGPANN) and a memetic algorithm that interleaves gradient-descent parameter optimization with evolutionary operators acting on the genes to rewire, prune, adapt activations, and add skip connections. It claims that, starting from a feed-forward topology, the resulting architectures use fewer parameters and achieve significantly lower error on regression tasks when both are allotted the same amount of training time.

Significance. If the time-budget comparison is shown to be fair, the work would demonstrate a practical hybrid evolutionary-gradient method for architecture search that can improve upon fixed topologies while reducing parameter count; the dCGPANN encoding itself is a potentially reusable contribution for making CGP differentiable.

major comments (1)

[Abstract] Abstract: the central claim that evolved architectures reach 'significantly lower error on average' given 'the same amount of time' is load-bearing yet unsupported without an explicit accounting of total gradient evaluations (or wall-clock budget) allocated to the baseline feed-forward network versus the aggregate of all local GD searches performed during the memetic run. Because the algorithm interleaves multiple evolutionary rewirings with separate GD optimizations, any reported improvement could be explained by extra optimization steps rather than by the architectural changes.

minor comments (1)

[Abstract] The abstract contains the compound word 'resourceintensive' without a space.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive comment on ensuring the time-budget comparison is rigorously supported. We address the major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that evolved architectures reach 'significantly lower error on average' given 'the same amount of time' is load-bearing yet unsupported without an explicit accounting of total gradient evaluations (or wall-clock budget) allocated to the baseline feed-forward network versus the aggregate of all local GD searches performed during the memetic run. Because the algorithm interleaves multiple evolutionary rewirings with separate GD optimizations, any reported improvement could be explained by extra optimization steps rather than by the architectural changes.

Authors: We agree that an explicit accounting of total gradient evaluations is required to substantiate the claim and rule out unequal optimization effort. In the revised manuscript we will add a dedicated paragraph (and accompanying table) that reports the aggregate number of gradient steps used by the baseline feed-forward networks and by every local GD search performed inside the memetic runs. This will make clear whether the reported error reductions arise from architectural changes or from additional optimization steps. If the original experiments did not enforce strict budget equivalence, we will either re-run the comparisons under matched budgets or explicitly state the actual budgets employed. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical memetic search results are not definitionally forced

full rationale

The paper describes a memetic algorithm that interleaves evolutionary operators on dCGPANN genes with local gradient-descent parameter updates. Reported improvements in error and parameter count are presented as experimental outcomes on regression tasks, not as quantities derived by algebraic identity or by fitting a parameter that is then renamed a prediction. No equations reduce the final error metric to a function of the search procedure itself, no uniqueness theorem is invoked via self-citation to force the architecture class, and the comparison to the initial feed-forward topology is framed as an empirical test rather than a tautology. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms or invented entities; all such elements remain unknown.

pith-pipeline@v0.9.0 · 5699 in / 1032 out tokens · 34510 ms · 2026-05-25T09:35:17.851852+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 6 internal anchors

[1]

Laurence Ashmore and Julian Francis Miller. 2003. Evolutionary Art with Carte- sian Genetic Programming. Technical Report (2003)

work page 2003
[2]

Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. 2016. Designing Neural Network Architectures using Reinforcement Learning. arXiv:cs.LG/1611.02167

work page internal anchor Pith review Pith/arXiv arXiv 2016
[3]

Mina Basirat and Peter M. Roth. 2018. The Quest for the Golden Activation Function. arXiv:cs.NE/1808.00783

work page internal anchor Pith review Pith/arXiv arXiv 2018
[4]

Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. 2018. Efficient Architecture Search by Network Transformation. AAAI

work page 2018
[5]

Bogdan Draganski, Christian Gaser, Volker Busch, Gerhard Schuierer, Ulrich Bogdahn, and Arne May. 2004. Neuroplasticity: changes in grey matter induced by training. Nature 427, 6972 (2004), 311

work page 2004
[6]

Z Emigdio, Leonardo Trujillo, Oliver Schütze, Pierrick Legrand, et al. 2015. A local search approach to genetic programming for binary classification. In Proceedings of the 2015 on Genetic and Evolutionary Computation Conference-GECCO’15

work page 2015
[7]

Jiemin Fang, Yukang Chen, Xinbang Zhang, Qian Zhang, Chang Huang, Gaofeng Meng, Wenyu Liu, and Xinggang Wang. 2019. EAT-NAS: Elastic Ar- chitecture Transfer for Accelerating Large-scale Neural Architecture Search. arXiv:cs.CV/1901.05884

work page internal anchor Pith review Pith/arXiv arXiv 2019
[8]

Isabelle Guyon, Imad Chaabane, Hugo Jair Escalante, Sergio Escalera, Damir Jajetic, James Robert Lloyd, Núria Macià, Bisakha Ray, Lukasz Romaszko, Michèle Sebag, et al. 2016. A brief review of the ChaLearn AutoML challenge: any-time any-dataset learning without human intervention. In Workshop on Automatic Machine Learning. 21–30

work page 2016
[9]

Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural network. In Advances in neural information processing systems. 1135–1143

work page 2015
[10]

Simon Harding and Julian Francis Miller. 2005. Evolution of Robot Controller Using Cartesian Genetic Programming. In European Conference on Genetic Pro- gramming. Springer, 62–73

work page 2005
[11]

Demis Hassabis, Dharshan Kumaran, Christopher Summerfield, and Matthew Botvinick. 2017. Neuroscience-inspired artificial intelligence. Neuron 95, 2 (2017), 245–258

work page 2017
[12]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition . 770–778

work page 2016
[13]

Sepp Hochreiter. 1998. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6, 02 (1998), 107–116

work page 1998
[14]

Weinberger

Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger

work page
[15]

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 2261–2269

Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 2261–2269

work page 2017
[16]

Dario Izzo, Francesco Biscani, and Alessio Mereta. 2017. Differentiable genetic programming. In European Conference on Genetic Programming . Springer, 35–51

work page 2017
[17]

Maryam Mahsal Khan, Arbab Masood Ahmad, Gul Muhammad Khan, and Julian F Miller. 2013. Fast learning neural networks using Cartesian genetic programming. Neurocomputing 121 (2013), 274–289

work page 2013
[18]

Risto Miikkulainen, Jason Liang, Elliot Meyerson, Aditya Rawal, Daniel Fink, Olivier Francon, Bala Raju, Hormoz Shahrzad, Arshak Navruzyan, Nigel Duffy, et al. 2019. Evolving deep neural networks. In Artificial Intelligence in the Age of Neural Networks and Brain Computing . Elsevier, 293–312

work page 2019
[19]

Julian F. Miller. 1999. Evolution of Digital Filters Using a Gate Array Model. In Evolutionary Image Analysis, Signal Processing and Telecommunications , Riccardo Poli, Hans-Michael Voigt, Stefano Cagnoni, David Corne, George D. Smith, and Terence C. Fogarty (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 17–30

work page 1999
[20]

Julian F Miller. 2011. Cartesian genetic programming. In Cartesian Genetic Programming. Springer, 17–34

work page 2011
[21]

Pablo Moscato, Carlos Cotta, and Alexandre Mendes. 2004. Memetic Algorithms. Springer Berlin Heidelberg, Berlin, Heidelberg, 53–85

work page 2004
[22]

Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve re- stricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) . 807–814

work page 2010
[23]

Varun Kumar Ojha, Ajith Abraham, and Václav Snásel. 2017. Metaheuristic design of feedforward neural networks: A review of two decades of research. Engineering Applications of Artificial Intelligence 60 (2017), 97–116

work page 2017
[24]

Olson, William La Cava, Patryk Orzechowski, Ryan J

Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, and Jason H. Moore. 2017. PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10, 1 (11 Dec 2017), 36

work page 2017
[25]

Prajit Ramachandran, Barret Zoph, and Quoc V. Le. 2018. Searching for activation functions. In 6th International Conference on Learning Representations ICLR 2018

work page 2018
[26]

Joseph P Rauschecker and Wolf Singer. 1981. The effects of early visual experience on the cat’s visual cortex and their possible explanation by Hebb synapses. The Journal of physiology 310, 1 (1981), 215–239

work page 1981
[27]

Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Sue- matsu, Jie Tan, Quoc Le, and Alex Kurakin. 2017. Large-Scale Evolution of Image Classifiers. arXiv:cs.NE/1703.01041 8

work page internal anchor Pith review Pith/arXiv arXiv 2017
[28]

Jasper Snoek, Hugo Larochelle, and Ryan P Adams. 2012. Practical Bayesian Optimization of Machine Learning Algorithms. InAdvances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2951–2959

work page 2012
[29]

Andrea Soltoggio, Kenneth O Stanley, and Sebastian Risi. 2018. Born to learn: the inspiration, progress, and future of evolved plastic artificial neural networks. Neural Networks (2018)

work page 2018
[30]

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research 15 (2014), 1929–1958

work page 2014
[31]

Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. 2015. Highway Networks. arXiv:cs.LG/1505.00387

work page internal anchor Pith review Pith/arXiv arXiv 2015
[32]

Kenneth O Stanley, David B D’Ambrosio, and Jason Gauci. 2009. A hypercube- based encoding for evolving large-scale neural networks. Artificial life 15, 2 (2009), 185–212

work page 2009
[33]

Kenneth O Stanley and Risto Miikkulainen. 2002. Evolving neural networks through augmenting topologies. Evolutionary computation 10, 2 (2002), 99–127

work page 2002
[34]

Masanori Suganuma, Shinichi Shirakawa, and Tomoharu Nagao. 2017. A genetic programming approach to designing convolutional neural network architectures. In Proceedings of the Genetic and Evolutionary Computation Conference . ACM, 497–504

work page 2017
[35]

Alexander Topchy and William F Punch. 2001. Faster genetic programming based on local gradient search of numeric leaf values. In Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation . Morgan Kaufmann Publishers Inc., 155–162

work page 2001
[36]

Andrew James Turner and Julian Francis Miller. 2013. Cartesian genetic program- ming encoded artificial neural networks: a comparison using three benchmarks. In Proceedings of the 15th annual conference on Genetic and evolutionary computa- tion. ACM, 1005–1012

work page 2013
[37]

Zdenek Vasicek. 2015. Cartesian gp in optimization of combinational circuits with hundreds of inputs and thousands of gates. In European Conference on Genetic Programming (2015-01-01). Springer, Springer, 139–150

work page 2015
[38]

Ricardo Vilalta and Youssef Drissi. 2002. A Perspective View and Survey of Meta-Learning. Artificial Intelligence Review 18, 2 (01 Jun 2002), 77–95

work page 2002
[39]

Barret Zoph and Quoc V. Le. 2016. Neural Architecture Search with Reinforcement Learning. arXiv:cs.LG/1611.01578 9

work page internal anchor Pith review Pith/arXiv arXiv 2016

[1] [1]

Laurence Ashmore and Julian Francis Miller. 2003. Evolutionary Art with Carte- sian Genetic Programming. Technical Report (2003)

work page 2003

[2] [2]

Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. 2016. Designing Neural Network Architectures using Reinforcement Learning. arXiv:cs.LG/1611.02167

work page internal anchor Pith review Pith/arXiv arXiv 2016

[3] [3]

Mina Basirat and Peter M. Roth. 2018. The Quest for the Golden Activation Function. arXiv:cs.NE/1808.00783

work page internal anchor Pith review Pith/arXiv arXiv 2018

[4] [4]

Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. 2018. Efficient Architecture Search by Network Transformation. AAAI

work page 2018

[5] [5]

Bogdan Draganski, Christian Gaser, Volker Busch, Gerhard Schuierer, Ulrich Bogdahn, and Arne May. 2004. Neuroplasticity: changes in grey matter induced by training. Nature 427, 6972 (2004), 311

work page 2004

[6] [6]

Z Emigdio, Leonardo Trujillo, Oliver Schütze, Pierrick Legrand, et al. 2015. A local search approach to genetic programming for binary classification. In Proceedings of the 2015 on Genetic and Evolutionary Computation Conference-GECCO’15

work page 2015

[7] [7]

Jiemin Fang, Yukang Chen, Xinbang Zhang, Qian Zhang, Chang Huang, Gaofeng Meng, Wenyu Liu, and Xinggang Wang. 2019. EAT-NAS: Elastic Ar- chitecture Transfer for Accelerating Large-scale Neural Architecture Search. arXiv:cs.CV/1901.05884

work page internal anchor Pith review Pith/arXiv arXiv 2019

[8] [8]

Isabelle Guyon, Imad Chaabane, Hugo Jair Escalante, Sergio Escalera, Damir Jajetic, James Robert Lloyd, Núria Macià, Bisakha Ray, Lukasz Romaszko, Michèle Sebag, et al. 2016. A brief review of the ChaLearn AutoML challenge: any-time any-dataset learning without human intervention. In Workshop on Automatic Machine Learning. 21–30

work page 2016

[9] [9]

Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural network. In Advances in neural information processing systems. 1135–1143

work page 2015

[10] [10]

Simon Harding and Julian Francis Miller. 2005. Evolution of Robot Controller Using Cartesian Genetic Programming. In European Conference on Genetic Pro- gramming. Springer, 62–73

work page 2005

[11] [11]

Demis Hassabis, Dharshan Kumaran, Christopher Summerfield, and Matthew Botvinick. 2017. Neuroscience-inspired artificial intelligence. Neuron 95, 2 (2017), 245–258

work page 2017

[12] [12]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition . 770–778

work page 2016

[13] [13]

Sepp Hochreiter. 1998. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6, 02 (1998), 107–116

work page 1998

[14] [14]

Weinberger

Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger

work page

[15] [15]

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 2261–2269

Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 2261–2269

work page 2017

[16] [16]

Dario Izzo, Francesco Biscani, and Alessio Mereta. 2017. Differentiable genetic programming. In European Conference on Genetic Programming . Springer, 35–51

work page 2017

[17] [17]

Maryam Mahsal Khan, Arbab Masood Ahmad, Gul Muhammad Khan, and Julian F Miller. 2013. Fast learning neural networks using Cartesian genetic programming. Neurocomputing 121 (2013), 274–289

work page 2013

[18] [18]

Risto Miikkulainen, Jason Liang, Elliot Meyerson, Aditya Rawal, Daniel Fink, Olivier Francon, Bala Raju, Hormoz Shahrzad, Arshak Navruzyan, Nigel Duffy, et al. 2019. Evolving deep neural networks. In Artificial Intelligence in the Age of Neural Networks and Brain Computing . Elsevier, 293–312

work page 2019

[19] [19]

Julian F. Miller. 1999. Evolution of Digital Filters Using a Gate Array Model. In Evolutionary Image Analysis, Signal Processing and Telecommunications , Riccardo Poli, Hans-Michael Voigt, Stefano Cagnoni, David Corne, George D. Smith, and Terence C. Fogarty (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 17–30

work page 1999

[20] [20]

Julian F Miller. 2011. Cartesian genetic programming. In Cartesian Genetic Programming. Springer, 17–34

work page 2011

[21] [21]

Pablo Moscato, Carlos Cotta, and Alexandre Mendes. 2004. Memetic Algorithms. Springer Berlin Heidelberg, Berlin, Heidelberg, 53–85

work page 2004

[22] [22]

Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve re- stricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) . 807–814

work page 2010

[23] [23]

Varun Kumar Ojha, Ajith Abraham, and Václav Snásel. 2017. Metaheuristic design of feedforward neural networks: A review of two decades of research. Engineering Applications of Artificial Intelligence 60 (2017), 97–116

work page 2017

[24] [24]

Olson, William La Cava, Patryk Orzechowski, Ryan J

Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, and Jason H. Moore. 2017. PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10, 1 (11 Dec 2017), 36

work page 2017

[25] [25]

Prajit Ramachandran, Barret Zoph, and Quoc V. Le. 2018. Searching for activation functions. In 6th International Conference on Learning Representations ICLR 2018

work page 2018

[26] [26]

Joseph P Rauschecker and Wolf Singer. 1981. The effects of early visual experience on the cat’s visual cortex and their possible explanation by Hebb synapses. The Journal of physiology 310, 1 (1981), 215–239

work page 1981

[27] [27]

Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Sue- matsu, Jie Tan, Quoc Le, and Alex Kurakin. 2017. Large-Scale Evolution of Image Classifiers. arXiv:cs.NE/1703.01041 8

work page internal anchor Pith review Pith/arXiv arXiv 2017

[28] [28]

Jasper Snoek, Hugo Larochelle, and Ryan P Adams. 2012. Practical Bayesian Optimization of Machine Learning Algorithms. InAdvances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2951–2959

work page 2012

[29] [29]

Andrea Soltoggio, Kenneth O Stanley, and Sebastian Risi. 2018. Born to learn: the inspiration, progress, and future of evolved plastic artificial neural networks. Neural Networks (2018)

work page 2018

[30] [30]

Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research 15 (2014), 1929–1958

work page 2014

[31] [31]

Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. 2015. Highway Networks. arXiv:cs.LG/1505.00387

work page internal anchor Pith review Pith/arXiv arXiv 2015

[32] [32]

Kenneth O Stanley, David B D’Ambrosio, and Jason Gauci. 2009. A hypercube- based encoding for evolving large-scale neural networks. Artificial life 15, 2 (2009), 185–212

work page 2009

[33] [33]

Kenneth O Stanley and Risto Miikkulainen. 2002. Evolving neural networks through augmenting topologies. Evolutionary computation 10, 2 (2002), 99–127

work page 2002

[34] [34]

Masanori Suganuma, Shinichi Shirakawa, and Tomoharu Nagao. 2017. A genetic programming approach to designing convolutional neural network architectures. In Proceedings of the Genetic and Evolutionary Computation Conference . ACM, 497–504

work page 2017

[35] [35]

Alexander Topchy and William F Punch. 2001. Faster genetic programming based on local gradient search of numeric leaf values. In Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation . Morgan Kaufmann Publishers Inc., 155–162

work page 2001

[36] [36]

Andrew James Turner and Julian Francis Miller. 2013. Cartesian genetic program- ming encoded artificial neural networks: a comparison using three benchmarks. In Proceedings of the 15th annual conference on Genetic and evolutionary computa- tion. ACM, 1005–1012

work page 2013

[37] [37]

Zdenek Vasicek. 2015. Cartesian gp in optimization of combinational circuits with hundreds of inputs and thousands of gates. In European Conference on Genetic Programming (2015-01-01). Springer, Springer, 139–150

work page 2015

[38] [38]

Ricardo Vilalta and Youssef Drissi. 2002. A Perspective View and Survey of Meta-Learning. Artificial Intelligence Review 18, 2 (01 Jun 2002), 77–95

work page 2002

[39] [39]

Barret Zoph and Quoc V. Le. 2016. Neural Architecture Search with Reinforcement Learning. arXiv:cs.LG/1611.01578 9

work page internal anchor Pith review Pith/arXiv arXiv 2016