pith. sign in

arxiv: 1907.01939 · v1 · pith:GC7E6QIHnew · submitted 2019-07-03 · 💻 cs.NE

Neural Network Architecture Search with Differentiable Cartesian Genetic Programming for Regression

Pith reviewed 2026-05-25 09:35 UTC · model grok-4.3

classification 💻 cs.NE
keywords neural architecture searchcartesian genetic programmingdifferentiable CGPmemetic algorithmregressionnetwork pruningskip connectionsactivation function adaptation
0
0 comments X

The pith

Differentiable Cartesian Genetic Programming evolves neural network topologies that use fewer parameters and reach lower error on regression tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper encodes neural networks using a differentiable variant of Cartesian Genetic Programming called dCGPANN. A memetic algorithm then lets gradient descent tune the parameters while evolutionary operators on the genes rewire links, prune connections, adapt activation functions, and add skip connections. On chosen regression tasks this process improves the initial feed-forward topology, producing networks that occupy less parameter space and achieve lower error within the same training time.

Core claim

Encoding neural networks in dCGPANN permits a memetic search in which local gradient-based learning optimizes weights while evolutionary operators reshape the architecture, yielding evolved networks that require less space for parameters and reach significantly lower error on regression tasks than the starting feed-forward topology.

What carries the argument

The dCGPANN encoding, which represents network topology as genes that support both gradient descent on parameters and evolutionary modification of structure.

If this is right

  • Pruning and rewiring links reduces the number of parameters needed while maintaining or improving accuracy.
  • Adapting activation functions and inserting skip connections can be learned as part of the same evolutionary process.
  • The combined gradient-plus-evolution loop runs in the same wall-clock time as standard training yet yields lower final error.
  • The approach starts from a simple feed-forward network and discovers improvements without manual architecture design.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same encoding might allow architecture search on tasks beyond regression if the gradient signal remains informative.
  • Because the evolutionary step acts on a compact gene representation, the method could scale to larger networks than exhaustive search methods.
  • If the evolved topologies generalize across datasets, the search could be performed once and reused rather than repeated for each new problem.

Load-bearing premise

Evolutionary changes to the dCGPANN genes produce architectures that meaningfully improve training speed and final error beyond the initial feed-forward topology.

What would settle it

A controlled comparison in which randomly rewired and pruned networks of the same size reach error levels statistically indistinguishable from or better than the evolved dCGPANN networks on the same regression tasks within the same training budget.

Figures

Figures reproduced from arXiv: 1907.01939 by Dario Izzo, Marcus M\"artens.

Figure 1
Figure 1. Figure 1: Most widely used form of Cartesian genetic programming, as described by [19]. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Differences between the i-th node in a CGP expres￾sion and in a dCGPANN expression. We define xR as the vector of real numbers: xR = θ = [b0,w0,0,w0,1, ...,w0,a0 ,b1,w1,0,w1,1, ...,w1,a1 , ...] The two vectors xI and xR form the chromosome of our dCGPANN and suffice for the evaluation of the terminal values Oi ,i = 1..m. Note that we also have introduced the symbol θ to indicate xR as the network parameter… view at source ↗
Figure 4
Figure 4. Figure 4: Each block shows one of the four operations of the [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Template network for a problem with 10 input fea [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗
Figure 7
Figure 7. Figure 7: Numerical values for training and test error of dCGPANNs [PITH_FULL_IMAGE:figures/full_fig_p005_7.png] view at source ↗
Figure 6
Figure 6. Figure 6: Final evolved topology after 10 iterations of LSMF [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Learning curves for selected regression problems, starting with a 4 layer feed-forward neural network (dark blue) [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Evolution of dCGPANN topology by running the LSMF-algorithm on the problem 344_mv. Leftmost: initial topology [PITH_FULL_IMAGE:figures/full_fig_p007_8.png] view at source ↗
read the original abstract

The ability to design complex neural network architectures which enable effective training by stochastic gradient descent has been the key for many achievements in the field of deep learning. However, developing such architectures remains a challenging and resourceintensive process full of trial-and-error iterations. All in all, the relation between the network topology and its ability to model the data remains poorly understood. We propose to encode neural networks with a differentiable variant of Cartesian Genetic Programming (dCGPANN) and present a memetic algorithm for architecture design: local searches with gradient descent learn the network parameters while evolutionary operators act on the dCGPANN genes shaping the network architecture towards faster learning. Studying a particular instance of such a learning scheme, we are able to improve the starting feed forward topology by learning how to rewire and prune links, adapt activation functions and introduce skip connections for chosen regression tasks. The evolved network architectures require less space for network parameters and reach, given the same amount of time, a significantly lower error on average.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes encoding neural networks via a differentiable Cartesian Genetic Programming representation (dCGPANN) and a memetic algorithm that interleaves gradient-descent parameter optimization with evolutionary operators acting on the genes to rewire, prune, adapt activations, and add skip connections. It claims that, starting from a feed-forward topology, the resulting architectures use fewer parameters and achieve significantly lower error on regression tasks when both are allotted the same amount of training time.

Significance. If the time-budget comparison is shown to be fair, the work would demonstrate a practical hybrid evolutionary-gradient method for architecture search that can improve upon fixed topologies while reducing parameter count; the dCGPANN encoding itself is a potentially reusable contribution for making CGP differentiable.

major comments (1)
  1. [Abstract] Abstract: the central claim that evolved architectures reach 'significantly lower error on average' given 'the same amount of time' is load-bearing yet unsupported without an explicit accounting of total gradient evaluations (or wall-clock budget) allocated to the baseline feed-forward network versus the aggregate of all local GD searches performed during the memetic run. Because the algorithm interleaves multiple evolutionary rewirings with separate GD optimizations, any reported improvement could be explained by extra optimization steps rather than by the architectural changes.
minor comments (1)
  1. [Abstract] The abstract contains the compound word 'resourceintensive' without a space.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive comment on ensuring the time-budget comparison is rigorously supported. We address the major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that evolved architectures reach 'significantly lower error on average' given 'the same amount of time' is load-bearing yet unsupported without an explicit accounting of total gradient evaluations (or wall-clock budget) allocated to the baseline feed-forward network versus the aggregate of all local GD searches performed during the memetic run. Because the algorithm interleaves multiple evolutionary rewirings with separate GD optimizations, any reported improvement could be explained by extra optimization steps rather than by the architectural changes.

    Authors: We agree that an explicit accounting of total gradient evaluations is required to substantiate the claim and rule out unequal optimization effort. In the revised manuscript we will add a dedicated paragraph (and accompanying table) that reports the aggregate number of gradient steps used by the baseline feed-forward networks and by every local GD search performed inside the memetic runs. This will make clear whether the reported error reductions arise from architectural changes or from additional optimization steps. If the original experiments did not enforce strict budget equivalence, we will either re-run the comparisons under matched budgets or explicitly state the actual budgets employed. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical memetic search results are not definitionally forced

full rationale

The paper describes a memetic algorithm that interleaves evolutionary operators on dCGPANN genes with local gradient-descent parameter updates. Reported improvements in error and parameter count are presented as experimental outcomes on regression tasks, not as quantities derived by algebraic identity or by fitting a parameter that is then renamed a prediction. No equations reduce the final error metric to a function of the search procedure itself, no uniqueness theorem is invoked via self-citation to force the architecture class, and the comparison to the initial feed-forward topology is framed as an empirical test rather than a tautology. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms or invented entities; all such elements remain unknown.

pith-pipeline@v0.9.0 · 5699 in / 1032 out tokens · 34510 ms · 2026-05-25T09:35:17.851852+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 6 internal anchors

  1. [1]

    Laurence Ashmore and Julian Francis Miller. 2003. Evolutionary Art with Carte- sian Genetic Programming. Technical Report (2003)

  2. [2]

    Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. 2016. Designing Neural Network Architectures using Reinforcement Learning. arXiv:cs.LG/1611.02167

  3. [3]

    Mina Basirat and Peter M. Roth. 2018. The Quest for the Golden Activation Function. arXiv:cs.NE/1808.00783

  4. [4]

    Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. 2018. Efficient Architecture Search by Network Transformation. AAAI

  5. [5]

    Bogdan Draganski, Christian Gaser, Volker Busch, Gerhard Schuierer, Ulrich Bogdahn, and Arne May. 2004. Neuroplasticity: changes in grey matter induced by training. Nature 427, 6972 (2004), 311

  6. [6]

    Z Emigdio, Leonardo Trujillo, Oliver Schütze, Pierrick Legrand, et al. 2015. A local search approach to genetic programming for binary classification. In Proceedings of the 2015 on Genetic and Evolutionary Computation Conference-GECCO’15

  7. [7]

    Jiemin Fang, Yukang Chen, Xinbang Zhang, Qian Zhang, Chang Huang, Gaofeng Meng, Wenyu Liu, and Xinggang Wang. 2019. EAT-NAS: Elastic Ar- chitecture Transfer for Accelerating Large-scale Neural Architecture Search. arXiv:cs.CV/1901.05884

  8. [8]

    Isabelle Guyon, Imad Chaabane, Hugo Jair Escalante, Sergio Escalera, Damir Jajetic, James Robert Lloyd, Núria Macià, Bisakha Ray, Lukasz Romaszko, Michèle Sebag, et al. 2016. A brief review of the ChaLearn AutoML challenge: any-time any-dataset learning without human intervention. In Workshop on Automatic Machine Learning. 21–30

  9. [9]

    Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both weights and connections for efficient neural network. In Advances in neural information processing systems. 1135–1143

  10. [10]

    Simon Harding and Julian Francis Miller. 2005. Evolution of Robot Controller Using Cartesian Genetic Programming. In European Conference on Genetic Pro- gramming. Springer, 62–73

  11. [11]

    Demis Hassabis, Dharshan Kumaran, Christopher Summerfield, and Matthew Botvinick. 2017. Neuroscience-inspired artificial intelligence. Neuron 95, 2 (2017), 245–258

  12. [12]

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition . 770–778

  13. [13]

    Sepp Hochreiter. 1998. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6, 02 (1998), 107–116

  14. [14]

    Weinberger

    Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger

  15. [15]

    2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 2261–2269

    Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 2261–2269

  16. [16]

    Dario Izzo, Francesco Biscani, and Alessio Mereta. 2017. Differentiable genetic programming. In European Conference on Genetic Programming . Springer, 35–51

  17. [17]

    Maryam Mahsal Khan, Arbab Masood Ahmad, Gul Muhammad Khan, and Julian F Miller. 2013. Fast learning neural networks using Cartesian genetic programming. Neurocomputing 121 (2013), 274–289

  18. [18]

    Risto Miikkulainen, Jason Liang, Elliot Meyerson, Aditya Rawal, Daniel Fink, Olivier Francon, Bala Raju, Hormoz Shahrzad, Arshak Navruzyan, Nigel Duffy, et al. 2019. Evolving deep neural networks. In Artificial Intelligence in the Age of Neural Networks and Brain Computing . Elsevier, 293–312

  19. [19]

    Julian F. Miller. 1999. Evolution of Digital Filters Using a Gate Array Model. In Evolutionary Image Analysis, Signal Processing and Telecommunications , Riccardo Poli, Hans-Michael Voigt, Stefano Cagnoni, David Corne, George D. Smith, and Terence C. Fogarty (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 17–30

  20. [20]

    Julian F Miller. 2011. Cartesian genetic programming. In Cartesian Genetic Programming. Springer, 17–34

  21. [21]

    Pablo Moscato, Carlos Cotta, and Alexandre Mendes. 2004. Memetic Algorithms. Springer Berlin Heidelberg, Berlin, Heidelberg, 53–85

  22. [22]

    Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve re- stricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) . 807–814

  23. [23]

    Varun Kumar Ojha, Ajith Abraham, and Václav Snásel. 2017. Metaheuristic design of feedforward neural networks: A review of two decades of research. Engineering Applications of Artificial Intelligence 60 (2017), 97–116

  24. [24]

    Olson, William La Cava, Patryk Orzechowski, Ryan J

    Randal S. Olson, William La Cava, Patryk Orzechowski, Ryan J. Urbanowicz, and Jason H. Moore. 2017. PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10, 1 (11 Dec 2017), 36

  25. [25]

    Prajit Ramachandran, Barret Zoph, and Quoc V. Le. 2018. Searching for activation functions. In 6th International Conference on Learning Representations ICLR 2018

  26. [26]

    Joseph P Rauschecker and Wolf Singer. 1981. The effects of early visual experience on the cat’s visual cortex and their possible explanation by Hebb synapses. The Journal of physiology 310, 1 (1981), 215–239

  27. [27]

    Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Sue- matsu, Jie Tan, Quoc Le, and Alex Kurakin. 2017. Large-Scale Evolution of Image Classifiers. arXiv:cs.NE/1703.01041 8

  28. [28]

    Jasper Snoek, Hugo Larochelle, and Ryan P Adams. 2012. Practical Bayesian Optimization of Machine Learning Algorithms. InAdvances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2951–2959

  29. [29]

    Andrea Soltoggio, Kenneth O Stanley, and Sebastian Risi. 2018. Born to learn: the inspiration, progress, and future of evolved plastic artificial neural networks. Neural Networks (2018)

  30. [30]

    Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research 15 (2014), 1929–1958

  31. [31]

    Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. 2015. Highway Networks. arXiv:cs.LG/1505.00387

  32. [32]

    Kenneth O Stanley, David B D’Ambrosio, and Jason Gauci. 2009. A hypercube- based encoding for evolving large-scale neural networks. Artificial life 15, 2 (2009), 185–212

  33. [33]

    Kenneth O Stanley and Risto Miikkulainen. 2002. Evolving neural networks through augmenting topologies. Evolutionary computation 10, 2 (2002), 99–127

  34. [34]

    Masanori Suganuma, Shinichi Shirakawa, and Tomoharu Nagao. 2017. A genetic programming approach to designing convolutional neural network architectures. In Proceedings of the Genetic and Evolutionary Computation Conference . ACM, 497–504

  35. [35]

    Alexander Topchy and William F Punch. 2001. Faster genetic programming based on local gradient search of numeric leaf values. In Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation . Morgan Kaufmann Publishers Inc., 155–162

  36. [36]

    Andrew James Turner and Julian Francis Miller. 2013. Cartesian genetic program- ming encoded artificial neural networks: a comparison using three benchmarks. In Proceedings of the 15th annual conference on Genetic and evolutionary computa- tion. ACM, 1005–1012

  37. [37]

    Zdenek Vasicek. 2015. Cartesian gp in optimization of combinational circuits with hundreds of inputs and thousands of gates. In European Conference on Genetic Programming (2015-01-01). Springer, Springer, 139–150

  38. [38]

    Ricardo Vilalta and Youssef Drissi. 2002. A Perspective View and Survey of Meta-Learning. Artificial Intelligence Review 18, 2 (01 Jun 2002), 77–95

  39. [39]

    Barret Zoph and Quoc V. Le. 2016. Neural Architecture Search with Reinforcement Learning. arXiv:cs.LG/1611.01578 9