Neural Network Architecture Search with Differentiable Cartesian Genetic Programming for Regression
Pith reviewed 2026-05-25 09:35 UTC · model grok-4.3
The pith
Differentiable Cartesian Genetic Programming evolves neural network topologies that use fewer parameters and reach lower error on regression tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Encoding neural networks in dCGPANN permits a memetic search in which local gradient-based learning optimizes weights while evolutionary operators reshape the architecture, yielding evolved networks that require less space for parameters and reach significantly lower error on regression tasks than the starting feed-forward topology.
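The memetic loop described above can be sketched on a toy problem. This is a minimal illustration and not the paper's algorithm. It assumes a linear model whose "architecture" is only a boolean mask over input links, a (1+λ)-style acceptance rule, and weights inherited by the offspring. Every name here (`make_data`, `gradient_descent`, `mutate`, `memetic_search`) is hypothetical.

```python
import random

def make_data(n, rng):
    """Toy regression set: the target ignores the third input."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        data.append((x, 2.0 * x[0] - x[1]))
    return data

def predict(weights, mask, x):
    # A masked link is treated as pruned and contributes nothing.
    return sum(w * xi for w, m, xi in zip(weights, mask, x) if m)

def mse(weights, mask, data):
    return sum((predict(weights, mask, x) - y) ** 2 for x, y in data) / len(data)

def gradient_descent(weights, mask, data, steps, lr=0.05):
    """Local search: tune the weights while the structure stays fixed."""
    w = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, mask, x) - y
            for i, xi in enumerate(x):
                if mask[i]:
                    grad[i] += 2.0 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def mutate(mask, rng):
    """Evolutionary operator: flip one structural gene (prune or restore a link)."""
    child = list(mask)
    i = rng.randrange(len(child))
    child[i] = not child[i]
    return child

def memetic_search(data, n_inputs, generations=20, offspring=4, gd_steps=50, seed=0):
    rng = random.Random(seed)
    mask = [True] * n_inputs  # start from the fully connected structure
    weights = gradient_descent([0.0] * n_inputs, mask, data, gd_steps)
    best = (mse(weights, mask, data), mask, weights)
    for _ in range(generations):
        for _ in range(offspring):
            child_mask = mutate(best[1], rng)
            # Assumption: offspring inherit the parent's learned weights.
            child_w = gradient_descent(best[2], child_mask, data, gd_steps)
            child_loss = mse(child_w, child_mask, data)
            if child_loss <= best[0]:  # ties accepted, allowing neutral drift
                best = (child_loss, child_mask, child_w)
    return best
```

Calling `memetic_search(make_data(40, random.Random(1)), 3)` returns a `(loss, mask, weights)` tuple. Each accepted child in this sketch has also received extra gradient steps, so a lower loss here cannot be attributed to the structural change alone.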
What carries the argument
The dCGPANN encoding, which represents network topology as genes that support both gradient descent on parameters and evolutionary modification of structure.
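To make the idea of a gene-based encoding concrete, here is a rough sketch of a CGP-style genome evaluated as a small network. It is a simplification under stated assumptions: genes are stored as `(activation_id, sources)` tuples rather than a flat integer string, the output is read from the last node, and the `ACTIVATIONS` table and `evaluate` function are hypothetical names, not the API of any dCGPANN implementation.

```python
import math

# Hypothetical activation table; an activation gene selects one entry.
ACTIVATIONS = {
    0: math.tanh,
    1: lambda v: max(0.0, v),  # ReLU
    2: lambda v: v,            # identity
}

def evaluate(genome, weights, biases, inputs):
    """Evaluate a feed-forward graph described by genes.

    genome[k] = (activation_id, sources), where a source index below
    len(inputs) refers to a network input and any larger index refers to
    an earlier node. weights[k] holds one weight per source of node k.
    """
    values = list(inputs)
    for node, (act_id, sources) in enumerate(genome):
        total = biases[node]
        for w, s in zip(weights[node], sources):
            total += w * values[s]
        values.append(ACTIVATIONS[act_id](total))
    return values[-1]  # assumption: the last node is the single output

# Two inputs; node 0 is a tanh unit, node 1 is linear and also reads
# input 0 directly, which acts as a skip connection.
genome = [(0, [0, 1]), (2, [2, 0])]
weights = [[1.0, -1.0], [2.0, 0.5]]
biases = [0.0, 0.0]
output = evaluate(genome, weights, biases, [0.5, 0.5])
```

In this representation, rewiring amounts to changing an entry in `sources`, pruning to dropping one, and adapting an activation to changing `activation_id`. The weights and biases stay as plain floats that a gradient-based optimizer could update.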
If this is right
- Pruning and rewiring links reduces the number of parameters needed while maintaining or improving accuracy.
- Adapting activation functions and inserting skip connections can be learned as part of the same evolutionary process.
- Given the same amount of training time as the baseline, the combined gradient-plus-evolution loop reaches a lower final error on average.
- The approach starts from a simple feed-forward network and discovers improvements without manual architecture design.
Where Pith is reading between the lines
- The same encoding might allow architecture search on tasks beyond regression if the gradient signal remains informative.
- Because the evolutionary step acts on a compact gene representation, the method could scale to larger networks than exhaustive search methods.
- If the evolved topologies generalize across datasets, the search could be performed once and reused rather than repeated for each new problem.
Load-bearing premise
Evolutionary changes to the dCGPANN genes produce architectures that meaningfully improve training speed and final error beyond the initial feed-forward topology.
What would settle it
A controlled comparison in which randomly rewired and pruned networks of the same size reach error levels statistically indistinguishable from or better than the evolved dCGPANN networks on the same regression tasks within the same training budget.
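One way to run such a comparison is a two-sided permutation test on the final errors of the two groups. The sketch below assumes one final error per independent run and uses the difference of means as the test statistic; the function name `permutation_test` and its parameters are illustrative choices, and the paper may use a different test.

```python
import random

def permutation_test(errors_a, errors_b, n_resamples=10000, seed=0):
    """Two-sided permutation test on the difference of mean errors.

    Returns an estimated p-value for the null hypothesis that both
    groups of final errors come from the same distribution.
    """
    rng = random.Random(seed)
    n_a = len(errors_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(errors_a) - mean(errors_b))
    pooled = list(errors_a) + list(errors_b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

A large p-value for evolved versus randomly rewired networks of equal size and training budget would count against the load-bearing premise. A small one would support it only if the budgets were truly matched.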
Original abstract
The ability to design complex neural network architectures which enable effective training by stochastic gradient descent has been the key for many achievements in the field of deep learning. However, developing such architectures remains a challenging and resourceintensive process full of trial-and-error iterations. All in all, the relation between the network topology and its ability to model the data remains poorly understood. We propose to encode neural networks with a differentiable variant of Cartesian Genetic Programming (dCGPANN) and present a memetic algorithm for architecture design: local searches with gradient descent learn the network parameters while evolutionary operators act on the dCGPANN genes shaping the network architecture towards faster learning. Studying a particular instance of such a learning scheme, we are able to improve the starting feed forward topology by learning how to rewire and prune links, adapt activation functions and introduce skip connections for chosen regression tasks. The evolved network architectures require less space for network parameters and reach, given the same amount of time, a significantly lower error on average.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes encoding neural networks via a differentiable Cartesian Genetic Programming representation (dCGPANN) and a memetic algorithm that interleaves gradient-descent parameter optimization with evolutionary operators acting on the genes to rewire, prune, adapt activations, and add skip connections. It claims that, starting from a feed-forward topology, the resulting architectures use fewer parameters and achieve significantly lower error on regression tasks when both are allotted the same amount of training time.
Significance. If the time-budget comparison is shown to be fair, the work would demonstrate a practical hybrid evolutionary-gradient method for architecture search that can improve upon fixed topologies while reducing parameter count; the dCGPANN encoding itself is a potentially reusable contribution for making CGP differentiable.
major comments (1)
- [Abstract] The central claim that evolved architectures reach 'significantly lower error on average' given 'the same amount of time' is load-bearing, yet it is unsupported without an explicit accounting of the total gradient evaluations (or wall-clock budget) allocated to the baseline feed-forward network versus the aggregate of all local GD searches performed during the memetic run. Because the algorithm interleaves multiple evolutionary rewirings with separate GD optimizations, any reported improvement could be explained by extra optimization steps rather than by the architectural changes.
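The accounting asked for here is simple arithmetic once the run parameters are known. The sketch below is an assumption-laden illustration: it supposes a fixed number of gradient steps per local search and a fixed number of offspring per generation, and the function name and arguments are hypothetical rather than taken from the paper.

```python
def memetic_gradient_steps(generations, offspring, gd_steps, initial_steps=0):
    """Count gradient steps spent during one memetic run.

    Returns (total, lineage):
      total   - steps summed over every offspring that was trained
      lineage - steps along a single parent-to-child chain, i.e. the most
                any one surviving network could have received
    """
    total = initial_steps + generations * offspring * gd_steps
    lineage = initial_steps + generations * gd_steps
    return total, lineage

def budget_gap(baseline_steps, generations, offspring, gd_steps, initial_steps=0):
    """Extra gradient steps the memetic run used relative to the baseline."""
    total, _ = memetic_gradient_steps(generations, offspring, gd_steps, initial_steps)
    return total - baseline_steps
```

For example, `memetic_gradient_steps(20, 4, 50, initial_steps=50)` gives `(4050, 1050)`. A baseline trained for fewer than 4050 steps would have had less total optimization effort, and one trained for fewer than 1050 steps would have had less than the surviving lineage alone.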
minor comments (1)
- [Abstract] The abstract contains the compound word 'resourceintensive' without a space.
Simulated Author's Rebuttal
We thank the referee for the constructive comment on ensuring the time-budget comparison is rigorously supported. We address the major comment below.
Point-by-point responses
- Referee: [Abstract] The central claim that evolved architectures reach 'significantly lower error on average' given 'the same amount of time' is load-bearing yet unsupported without an explicit accounting of total gradient evaluations (or wall-clock budget) allocated to the baseline feed-forward network versus the aggregate of all local GD searches performed during the memetic run. Because the algorithm interleaves multiple evolutionary rewirings with separate GD optimizations, any reported improvement could be explained by extra optimization steps rather than by the architectural changes.
Authors: We agree that an explicit accounting of total gradient evaluations is required to substantiate the claim and rule out unequal optimization effort. In the revised manuscript we will add a dedicated paragraph (and accompanying table) that reports the aggregate number of gradient steps used by the baseline feed-forward networks and by every local GD search performed inside the memetic runs. This will make clear whether the reported error reductions arise from architectural changes or from additional optimization steps. If the original experiments did not enforce strict budget equivalence, we will either re-run the comparisons under matched budgets or explicitly state the actual budgets employed.
Revision: yes
Circularity Check
No circularity: empirical memetic search results are not definitionally forced
The paper describes a memetic algorithm that interleaves evolutionary operators on dCGPANN genes with local gradient-descent parameter updates. Reported improvements in error and parameter count are presented as experimental outcomes on regression tasks, not as quantities derived by algebraic identity or by fitting a parameter that is then renamed a prediction. No equations reduce the final error metric to a function of the search procedure itself, no uniqueness theorem is invoked via self-citation to force the architecture class, and the comparison to the initial feed-forward topology is framed as an empirical test rather than a tautology. The derivation chain therefore remains self-contained against external benchmarks.