Surrogate Models for Enhancing the Efficiency of Neuroevolution in Reinforcement Learning
Pith reviewed 2026-05-24 17:47 UTC · model grok-4.3
The pith
Surrogate models using phenotypic distances on dynamic input sets can replace many fitness evaluations in neuroevolution for reinforcement learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Surrogate model-based neuroevolution (SMB-NE) builds a kernel from phenotypic distance measures that compare network behaviors on chosen input sets. For reinforcement learning, the authors compute these distances on dynamic input sets, which lets the surrogate predict fitness values and thereby replace a large fraction of true environment rollouts during the evolutionary search. The results indicate that this substitution raises evaluation efficiency considerably.
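The following is a minimal Python sketch of the kernel idea. It is not the authors' implementation and rests on several assumptions:

- Each network can be called as a function that maps an observation vector to an output vector.
- Phenotypic distance is the mean Euclidean distance between two networks' outputs over a shared input set.
- Distances become an exponential kernel exp(-θ·d).
- New candidates are scored with a simple Kriging-style mean predictor.

The helper names are illustrative assumptions. So are the fixed θ (which would normally be fitted, e.g. by maximum likelihood), the small nugget, and the use of the sample mean as a constant trend.

```python
import numpy as np

def phenotypic_distance(policy_a, policy_b, inputs):
    """Mean Euclidean distance between two policies' outputs on a shared input set."""
    out_a = np.array([policy_a(x) for x in inputs], dtype=float)
    out_b = np.array([policy_b(x) for x in inputs], dtype=float)
    return float(np.mean(np.linalg.norm(out_a - out_b, axis=1)))

def kernel_matrix(dist, theta=1.0):
    """Exponential kernel k = exp(-theta * d), applied elementwise to distances."""
    return np.exp(-theta * np.asarray(dist, dtype=float))

def surrogate_predict(d_train, y_train, d_cross, theta=1.0, nugget=1e-8):
    """Kriging-style mean prediction that needs only distances.

    d_train: (n, n) distances among already evaluated networks
    y_train: (n,) their true fitness values
    d_cross: (m, n) distances from m new candidates to the n evaluated ones
    """
    y = np.asarray(y_train, dtype=float)
    K = kernel_matrix(d_train, theta) + nugget * np.eye(len(y))  # nugget for stability
    k_star = kernel_matrix(d_cross, theta)
    mu = y.mean()                            # assumed constant trend (sample mean)
    alpha = np.linalg.solve(K, y - mu)       # K^{-1} (y - mu) without explicit inverse
    return mu + k_star @ alpha
```

The predictor only ever sees distances, never topology or weights. That is what lets a phenotypic kernel compare networks whose structures differ.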
What carries the argument
Kernel-based surrogate using phenotypic distance measures computed on dynamic input sets.
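The review does not spell out how the dynamic input sets are built. What follows is therefore only a hypothetical sketch of one plausible scheme:

- Observations encountered during true environment rollouts are stored in a bounded buffer.
- Each generation, a fresh random subset of that buffer becomes the input set for the phenotypic distance.

The class name, buffer capacity, and sample size are assumptions made for illustration.

```python
import random
from collections import deque

class DynamicInputSet:
    """Hypothetical helper: keeps observations from recent true rollouts and
    resamples the input set used by the phenotypic distance each generation."""

    def __init__(self, capacity=1000, size=20, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest observations drop out first
        self.size = size
        self.rng = random.Random(seed)

    def record_rollout(self, observations):
        """Store every observation seen during one true environment rollout."""
        self.buffer.extend(observations)

    def sample(self):
        """Draw this generation's input set without replacement
        (fewer inputs if the buffer is still small)."""
        k = min(self.size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)
```

Any scheme of this kind has a cost. Once the input set changes, distances between previously evaluated networks must be recomputed on the new inputs, so the archive has to keep the networks themselves rather than only cached distances.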
If this is right
- Many candidate networks can be screened by the surrogate before any true environment interaction occurs.
- Dynamic input sets supply sufficient behavioral information to keep surrogate predictions useful across generations.
- The overall number of environment interactions required to reach a given performance level drops substantially.
- The evolutionary search remains effective even when most fitness values come from the surrogate rather than direct evaluation.
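The screening idea in the first bullet can be sketched as a pre-selection step. This is a hypothetical illustration built on three assumptions:

- Fitness is maximized.
- `predict` wraps a surrogate such as a kernel model.
- Only the `n_true` highest-predicted candidates per generation receive a real rollout.

The paper may allocate true evaluations differently.

```python
def prescreen(candidates, predict, n_true):
    """Return the n_true candidates with the highest surrogate prediction.
    Ties keep their original order because sorted() is stable, even with reverse=True."""
    ranked = sorted(candidates, key=predict, reverse=True)
    return ranked[:n_true]

def surrogate_assisted_step(candidates, predict, true_fitness, n_true):
    """One generation: screen every candidate with the surrogate, then spend
    real rollouts only on the shortlisted ones.
    Returns the (candidate, true fitness) pairs and the number of rollouts used."""
    shortlisted = prescreen(candidates, predict, n_true)
    evaluated = [(c, true_fitness(c)) for c in shortlisted]
    return evaluated, len(evaluated)
```

For example, with 50 candidates and `n_true = 5`, each generation costs 5 rollouts instead of 50. Whether that saving preserves the final return is exactly what the review asks the paper to demonstrate.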
Where Pith is reading between the lines
- The same dynamic-input phenotypic kernel might be tested in other expensive black-box optimization domains where behavioral comparisons are feasible.
- If dynamic input selection can be made fully automatic, the method would require even less manual tuning for new RL tasks.
Load-bearing premise
Phenotypic distance measures computed on dynamic input sets produce surrogate predictions accurate enough to replace true fitness evaluations without systematically biasing the evolutionary search in reinforcement learning environments.
What would settle it
If the surrogate-assisted runs on standard RL benchmarks reach the same final performance only after using at least as many true evaluations as the plain neuroevolution baseline, the efficiency gain claim would be refuted.
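The refutation criterion above can be made operational with a small helper. This is a sketch under the following assumptions:

- Each run is logged as a list of (true evaluations so far, best return so far) pairs.
- The target is a fixed return threshold.
- Runs that never reach the target are dropped before taking the median.

Dropping unsuccessful runs is a simplification. A fuller analysis would treat them as censored and also report success rates.

```python
from statistics import median

def evals_to_target(history, target):
    """history: list of (true_evals_so_far, best_return_so_far) in run order.
    Returns the first evaluation count at which best_return >= target, else None."""
    for n_evals, best in history:
        if best >= target:
            return n_evals
    return None

def compare_runs(smb_runs, baseline_runs, target):
    """Median true evaluations needed to reach the target, per method.
    Runs that never reach the target are excluded (a simplification)."""
    def med(runs):
        hits = [e for e in (evals_to_target(h, target) for h in runs) if e is not None]
        return median(hits) if hits else None
    return med(smb_runs), med(baseline_runs)
```

Under this criterion, the efficiency claim fails if the surrogate-assisted median is not smaller than the baseline median.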
Original abstract
In recent years, reinforcement learning has received a lot of attention. One method to solve reinforcement learning tasks is Neuroevolution, where neural networks are optimized by evolutionary algorithms. A disadvantage of Neuroevolution is that it can require numerous function evaluations, while not fully utilizing the available information from each fitness evaluation. This is especially problematic when fitness evaluations become expensive. To reduce the cost of fitness evaluations, surrogate models can be employed to partially replace the fitness function. The difficulties of surrogate modeling for Neuroevolution are the complex search space and the question of how to compare different networks. To that end, recent studies showed that a kernel-based approach, particularly with phenotypic distance measures, works well. These kernels compare different networks via their behavior (phenotype) rather than their topology or encoding (genotype). In this work, we discuss the use of surrogate model-based Neuroevolution (SMB-NE) using a phenotypic distance for reinforcement learning. In detail, we investigate a) the potential of SMB-NE with respect to evaluation efficiency and b) how to select adequate input sets for the phenotypic distance measure in a reinforcement learning problem. The results indicate that we are able to considerably increase the evaluation efficiency using dynamic input sets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines surrogate model-based neuroevolution (SMB-NE) for reinforcement learning, focusing on phenotypic distance kernels to approximate fitness and reduce expensive evaluations. It specifically investigates the choice of input sets for these distances and reports that dynamic input sets yield considerable gains in evaluation efficiency compared to static alternatives.
Significance. If the empirical claims hold with proper validation, the work could improve the practicality of neuroevolution on costly RL benchmarks by substituting a fraction of true fitness calls with surrogates while preserving search direction. The emphasis on phenotypic rather than genotypic distances is a methodological strength that aligns with prior kernel-based NE literature.
major comments (2)
- [Abstract] Abstract: the claim of 'considerably increase[d] evaluation efficiency using dynamic input sets' is presented without any reported quantitative metrics (e.g., number of evaluations saved, final returns, or statistical tests), which is load-bearing for the central contribution.
- [Abstract] Abstract / Results section: no rank-correlation, surrogate-vs-true fitness scatter, or policy-return comparison is described that would confirm the phenotypic-distance surrogate does not systematically bias the evolutionary trajectory in RL environments; without such a check the efficiency gain could reflect an easier surrogate landscape rather than faithful approximation.
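The rank-correlation check the referee asks for can be computed without extra dependencies. The sketch below computes Spearman's ρ between surrogate predictions and true returns for a batch of candidates that were also rolled out for real. Average ranks handle ties. The function names are illustrative, and a library routine such as SciPy's `spearmanr` would serve the same purpose.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A ρ near 1 means the surrogate orders candidates the way the environment does. For selection, that ordering matters more than the accuracy of the absolute predictions. The function is undefined (division by zero) when either input is constant.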
minor comments (1)
- [Abstract] The abstract would be strengthened by a single sentence summarizing the magnitude of the efficiency improvement and the RL domains tested.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below.
Point-by-point responses
- Referee: [Abstract] Abstract: the claim of 'considerably increase[d] evaluation efficiency using dynamic input sets' is presented without any reported quantitative metrics (e.g., number of evaluations saved, final returns, or statistical tests), which is load-bearing for the central contribution.
Authors: We agree that the abstract should report key quantitative results to support the central claim. In the revision we will add specific metrics (e.g., evaluation savings, final returns, and statistical test outcomes) drawn from the experiments already presented in the results section. revision: yes
- Referee: [Abstract] Abstract / Results section: no rank-correlation, surrogate-vs-true fitness scatter, or policy-return comparison is described that would confirm the phenotypic-distance surrogate does not systematically bias the evolutionary trajectory in RL environments; without such a check the efficiency gain could reflect an easier surrogate landscape rather than faithful approximation.
Authors: We agree that explicit fidelity checks strengthen the argument. While the reported performance comparisons already indicate that dynamic-input SMB-NE reaches returns comparable to or better than standard neuroevolution, we will add rank-correlation values, surrogate-versus-true fitness scatter plots, and expanded policy-return comparisons in the revised results section. revision: yes
Circularity Check
No circularity: empirical efficiency claims rest on experimental outcomes
The paper reports experimental results showing increased evaluation efficiency via dynamic input sets for phenotypic-distance surrogates in neuroevolution. No equations, uniqueness theorems, fitted parameters renamed as predictions, or derivation steps are present that reduce the reported gains to inputs by construction. Mentions of prior kernel-based work are background citations, not load-bearing for the central empirical finding. The analysis chain is self-contained as an empirical study without self-definitional or self-citation reductions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: phenotypic distance measures based on observed behavior can serve as a reliable kernel input for surrogate models of neural network fitness.
Reference graph
Works this paper leans on
- [1] Charu C. Aggarwal, Alexander Hinneburg, and Daniel A. Keim. 2001. On the Surprising Behavior of Distance Metrics in High Dimensional Space. In Database Theory — ICDT 2001: 8th International Conference (Lecture Notes in Computer Science), Vol. 1973. Springer, London, UK, 420–434.
- [2] Thomas Bartz-Beielstein and Martin Zaefferer. 2017. Model-based Methods for Continuous and Discrete Global Optimization. Applied Soft Computing 55 (Feb 2017), 154–167. https://doi.org/10.1016/j.asoc.2017.01.039
- [3] William Jay Conover and Ronald L. Iman. 1979. On Multiple-comparisons Procedures. Technical Report LA-7677-MS. Los Alamos Sci. Lab. Available: http://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-07677-MS, accessed: 2018-07-11.
- [4] Alexander Forrester, Andras Sobester, and Andy Keane. 2008. Engineering Design via Surrogate Modelling. Wiley.
- [5] Adam Gaier, Alexander Asteroth, and Jean-Baptiste Mouret. 2018. Data-efficient Neuroevolution with Kernel-Based Surrogate Models. In Genetic and Evolutionary Computation Conference (GECCO).
- [6] Torsten Hildebrandt and Jürgen Branke. 2015. On Using Surrogates with Genetic Programming. Evolutionary Computation 23, 3 (Jun 2015), 343–367.
- [7] Yaochu Jin. 2011. Surrogate-assisted evolutionary computation: Recent advances and future challenges. Swarm and Evolutionary Computation 1, 2 (2011), 61–70.
- [8] Donald R. Jones, Matthias Schonlau, and William J. Welch. 1998. Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13, 4 (1998), 455–492.
- [9] M. M. Khan, G. M. Khan, and J. F. Miller. 2010. Evolution of neural networks using Cartesian Genetic Programming. In IEEE Congress on Evolutionary Computation. 1–8. https://doi.org/10.1109/CEC.2010.5586547
- [10] Rogier Koppejan and Shimon Whiteson. 2011. Neuroevolutionary reinforcement learning for generalized control of simulated helicopters. Evolutionary Intelligence 4, 4 (01 Dec 2011), 219–241. https://doi.org/10.1007/s12065-011-0066-z
- [11] William H. Kruskal and W. Allen Wallis. 1952. Use of Ranks in One-Criterion Variance Analysis. J. Amer. Statist. Assoc. 47, 260 (Dec. 1952), 583–621. https://doi.org/10.2307/2280779
- [12] Michael D. McKay, Richard J. Beckman, and William J. Conover. 1979. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21, 2 (1979), 239–245.
- [13] Julian F. Miller and Peter Thomson. 2000. Cartesian genetic programming. In European Conference on Genetic Programming. Springer, 121–132.
- [14] Mateusz Pawlik and Nikolaus Augsten. 2015. Efficient Computation of the Tree Edit Distance. ACM Transactions on Database Systems 40, 1 (Mar 2015), 1–40. http://dx.doi.org/10.1145/2699485
- [15] J. David Schaffer, Darrell Whitley, and Larry J. Eshelman. 1992. Combinations of genetic algorithms and neural networks: A survey of the state of the art. In Combinations of Genetic Algorithms and Neural Networks, 1992., COGANN-92. International Workshop on. IEEE, 1–37.
- [16] Jörg Stork, Thomas Bartz-Beielstein, Andreas Fischbach, and Martin Zaefferer
- [17]
- [18] Jörg Stork, Martin Zaefferer, and Thomas Bartz-Beielstein. 2018. Distance-based Kernels for Surrogate Model-based Neuroevolution. ArXiv e-prints (July 2018). DEVONN Workshop PPSN 2018 (PPSN XV) conference. ArXiv ID: 1807.07839
- [19] Jörg Stork, Martin Zaefferer, and Thomas Bartz-Beielstein. 2019. Improving NeuroEvolution Efficiency by Surrogate Model-Based Optimization with Phenotypic Distance Kernels. In Applications of Evolutionary Computation, Paul Kaufmann and Pedro A. Castillo (Eds.). Springer International Publishing, Cham, 504–519.
- [20] Andrew James Turner and Julian Francis Miller. 2013. Cartesian genetic programming encoded artificial neural networks: a comparison using three benchmarks. In Proc. GECCO'13. ACM, 1005–1012.
- [21] Martin Zaefferer. 2017. Combinatorial Efficient Global Optimization in R - CEGO v2.2.0. Online: https://cran.r-project.org/package=CEGO, accessed: 2018-01-10.
- [22] Martin Zaefferer. 2018. Surrogate Models for Discrete Optimization Problems. PhD thesis, Technische Universität Dortmund.
- [23] Martin Zaefferer, Jörg Stork, Oliver Flasch, and Thomas Bartz-Beielstein. 2018. Linear Combination of Distance Measures for Surrogate Models in Genetic Programming. In Parallel Problem Solving from Nature – PPSN XV: 15th International Conference, Vol. 11102. Springer, Coimbra, Portugal, 220–231.