Surrogate Models for Enhancing the Efficiency of Neuroevolution in Reinforcement Learning

A. E. Eiben; Jörg Stork; Martin Zaefferer; Thomas Bartz-Beielstein

arxiv: 1907.09300 · v1 · pith:MRJLG6YYnew · submitted 2019-07-22 · 💻 cs.NE

Jörg Stork, Martin Zaefferer, Thomas Bartz-Beielstein, A. E. Eiben

Pith reviewed 2026-05-24 17:47 UTC · model grok-4.3

keywords: neuroevolution, reinforcement learning, surrogate models, phenotypic distance, evolutionary algorithms, evaluation efficiency

The pith

Surrogate models using phenotypic distances on dynamic input sets can replace many fitness evaluations in neuroevolution for reinforcement learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how surrogate models can reduce the number of expensive fitness evaluations needed when evolving neural networks for reinforcement learning tasks. It focuses on kernel-based surrogates that compare networks by their observed behavior rather than structure, and tests dynamic input sets to compute those behavioral distances in RL settings. A sympathetic reader would care because each fitness evaluation in RL requires running the network in an environment, which quickly becomes costly for complex tasks. The work shows that suitable dynamic inputs let the surrogates guide the search without needing a full evaluation for every candidate network. This leads to considerably higher evaluation efficiency while still locating good solutions.

Core claim

Surrogate model-based neuroevolution (SMB-NE) builds a kernel from phenotypic distance measures that compare network behaviors on chosen input sets. For reinforcement learning the authors select dynamic input sets to compute these distances, allowing the surrogate to predict fitness values and thereby replace a large fraction of true environment rollouts during the evolutionary search. The results indicate that this substitution raises evaluation efficiency considerably.
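As a rough illustration of the idea, a phenotypic distance can be sketched as the average disagreement between two networks' outputs on a shared input set, which is then turned into a kernel value. The function names, the mean-absolute-difference distance, the exponential kernel form, and the `theta` parameter are all assumptions made for this sketch; the paper's exact distance and kernel may differ.

```python
import numpy as np

def make_policy(weights):
    """Build a toy one-layer 'network' (illustrative stand-in for an evolved net)."""
    w = np.asarray(weights, dtype=float)
    return lambda x: np.tanh(x @ w)

def phenotypic_distance(policy_a, policy_b, inputs):
    """Mean absolute difference between two policies' outputs on the same inputs."""
    out_a = np.array([policy_a(x) for x in inputs], dtype=float)
    out_b = np.array([policy_b(x) for x in inputs], dtype=float)
    return float(np.mean(np.abs(out_a - out_b)))

def exponential_kernel(distance, theta=1.0):
    """Map a distance to a similarity in (0, 1]; theta is a hypothetical scale."""
    return float(np.exp(-theta * distance))

# Shared input set: random 2-D observations (an assumption for this example).
rng = np.random.default_rng(0)
inputs = rng.uniform(-1.0, 1.0, size=(50, 2))

policy_a = make_policy([1.0, -0.5])
policy_b = make_policy([1.0, -0.4])   # behaves almost like policy_a
policy_c = make_policy([-1.0, 0.5])   # behaves oppositely to policy_a

d_ab = phenotypic_distance(policy_a, policy_b, inputs)
d_ac = phenotypic_distance(policy_a, policy_c, inputs)
k_ab = exponential_kernel(d_ab)
k_ac = exponential_kernel(d_ac)
```

Networks that act alike on the input set get a small distance and a kernel value near 1, regardless of how different their weights or topologies are.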

What carries the argument

Kernel-based surrogate using phenotypic distance measures computed on dynamic input sets.
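One way to picture a dynamic input set is as a rolling buffer of observations collected during true environment rollouts, from which the inputs for the behavioral comparison are drawn. This is a hypothetical sketch: the class name `DynamicInputSet`, the capacity, and the uniform sampling rule are assumptions, not the construction described in the paper.

```python
import random
from collections import deque

class DynamicInputSet:
    """Rolling buffer of observations seen during true environment rollouts."""

    def __init__(self, capacity=1000, seed=0):
        # Oldest observations are dropped once the capacity is reached.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add_rollout(self, observations):
        """Append all observations from one evaluated episode."""
        self._buffer.extend(observations)

    def sample(self, n):
        """Draw up to n distinct stored observations uniformly at random."""
        n = min(n, len(self._buffer))
        return self._rng.sample(list(self._buffer), n)

    def __len__(self):
        return len(self._buffer)

# Toy usage with made-up 2-D observations.
input_set = DynamicInputSet(capacity=5)
input_set.add_rollout([(0, 0), (1, 1), (2, 2)])
input_set.add_rollout([(3, 3), (4, 4), (5, 5)])
current_inputs = input_set.sample(5)
```

Because the buffer follows the states that evaluated policies actually visit, the inputs used for the distance track the region of the state space the search is currently exploring.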

If this is right

Many candidate networks can be screened by the surrogate before any true environment interaction occurs.
Dynamic input sets supply sufficient behavioral information to keep surrogate predictions useful across generations.
The overall number of environment interactions required to reach a given performance level drops substantially.
The evolutionary search remains effective even when most fitness values come from the surrogate rather than direct evaluation.
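The screening step in the points above can be sketched as follows. Note the swap: instead of the Kriging model the paper refers to, this sketch uses a much simpler kernel-weighted average (a Nadaraya-Watson estimator) over archived fitness values. The function names, the `theta` value, the scalar toy candidates, and the convention that lower fitness is better are all assumptions for illustration.

```python
import numpy as np

def predict_fitness(dist_to_archive, archive_fitness, theta=5.0):
    """Kernel-weighted average of archived fitness values (Nadaraya-Watson)."""
    weights = np.exp(-theta * np.asarray(dist_to_archive, dtype=float))
    fitness = np.asarray(archive_fitness, dtype=float)
    return float(weights @ fitness / weights.sum())

def screen_candidates(candidates, archive, archive_fitness, distance, n_evaluate=1):
    """Return the n_evaluate candidates with the lowest predicted fitness."""
    predictions = [
        predict_fitness([distance(c, a) for a in archive], archive_fitness)
        for c in candidates
    ]
    order = np.argsort(predictions)[:n_evaluate]
    return [candidates[i] for i in order], [predictions[i] for i in order]

# Toy problem: candidates are scalars, true fitness is (x - 2)^2, minimized.
archive = [0.0, 1.0, 3.0, 4.0]
archive_fitness = [4.0, 1.0, 1.0, 4.0]
candidates = [0.2, 2.1, 3.9]

selected, predicted = screen_candidates(
    candidates, archive, archive_fitness, distance=lambda a, b: abs(a - b)
)
```

Only the candidates in `selected` would then receive a true environment evaluation, and their results would be appended to the archive before the next generation.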

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dynamic-input phenotypic kernel might be tested in other expensive black-box optimization domains where behavioral comparisons are feasible.
If dynamic input selection can be made fully automatic, the method would require even less manual tuning for new RL tasks.

Load-bearing premise

Phenotypic distance measures computed on dynamic input sets produce surrogate predictions accurate enough to replace true fitness evaluations without systematically biasing the evolutionary search in reinforcement learning environments.

What would settle it

If the surrogate-assisted runs on standard RL benchmarks reach the same final performance only after using at least as many true evaluations as the plain neuroevolution baseline, the efficiency gain claim would be refuted.
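The comparison described above can be made concrete with a small helper that counts true evaluations until a target return is first reached. This is a minimal sketch: the function names, the use of the median across runs, the assumption that higher returns are better, and the toy numbers are all made up for illustration and are not results from the paper.

```python
from statistics import median

def evaluations_to_solve(returns, threshold):
    """Number of true evaluations used until a return first reaches the threshold."""
    for n_evals, value in enumerate(returns, start=1):
        if value >= threshold:
            return n_evals
    return float("inf")  # the run never solved the task within its budget

def efficiency_claim_survives(surrogate_runs, baseline_runs, threshold):
    """True if the surrogate runs need strictly fewer true evaluations (median)."""
    surrogate_cost = median(evaluations_to_solve(r, threshold) for r in surrogate_runs)
    baseline_cost = median(evaluations_to_solve(r, threshold) for r in baseline_runs)
    return surrogate_cost < baseline_cost

# Made-up return sequences, one list per run, one entry per true evaluation.
surrogate_runs = [[10, 50, 200], [20, 195, 210]]
baseline_runs = [[5, 10, 20, 60, 196], [10, 20, 30, 40, 50]]

claim_holds = efficiency_claim_survives(surrogate_runs, baseline_runs, threshold=195)
```

If this check returned `False` on standard benchmarks with a sufficient number of repeated runs, the efficiency claim would not hold in the sense stated above.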

Figures

Figures reproduced from arXiv: 1907.09300 by A. E. Eiben, Jörg Stork, Martin Zaefferer, Thomas Bartz-Beielstein.

**Figure 2.** Example for Kriging modeling: distances between…

**Figure 3.** SMB-NE Cycle for Reinforcement Learning. … selected from Dt to build the model. This subset selection intends to avoid issues with growing data sizes, which may render the Kriging model too time-consuming to compute. This set Mt contains numm of all archived solutions. It is typically set to numm > 100, so for runs with fewer than 100 evaluations it has no effect. Mt is formed by combining a number (typically…

**Figure 5.** Experimental results. The number of required function evaluations (episodes) to solve the environments is Log10…

**Figure 6.** Convergence plot of SMBNE.DynSet 10 (solid blue)…

Original abstract

In recent years, reinforcement learning has received a lot of attention. One method to solve reinforcement learning tasks is Neuroevolution, where neural networks are optimized by evolutionary algorithms. A disadvantage of Neuroevolution is that it can require numerous function evaluations, while not fully utilizing the available information from each fitness evaluation. This is especially problematic when fitness evaluations become expensive. To reduce the cost of fitness evaluations, surrogate models can be employed to partially replace the fitness function. The difficulty of surrogate modeling for Neuroevolution is the complex search space and how to compare different networks. To that end, recent studies showed that a kernel-based approach, particularly with phenotypic distance measures, works well. These kernels compare different networks via their behavior (phenotype) rather than their topology or encoding (genotype). In this work, we discuss the use of surrogate model-based Neuroevolution (SMB-NE) using a phenotypic distance for reinforcement learning. In detail, we investigate a) the potential of SMB-NE with respect to evaluation efficiency and b) how to select adequate input sets for the phenotypic distance measure in a reinforcement learning problem. The results indicate that we are able to considerably increase the evaluation efficiency using dynamic input sets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Dynamic input sets for phenotypic kernels in SMB-NE improve efficiency on RL tasks, but the paper needs explicit checks that the surrogates do not bias the search.

The paper tests surrogate model-based neuroevolution with phenotypic distance kernels on reinforcement learning problems and finds that switching to dynamic input sets raises evaluation efficiency. That is the core result they report. The work takes the kernel approach from earlier papers and applies it specifically to RL, where the choice of inputs for behavior comparison matters because policies can produce very different trajectories. They also look at how to pick those inputs rather than fixing them in advance.

This is a straightforward extension that addresses a real practical issue when evaluations are costly. The experiments appear to show measurable savings in the number of true fitness calls while still reaching competitive policies. That part is useful for anyone already using neuroevolution on control tasks.

The main weakness is that the abstract gives no numbers on rank correlation between surrogate and true fitness, no baseline comparisons with standard NE, and no check on whether the final policies differ from what a non-surrogate run would find. Without those, it is difficult to rule out that the efficiency gain comes from the surrogate creating an easier landscape rather than faithfully approximating the original one. The concern about systematic bias in the evolutionary search therefore stays open until the full results are examined.

The paper is aimed at people working on evolutionary methods for RL who already know the surrogate literature. A reader looking for a practical tweak to reduce wall-clock time on expensive simulators would get something out of it if the validation holds. It is narrow enough that it does not need to be groundbreaking to be worth referee time, but the experimental design and statistical reporting need a close look. I would send it to peer review.

Referee Report

2 major / 1 minor

Summary. The paper examines surrogate model-based neuroevolution (SMB-NE) for reinforcement learning, focusing on phenotypic distance kernels to approximate fitness and reduce expensive evaluations. It specifically investigates the choice of input sets for these distances and reports that dynamic input sets yield considerable gains in evaluation efficiency compared to static alternatives.

Significance. If the empirical claims hold with proper validation, the work could improve the practicality of neuroevolution on costly RL benchmarks by substituting a fraction of true fitness calls with surrogates while preserving search direction. The emphasis on phenotypic rather than genotypic distances is a methodological strength that aligns with prior kernel-based NE literature.

major comments (2)

[Abstract] Abstract: the claim of 'considerably increase[d] evaluation efficiency using dynamic input sets' is presented without any reported quantitative metrics (e.g., number of evaluations saved, final returns, or statistical tests), which is load-bearing for the central contribution.
[Abstract] Abstract / Results section: no rank-correlation, surrogate-vs-true fitness scatter, or policy-return comparison is described that would confirm the phenotypic-distance surrogate does not systematically bias the evolutionary trajectory in RL environments; without such a check the efficiency gain could reflect an easier surrogate landscape rather than faithful approximation.

minor comments (1)

[Abstract] The abstract would be strengthened by a single sentence summarizing the magnitude of the efficiency improvement and the RL domains tested.
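The fidelity check requested in the second major comment could be sketched as a Spearman rank correlation between surrogate predictions and true fitness values for the same candidates. This is a minimal NumPy-only sketch that assumes there are no tied values; the function name and the toy numbers are illustrative and not taken from the paper.

```python
import numpy as np

def spearman_rank_correlation(predicted, observed):
    """Spearman correlation via Pearson correlation of ranks (assumes no ties)."""
    # argsort applied twice turns values into their 0-based ranks.
    pred_ranks = np.argsort(np.argsort(predicted))
    obs_ranks = np.argsort(np.argsort(observed))
    return float(np.corrcoef(pred_ranks, obs_ranks)[0, 1])

# Made-up surrogate predictions and true fitness values for four candidates.
surrogate_fitness = [0.10, 0.40, 0.35, 0.80]
true_fitness = [1.0, 3.0, 2.0, 10.0]

rho = spearman_rank_correlation(surrogate_fitness, true_fitness)
```

A value near 1 would indicate that the surrogate orders candidates the way true evaluation does, which is what selection in an evolutionary algorithm depends on; values near 0 would support the concern about a distorted landscape.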

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below.

Referee: [Abstract] Abstract: the claim of 'considerably increase[d] evaluation efficiency using dynamic input sets' is presented without any reported quantitative metrics (e.g., number of evaluations saved, final returns, or statistical tests), which is load-bearing for the central contribution.

Authors: We agree that the abstract should report key quantitative results to support the central claim. In the revision we will add specific metrics (e.g., evaluation savings, final returns, and statistical test outcomes) drawn from the experiments already presented in the results section. revision: yes

Referee: [Abstract] Abstract / Results section: no rank-correlation, surrogate-vs-true fitness scatter, or policy-return comparison is described that would confirm the phenotypic-distance surrogate does not systematically bias the evolutionary trajectory in RL environments; without such a check the efficiency gain could reflect an easier surrogate landscape rather than faithful approximation.

Authors: We agree that explicit fidelity checks strengthen the argument. While the reported performance comparisons already indicate that dynamic-input SMB-NE reaches returns comparable to or better than standard neuroevolution, we will add rank-correlation values, surrogate-versus-true fitness scatter plots, and expanded policy-return comparisons in the revised results section. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical efficiency claims rest on experimental outcomes

The paper reports experimental results showing increased evaluation efficiency via dynamic input sets for phenotypic-distance surrogates in neuroevolution. No equations, uniqueness theorems, fitted parameters renamed as predictions, or derivation steps are present that reduce the reported gains to inputs by construction. Mentions of prior kernel-based work are background citations, not load-bearing for the central empirical finding. The analysis chain is self-contained as an empirical study without self-definitional or self-citation reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; the ledger therefore records only the domain assumptions visible in the abstract text.

axioms (1)

Domain assumption: Phenotypic distance measures based on observed behavior can serve as a reliable kernel input for surrogate models of neural network fitness.
Invoked when the abstract states that recent studies showed kernel-based approaches with phenotypic distances work well and then applies them to RL.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 1 internal anchor

[1] Charu C. Aggarwal, Alexander Hinneburg, and Daniel A. Keim. 2001. On the Surprising Behavior of Distance Metrics in High Dimensional Space. In Database Theory – ICDT 2001: 8th International Conference (Lecture Notes in Computer Science), Vol. 1973. Springer, London, UK, 420–434.

[2] Thomas Bartz-Beielstein and Martin Zaefferer. 2017. Model-based Methods for Continuous and Discrete Global Optimization. Applied Soft Computing 55 (Feb 2017), 154–167. https://doi.org/10.1016/j.asoc.2017.01.039

[3] William Jay Conover and Ronald L. Iman. 1979. On Multiple-comparisons Procedures. Technical Report LA-7677-MS. Los Alamos Sci. Lab. Available: http://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-07677-MS, accessed: 2018-07-11.

[4] Alexander Forrester, Andras Sobester, and Andy Keane. 2008. Engineering Design via Surrogate Modelling. Wiley.

[5] Adam Gaier, Alexander Asteroth, and Jean-Baptiste Mouret. 2018. Data-efficient Neuroevolution with Kernel-Based Surrogate Models. In Genetic and Evolutionary Computation Conference (GECCO).

[6] Torsten Hildebrandt and Jürgen Branke. 2015. On Using Surrogates with Genetic Programming. Evolutionary Computation 23, 3 (Jun 2015), 343–367.

[7] Yaochu Jin. 2011. Surrogate-assisted evolutionary computation: Recent advances and future challenges. Swarm and Evolutionary Computation 1, 2 (2011), 61–70.

[8] Donald R. Jones, Matthias Schonlau, and William J. Welch. 1998. Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13, 4 (1998), 455–492.

[9] M. M. Khan, G. M. Khan, and J. F. Miller. 2010. Evolution of neural networks using Cartesian Genetic Programming. In IEEE Congress on Evolutionary Computation. 1–8. https://doi.org/10.1109/CEC.2010.5586547

[10] Rogier Koppejan and Shimon Whiteson. 2011. Neuroevolutionary reinforcement learning for generalized control of simulated helicopters. Evolutionary Intelligence 4, 4 (01 Dec 2011), 219–241. https://doi.org/10.1007/s12065-011-0066-z

[11] William H. Kruskal and W. Allen Wallis. 1952. Use of Ranks in One-Criterion Variance Analysis. J. Amer. Statist. Assoc. 47, 260 (Dec. 1952), 583–621. https://doi.org/10.2307/2280779

[12] Michael D McKay, Richard J Beckman, and William J Conover. 1979. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21, 2 (1979), 239–245.

[13] Julian F Miller and Peter Thomson. 2000. Cartesian genetic programming. In European Conference on Genetic Programming. Springer, 121–132.

[14] Mateusz Pawlik and Nikolaus Augsten. 2015. Efficient Computation of the Tree Edit Distance. ACM Transactions on Database Systems 40, 1 (Mar 2015), 1–40. http://dx.doi.org/10.1145/2699485

[15] J David Schaffer, Darrell Whitley, and Larry J Eshelman. 1992. Combinations of genetic algorithms and neural networks: A survey of the state of the art. In Combinations of Genetic Algorithms and Neural Networks, 1992., COGANN-92. International Workshop on. IEEE, 1–37.

[16-17] Jörg Stork, Thomas Bartz-Beielstein, Andreas Fischbach, and Martin Zaefferer. Surrogate Assisted Learning of Neural Networks. In GMA CI-Workshop.

[18] Jörg Stork, Martin Zaefferer, and Thomas Bartz-Beielstein. 2018. Distance-based Kernels for Surrogate Model-based Neuroevolution. ArXiv e-prints (July 2018). DEVONN Workshop PPSN 2018 (PPSN XV) conference. ArXiv ID: 1807.07839

[19] Jörg Stork, Martin Zaefferer, and Thomas Bartz-Beielstein. 2019. Improving NeuroEvolution Efficiency by Surrogate Model-Based Optimization with Phenotypic Distance Kernels. In Applications of Evolutionary Computation, Paul Kaufmann and Pedro A. Castillo (Eds.). Springer International Publishing, Cham, 504–519.

[20] Andrew James Turner and Julian Francis Miller. 2013. Cartesian genetic programming encoded artificial neural networks: a comparison using three benchmarks. In Proc. GECCO’13. ACM, 1005–1012.

[21] Martin Zaefferer. 2017. Combinatorial Efficient Global Optimization in R - CEGO v2.2.0. Online: https://cran.r-project.org/package=CEGO, accessed: 2018-01-10.

[22] Martin Zaefferer. 2018. Surrogate Models for Discrete Optimization Problems. PhD thesis. Technische Universität Dortmund.

[23] Martin Zaefferer, Jörg Stork, Oliver Flasch, and Thomas Bartz-Beielstein. 2018. Linear Combination of Distance Measures for Surrogate Models in Genetic Programming. In Parallel Problem Solving from Nature – PPSN XV: 15th International Conference, Vol. 11102. Springer, Coimbra, Portugal, 220–231.
