pith. machine review for the scientific record.

arxiv: 2604.20365 · v1 · submitted 2026-04-22 · 💻 cs.RO · cs.AI

Recognition: unknown

Benefits of Low-Cost Bio-Inspiration in the Age of Overparametrization

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 00:18 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords central pattern generators · multi-layer perceptrons · evolutionary strategies · reinforcement learning · robot control · parameter impact · bio-inspired controllers

The pith

Shallow MLPs and dense CPGs outperform deeper networks and RL architectures in bounded robot control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether larger parameter counts always improve learning when controlling robots with small input and output spaces. It tests Central Pattern Generators and Multi-Layer Perceptrons under evolutionary strategies and reinforcement learning on a robot with limited proprioception, using multiple reward functions. Results indicate that shallow MLPs and densely connected CPGs deliver higher performance than deeper MLPs or Actor-Critic methods. A new Parameter Impact metric links this outcome to parameter count, showing that the extra parameters demanded by reinforcement learning add no benefit and that evolutionary approaches are preferable.
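To make the parameter-count gap concrete, here is a minimal sketch. The dimensions are illustrative assumptions: the robot has 8 hinges (per Figure 1), and the Stable Baselines3 PPO default policy uses two 64-neuron hidden layers (per Figure 3); the paper's exact observation size may differ.

```python
# Illustrative parameter counts (dimensions assumed: 8 proprioceptive inputs
# and 8 hinge outputs; the paper's exact observation size may differ).
def mlp_params(widths):
    """Weights + biases of a fully connected net with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

shallow = mlp_params([8, 8])           # direct input-to-output mapping
deep = mlp_params([8, 64, 64, 8])      # SB3/PPO default: two 64-unit layers

print(shallow, deep)                   # 72 vs 5256: a ~73x gap to justify
```

Under these assumptions the deep policy carries roughly 73 times the optimizable parameters of the shallow one, which is the gap the Parameter Impact metric interrogates.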

Core claim

Across varied parameter spaces and multiple reward functions, shallow MLPs and densely connected CPGs outperform deeper MLPs and Actor-Critic architectures. The additional parameters required by the reinforcement technique do not translate into better performance, thus favouring evolutionary strategies.

What carries the argument

Parameter Impact metric, which quantifies how performance scales with the number of optimized parameters across bio-inspired controller families.
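The simulated rebuttal later describes the metric as a normalized performance gain per additional parameter; a minimal sketch under that reading follows. The paper's exact formulation may differ, and all names and numbers here are illustrative.

```python
# Sketch of the Parameter Impact idea (normalized gain per extra parameter),
# as described in the simulated rebuttal; the paper's exact definition may
# differ, and all numbers below are illustrative.
def parameter_impact(perf_arch, params_arch, perf_base, params_base):
    """Normalized performance gain per additional optimized parameter."""
    return (perf_arch - perf_base) / (params_arch - params_base)

# A large policy that barely beats a tiny baseline scores near zero:
print(parameter_impact(2.0, 5256, 1.9, 72))
```

A near-zero value says the extra parameters bought essentially nothing, which is the shape of the paper's claim against the RL architectures.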

If this is right

  • Evolutionary strategies are more efficient than reinforcement learning for optimizing these low-dimensional controllers.
  • Densely connected CPGs and shallow MLPs are the preferred architectures when input-output dimensionality is small and task performance is bounded.
  • Overparametrization can reduce learning effectiveness in robot control tasks with capped rewards.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The pattern may appear in other optimization domains where the environment or task imposes a hard performance ceiling.
  • Designers could test whether increasing parameter count ever becomes beneficial once sensory input dimensionality is raised.
  • The results suggest prioritizing parameter-efficient bio-inspired designs over scaling model size in similar robotics settings.

Load-bearing premise

The specific robot morphology, limited proprioceptive sensors, chosen reward functions, and training protocols are representative cases in which extra parameters inherently limit rather than expand achievable performance.

What would settle it

If deeper MLPs or Actor-Critic controllers achieve strictly higher rewards than shallow MLPs and dense CPGs under identical robot morphology, rewards, and evaluation protocol, the central claim would be contradicted.

Figures

Figures reproduced from arXiv: 2604.20365 by Anil Yaman, Anna V. Kononova, Kevin Godin-Dubois.

Figure 1. Robot “spider” morphology with 8 hinges for locomotion.
Figure 2. Neighbourhood configurations for a CPG network.
Figure 3. ANN architectures, with w denoting the width of hidden layers.
Figure 4. Relationship between performance and number of parameters for all configurations.
Figure 5. Parameter impact across all configurations and reward functions.
Figure 6. PCA of the 16D diversity space obtained by fitting sinusoidals onto each…
Original abstract

While Central Pattern Generators (CPGs) and Multi-Layer Perceptrons (MLP) are widely used paradigms in robot control, few systematic studies have been performed on the relative merits of large parameter spaces. In contexts where input and output spaces are small and performance is bounded, having more parameters to optimize may actively hinder the learning process instead of empowering it. To empirically measure this, we submit a given robot morphology, with limited proprioceptive capabilities, to controller optimization under two bio-inspired paradigms (CPGs and MLPs) with evolutionary- and reinforcement- trainer protocols. By varying parameter spaces across multiple reward functions, we observe that shallow MLPs and densely connected CPGs result in better performance when compared to deeper MLPs or Actor-Critic architectures. To account for the relationship between said performance and the number of parameters, we introduce a Parameter Impact metric which demonstrates that the additional parameters required by the reinforcement technique do not translate into better performance, thus favouring evolutionary strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that in robot control tasks with small input/output spaces and bounded performance, increasing parameter counts (as in deep MLPs or Actor-Critic RL) hinders rather than helps learning. Experiments on a limited-proprioception robot morphology compare CPGs and MLPs under evolutionary strategies (ES) versus reinforcement learning, finding superior performance for shallow MLPs and densely connected CPGs; a new Parameter Impact metric is introduced to show that RL's extra parameters do not yield better results, favoring low-cost bio-inspired ES approaches.

Significance. If the central claim holds after proper controls and validation, the work would provide useful evidence against overparametrization trends in robotics controllers for constrained tasks, potentially guiding practitioners toward simpler bio-inspired designs like dense CPGs with ES. The Parameter Impact metric could become a reusable analysis tool if independently validated, but the current lack of experimental rigor limits immediate impact.

major comments (3)
  1. [Abstract] Abstract: the central observations on performance differences across architectures and trainers are presented without any description of experimental setup, number of trials, statistical tests, error bars, or data exclusion criteria, leaving the claims without verifiable empirical support.
  2. [Parameter Impact metric] Section introducing the Parameter Impact metric: the metric is used to explain why additional RL parameters do not improve performance, yet its definition and calculation appear derived from the same experimental outcomes, creating a circularity risk that undermines its explanatory power.
  3. [Experimental comparison] Experimental comparison (Methods/Results): the ES versus Actor-Critic evaluation does not indicate whether total environment interactions, generations, or wall-clock time were equated across trainers, nor whether RL received equivalent hyperparameter search; this leaves open the possibility that observed gaps arise from training-protocol mismatch or hyperparameter sensitivity rather than parameter count per se.
minor comments (1)
  1. [Abstract] Abstract: the specific reward functions and parameter-space variations are referenced but not enumerated, reducing clarity for readers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each of the major comments in detail below, providing clarifications and indicating the revisions made to strengthen the paper.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central observations on performance differences across architectures and trainers are presented without any description of experimental setup, number of trials, statistical tests, error bars, or data exclusion criteria, leaving the claims without verifiable empirical support.

    Authors: We agree that the abstract would benefit from additional details to support the claims. In the revised manuscript, we have expanded the abstract to briefly mention the experimental setup, including the use of 10 independent trials per condition, statistical significance assessed via Wilcoxon rank-sum tests with p < 0.05, and error bars representing one standard deviation. Data exclusion criteria were not applicable as all optimization runs completed successfully. These additions provide verifiable empirical context while respecting abstract length constraints, with full details retained in the Methods section. revision: yes

  2. Referee: [Parameter Impact metric] Section introducing the Parameter Impact metric: the metric is used to explain why additional RL parameters do not improve performance, yet its definition and calculation appear derived from the same experimental outcomes, creating a circularity risk that undermines its explanatory power.

    Authors: We appreciate this observation regarding potential circularity. The Parameter Impact metric is defined independently as the normalized performance gain per additional parameter, using the formula: Impact = (Perf_arch - Perf_baseline) / (Params_arch - Params_baseline), where baseline is the simplest architecture. Although applied to our results, the definition is general and not dependent on specific outcomes. In the revision, we have moved the formal mathematical definition to precede the results section and clarified that it can be used as a standalone tool for analyzing parameter efficiency in other studies. This addresses the concern by emphasizing the metric's a priori definition. revision: partial

  3. Referee: [Experimental comparison] Experimental comparison (Methods/Results): the ES versus Actor-Critic evaluation does not indicate whether total environment interactions, generations, or wall-clock time were equated across trainers, nor whether RL received equivalent hyperparameter search; this leaves open the possibility that observed gaps arise from training-protocol mismatch or hyperparameter sensitivity rather than parameter count per se.

    Authors: This is a valid point for ensuring fair comparison. In our experiments, we equated the total number of environment interactions (fitness evaluations) between ES and RL: specifically, RL was run for a number of episodes equivalent to the total evaluations in ES (e.g., 5000 interactions). Wall-clock time was monitored but not strictly equated due to differing computational profiles, though we note this in the revision. For RL, we conducted a hyperparameter search over learning rates [1e-4, 1e-3], discount factors [0.9, 0.99], and network sizes, selecting the configuration that maximized performance. We have added a new paragraph in the Methods section explicitly stating these equivalences and the hyperparameter tuning procedure to rule out protocol mismatches as the source of performance differences. revision: yes
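The trial protocol described in response 1 (independent runs per condition, Wilcoxon rank-sum at p < 0.05) can be sketched with a stdlib-only rank-sum statistic; in practice one would call scipy.stats.ranksums, and the fitness values below are invented.

```python
import math
from itertools import chain

# Stdlib-only sketch of the Wilcoxon rank-sum (Mann-Whitney) z-statistic,
# normal approximation, no tie-variance correction. Fitness values are
# invented; a real analysis would use scipy.stats.ranksums.
def ranksum_z(a, b):
    pooled = sorted(chain(a, b))
    # average rank for each distinct value (handles ties)
    ranks = {}
    for v in set(pooled):
        positions = [i + 1 for i, x in enumerate(pooled) if x == v]
        ranks[v] = sum(positions) / len(positions)
    n1, n2 = len(a), len(b)
    w = sum(ranks[v] for v in a)              # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2               # expected rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mu) / sigma

es_fitness = [1.9, 2.1, 2.3, 2.0, 2.2]        # hypothetical ES trials
rl_fitness = [1.1, 1.3, 1.0, 1.2, 1.4]        # hypothetical RL trials
print(round(ranksum_z(es_fitness, rl_fitness), 2))  # 2.61: a clear separation
```

A |z| above ~1.96 corresponds to p < 0.05 in the two-sided normal approximation, the threshold the rebuttal cites.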
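The budget equivalence claimed in response 3 amounts to simple accounting; a sketch with invented numbers (the rebuttal cites ~5000 total interactions as an example, and the per-episode step count here is an assumption):

```python
# Illustrative accounting for equating optimization budgets across trainers.
# All numbers are invented; the rebuttal cites ~5000 interactions as an example.
pop_size, generations = 50, 100
es_evaluations = pop_size * generations        # fitness evaluations used by ES

episode_steps = 1000                           # control steps per rollout (assumed)
rl_timesteps = es_evaluations * episode_steps  # matched PPO step budget

print(es_evaluations, rl_timesteps)
```

The point of the check is that ES generations times population size, not wall-clock time, is what gets equated to the RL interaction count.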

Circularity Check

1 step flagged

Parameter Impact metric is defined from experimental data to demonstrate conclusions from that same data

specific steps
  1. fitted input called prediction [Abstract]
    "To account for the relationship between said performance and the number of parameters, we introduce a Parameter Impact metric which demonstrates that the additional parameters required by the reinforcement technique do not translate into better performance, thus favouring evolutionary strategies."

    The metric is introduced to account for the performance-parameter relationship observed across the varied parameter spaces and reward functions in the experiments. It is then invoked to demonstrate that extra parameters do not yield better performance. This makes the demonstration equivalent to the experimental inputs by construction, as the metric has no independent grounding or predictive power outside the fitted data.

full rationale

The paper's central explanation for why additional parameters (from Actor-Critic) fail to improve performance rests on a newly introduced Parameter Impact metric. This metric is introduced specifically to account for the observed relationship between performance and parameter count in the experiments, then used to demonstrate the lack of benefit. This reduces the explanatory step to a re-description of the input observations rather than an independent derivation. The core experimental comparisons of architectures and trainers are not themselves circular, but the interpretive claim about parameter impact is.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The central claim rests on empirical comparisons in a specific robot setup and a newly introduced metric whose exact formulation depends on choices in how performance and parameter counts are related; no independent evidence or external benchmarks are referenced in the abstract.

free parameters (1)
  • Parameter Impact metric formulation
    New metric introduced to quantify relationship between performance and parameter count; its precise definition and any scaling factors are not specified and appear chosen to fit the observed results.
axioms (2)
  • domain assumption Evolutionary and reinforcement learning protocols can be compared fairly across controller architectures without bias from implementation details
    Invoked when attributing performance differences to parameter count rather than training method specifics.
  • domain assumption Limited proprioception and bounded performance spaces make additional parameters detrimental to optimization
    Central premise for why more parameters hinder rather than help learning.
invented entities (1)
  • Parameter Impact metric (no independent evidence)
    purpose: To account for and demonstrate the relationship between performance and number of parameters
    Newly created quantity used to support the conclusion favoring evolutionary strategies; no independent evidence or falsifiable prediction outside the study is provided.

pith-pipeline@v0.9.0 · 5475 in / 1680 out tokens · 62199 ms · 2026-05-10T00:18:01.901328+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

45 extracted references · 41 canonical work pages · 5 internal anchors

  1. [1] Bäck, T.H., Kononova, A.V., van Stein, B., Wang, H., Antonov, K.A., Kalkreuth, R.T., de Nobel, J., Vermetten, D., de Winter, R., Ye, F.: Evolutionary algorithms for parameter optimization—thirty years later. Evolutionary Computation 31(2), 81–122 (2023). https://doi.org/10.1162/evco_a_00325

  2. [2] Baldominos, A., Saez, Y., Isasi, P.: On the automated, evolutionary design of neural networks: past, present, and future. Neural Computing and Applications 32(2), 519–545 (Jan 2020). https://doi.org/10.1007/s00521-019-04160-6

  3. [3] Bellegarda, G., Ijspeert, A.: CPG-RL: Learning Central Pattern Generators for Quadruped Locomotion (Nov 2022). https://doi.org/10.48550/arXiv.2211.00458, arXiv:2211.00458 [cs]

  4. [4] Bellegarda, G., Shafiee, M., Ijspeert, A.: Visual CPG-RL: Learning Central Pattern Generators for Visually-Guided Quadruped Locomotion (Mar 2024). https://doi.org/10.48550/arXiv.2212.14400, arXiv:2212.14400 [cs]

  5. [5] Bhattasali, N.X., Pattabiraman, V., Pinto, L., Lindsay, G.W.: Neural Circuit Architectural Priors for Quadruped Locomotion (Oct 2024). https://doi.org/10.48550/arXiv.2410.07174, arXiv:2410.07174 [q-bio]

  6. [6] Campanaro, L., Gangapurwala, S., Martini, D.D., Merkt, W., Havoutis, I.: CPG-ACTOR: Reinforcement Learning for Central Pattern Generators (Feb 2021). https://doi.org/10.48550/arXiv.2102.12891, arXiv:2102.12891 [cs]

  7. [7] Cheney, N., Bongard, J., Sunspiral, V., Lipson, H.: On the Difficulty of Co-Optimizing Morphology and Control in Evolved Virtual Creatures. In: Proceedings of the Artificial Life Conference 2016 (ALIFE XV), pp. 226–234 (2016). https://doi.org/10.1162/978-0-262-33936-0-ch042

  8. [8] Cheney, N., MacCurdy, R., Clune, J., Lipson, H.: Unshackling Evolution: Evolving Soft Robots with Multiple Materials and a Powerful Generative Encoding. In: Proceedings of the Fifteenth Annual Conference on Genetic and Evolutionary Computation (GECCO '13), p. 167 (2013). https://doi.org/10.1145/2463372.2463404, ISBN: 9781450319638

  9. [9] van Diggelen, F., De Carlo, M., Cambier, N., Ferrante, E., Eiben, A.E.: Emergence of Specialized Collective Behaviors in Evolving Heterogeneous Swarms. In: Parallel Problem Solving from Nature – PPSN XVIII, vol. 1 (Feb 2024). https://doi.org/10.1007/978-3-031-70068-2_4, arXiv:2402.04763

  10. [10] Eiben, A.E.: EvoSphere: the World of Robot Evolution. In: Theory and Practice of Natural Computing, vol. 9477 (Lecture Notes in Computer Science), pp. 3–19. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-26841-5_1

  11. [11] Godin-Dubois, K.: Dataset on the benefits of low-cost bio-inspiration in the age of overparametrization (Apr 2026). https://doi.org/10.5281/zenodo.19633625

  12. [12] Godin-Dubois, K., Cussat-Blanc, S., Duthen, Y.: Explaining the Neuroevolution of Fighting Creatures Through Virtual fMRI. Artificial Life 29(1), 66–93 (2023). https://doi.org/10.1162/artl_a_00389

  13. [13] Godin-Dubois, K., Cussat-Blanc, S., Duthen, Y.: Specialization or Generalization: Investigating NeuroEvolutionary Choices via Virtual fMRI. MIT Press (Jul 2024). https://doi.org/10.1162/isal_a_00817

  14. [14] Godin-Dubois, K., Miras, K., Kononova, A.V.: AMaze: a benchmark generator for sighted maze-navigating agents. Journal of Open Source Software (2025). https://doi.org/10.21105/joss.07208

  15. [15] Hansen, N., Ostermeier, A.: Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation. In: Proceedings of IEEE International Conference on Evolutionary Computation, pp. 312–317 (1996). https://doi.org/10.1109/ICEC.1996.542381

  16. [16] Hansen, N., Akimoto, Y., Baudis, P.: CMA-ES/pycma on Github. Zenodo (Feb 2019). https://doi.org/10.5281/zenodo.2559634

  17. [17] Hornby, G.S., Pollack, J.B.: Evolving L-systems to generate virtual creatures. Computers and Graphics (Pergamon) 25(6), 1041–1048 (2001). https://doi.org/10.1016/S0097-8493(01)00157-1

  18. [18] Ijspeert, A.J.: Central pattern generators for locomotion control in animals and robots: A review. Neural Networks 21(4), 642–653 (May 2008). https://doi.org/10.1016/j.neunet.2008.03.014

  19. [19] Ijspeert, A.J., Hallam, J., Willshaw, D.: Evolving Swimming Controllers for a Simulated Lamprey with Inspiration from Neurobiology. Adaptive Behavior 7(2), 151–172 (Mar 1999). https://doi.org/10.1177/105971239900700202

  20. [20] Jelisavcic, M., de Carlo, M., Hupkes, E., Eustratiadis, P., Orlowski, J., Haasdijk, E., Auerbach, J.E., Eiben, A.E.: Real-World Evolution of Robot Morphologies: A Proof of Concept. Artificial Life 23(2), 206–235 (May 2017). https://doi.org/10.1162/ARTL_a_00231

  21. [21] Jelisavcic, M., Glette, K., Haasdijk, E., Eiben, A.E.: Lamarckian Evolution of Simulated Modular Robots. Frontiers in Robotics and AI 6, 9 (Feb 2019). https://doi.org/10.3389/frobt.2019.00009

  22. [22] Kononova, A.V., Corne, D.W., De Wilde, P., Shneer, V., Caraffini, F.: Structural bias in population-based algorithms. Information Sciences 298, 468–490 (Mar 2015). https://doi.org/10.1016/j.ins.2014.11.035

  23. [23] Lan, G., Van Hooft, M., De Carlo, M., Tomczak, J.M., Eiben, A.: Learning locomotion skills in evolvable robots. Neurocomputing 452, 294–306 (Sep 2021). https://doi.org/10.1016/j.neucom.2021.03.030

  24. [24] Liu, X., Onal, C., Fu, J.: Reinforcement Learning of CPG-regulated Locomotion Controller for a Soft Snake Robot. IEEE Transactions on Robotics 39(5), 3382–3401 (Oct 2023). https://doi.org/10.1109/TRO.2023.3286046, arXiv:2207.04899 [cs]

  25. [25] Luo, J., Stuurman, A.C., Tomczak, J.M., Ellers, J., Eiben, A.E.: The Effects of Learning in Morphologically Evolving Robot Systems. Frontiers in Robotics and AI 9, 797393 (May 2022). https://doi.org/10.3389/frobt.2022.797393, arXiv:2111.09851

  26. [26] Luo, J., Tomczak, J., Miras, K., Eiben, A.E.: A comparison of controller architectures and learning mechanisms for arbitrary robot morphologies (Sep 2023). https://doi.org/10.48550/arXiv.2309.13908, arXiv:2309.13908 [cs]

  27. [27] Miras, K., Haasdijk, E., Glette, K., Eiben, A.E.: Search Space Analysis of Evolvable Robot Morphologies. In: Sim, K., Kaufmann, P. (eds.) Applications of Evolutionary Computation, vol. 10784 (Lecture Notes in Computer Science), pp. 703–718. Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-319-77538-8_47

  28. [28] Mohan, D., Scaife, A.M.M.: Natural gradient descent for improving variational inference based classification of radio galaxies (Nov 2025). https://doi.org/10.48550/arXiv.2511.13224, arXiv:2511.13224 [astro-ph]

  29. [29] OpenAI: Berner, C., Brockman, G., Chan, B., Cheung, V., Dębiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., Pinto, H.P.d.O., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., ...

  30. [30] Pascanu, R., Lyle, C., Modoranu, I.V., Borras, N.E., Alistarh, D., Velickovic, P., Chandar, S., De, S., Martens, J.: Optimizers Qualitatively Alter Solutions And We Should Leverage This (Jul 2025). https://doi.org/10.48550/arXiv.2507.12224, arXiv:2507.12224 [cs]

  31. [31] Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., Dormann, N.: Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research 22(268), 1–8 (2021). http://jmlr.org/papers/v22/20-1364.html

  32. [32] Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-Dimensional Continuous Control Using Generalized Advantage Estimation (Oct 2018). https://doi.org/10.48550/arXiv.1506.02438, arXiv:1506.02438 [cs]

  33. [33] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal Policy Optimization Algorithms, pp. 1–12 (Jul 2017). http://arxiv.org/abs/1707.06347, arXiv:1707.06347

  34. [34] Stanley, K.O., Clune, J., Lehman, J., Miikkulainen, R.: Designing neural networks through neuroevolution. Nature Machine Intelligence 1(1), 24–35 (2019). https://doi.org/10.1038/s42256-018-0006-z

  35. [35] Stuurman, A., Weissl, O., Chiang, T.C., AndresG, Zeeuwe, D., Godin-Dubois, K., Roy: ci-group/revolve2: 1.2.3 (Nov 2024). https://doi.org/10.5281/ZENODO.14143431

  36. [36] Todorov, E., Erez, T., Tassa, Y.: MuJoCo: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE (Oct 2012). https://doi.org/10.1109/IROS.2012.6386109

  37. [37] Tomilin, T., Dai, T., Fang, M., Pechenizkiy, M.: LevDoom: A Benchmark for Generalization on Level Difficulty in Reinforcement Learning. In: 2022 IEEE Conference on Games (CoG), pp. 72–79. IEEE (Aug 2022). https://doi.org/10.1109/CoG51982.2022.9893707

  38. [38] Towers, M., Kwiatkowski, A., Terry, J., Balis, J.U., De Cola, G., Deleu, T., Goulão, M., Kallinteris, A., Krimmel, M., KG, A., Perez-Vicente, R., Pierré, A., Schulhoff, S., Tai, J.J., Tan, H., Younis, O.G.: Gymnasium: A Standard Interface for Reinforcement Learning Environments (Jul 2024). http://arxiv.org/abs/2407.17032, arXiv:2407.17032 [cs]

  39. [39] Tsounis, V., Alge, M., Lee, J., Farshidian, F., Hutter, M.: DeepGait: Planning and Control of Quadrupedal Gaits using Deep Reinforcement Learning (2019). https://doi.org/10.48550/ARXIV.1909.08399

  40. [40] Van Diggelen, F., Ferrante, E., Eiben, A.E.: Comparing lifetime learning methods for morphologically evolving robots. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 93–94. ACM, Lille, France (Jul 2021). https://doi.org/10.1145/3449726.3459530

  41. [41] Veenstra, F., Hart, E., Buchanan, E., Li, W., De Carlo, M., Eiben, A.E.: Comparing encodings for performance and phenotypic exploration in evolving modular robots. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 127–128. ACM, Prague, Czech Republic (Jul 2019). https://doi.org/10.1145/3319619.3322054

  42. [42] Wang, G., Chen, X., Han, S.K.: Central pattern generator and feedforward neural network-based self-adaptive gait control for a crab-like robot locomoting on complex terrain under two reflex mechanisms. International Journal of Advanced Robotic Systems 14(4), 172988141772344 (Jul 2017). https://doi.org/10.1177/1729881417723440

  43. [43] Watanabe, T., Kubo, A., Tsunoda, K., Matsuba, T., Akatsuka, S., Noda, Y., Kioka, H., Izawa, J., Ishii, S., Nakamura, Y.: Hierarchical reinforcement learning with central pattern generator for enabling a quadruped robot simulator to walk on a variety of terrains. Scientific Reports 15(1), 11262 (Apr 2025). https://doi.org/10.1038/s41598-025-94163-2

  44. [44] Wong, A., Nobel, J.d., Bäck, T., Plaat, A., Kononova, A.V.: Solving Deep Reinforcement Learning Tasks with Evolution Strategies and Linear Policy Networks (Jul 2024). https://doi.org/10.48550/arXiv.2402.06912, arXiv:2402.06912 [cs]

  45. [45] Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization (Feb 2017). https://doi.org/10.48550/arXiv.1611.03530, arXiv:1611.03530 [cs]