pith. machine review for the scientific record.

arxiv: 2604.20365 · v1 · submitted 2026-04-22 · 💻 cs.RO · cs.AI

Recognition: unknown

Benefits of Low-Cost Bio-Inspiration in the Age of Overparametrization

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 00:18 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords central pattern generators · multi-layer perceptrons · evolutionary strategies · reinforcement learning · robot control · parameter impact · bio-inspired controllers

The pith

Shallow MLPs and dense CPGs outperform deeper networks and RL architectures in bounded robot control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether larger parameter counts always improve learning when controlling robots with small input and output spaces. It tests Central Pattern Generators and Multi-Layer Perceptrons under evolutionary strategies and reinforcement learning on a robot with limited proprioception, using multiple reward functions. Results indicate that shallow MLPs and densely connected CPGs deliver higher performance than deeper MLPs or Actor-Critic methods. A new Parameter Impact metric links this outcome to parameter count, showing that the extra parameters demanded by reinforcement learning add no benefit and that evolutionary approaches are preferable.
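To make the parameter-count gap concrete, here is a minimal sketch. The dimensions are illustrative assumptions: the robot has 8 hinges (per Figure 1), and the Stable Baselines3 PPO default policy uses two 64-neuron hidden layers (per Figure 3); the paper's exact observation size may differ.

```python
# Illustrative parameter counts (dimensions assumed: 8 proprioceptive inputs
# and 8 hinge outputs; the paper's exact observation size may differ).
def mlp_params(widths):
    """Weights + biases of a fully connected net with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

shallow = mlp_params([8, 8])           # direct input-to-output mapping
deep = mlp_params([8, 64, 64, 8])      # SB3/PPO default: two 64-unit layers

print(shallow, deep)                   # 72 vs 5256: a ~73x gap to justify
```

Under these assumptions the deep policy carries roughly 73 times the optimizable parameters of the shallow one, which is the gap the Parameter Impact metric interrogates.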

Core claim

Across varied parameter spaces and multiple reward functions, shallow MLPs and densely connected CPGs outperform deeper MLPs and Actor-Critic architectures. The additional parameters required by the reinforcement technique do not translate into better performance, thus favouring evolutionary strategies.

What carries the argument

Parameter Impact metric, which quantifies how performance scales with the number of optimized parameters across bio-inspired controller families.
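The simulated rebuttal later describes the metric as a normalized performance gain per additional parameter; a minimal sketch under that reading follows. The paper's exact formulation may differ, and all names and numbers here are illustrative.

```python
# Sketch of the Parameter Impact idea (normalized gain per extra parameter),
# as described in the simulated rebuttal; the paper's exact definition may
# differ, and all numbers below are illustrative.
def parameter_impact(perf_arch, params_arch, perf_base, params_base):
    """Normalized performance gain per additional optimized parameter."""
    return (perf_arch - perf_base) / (params_arch - params_base)

# A large policy that barely beats a tiny baseline scores near zero:
print(parameter_impact(2.0, 5256, 1.9, 72))
```

A near-zero value says the extra parameters bought essentially nothing, which is the shape of the paper's claim against the RL architectures.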

If this is right

  • Evolutionary strategies are more efficient than reinforcement learning for optimizing these low-dimensional controllers.
  • Densely connected CPGs and shallow MLPs are the preferred architectures when input-output dimensionality is small and task performance is bounded.
  • Overparametrization can reduce learning effectiveness in robot control tasks with capped rewards.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The pattern may appear in other optimization domains where the environment or task imposes a hard performance ceiling.
  • Designers could test whether increasing parameter count ever becomes beneficial once sensory input dimensionality is raised.
  • The results suggest prioritizing parameter-efficient bio-inspired designs over scaling model size in similar robotics settings.

Load-bearing premise

The specific robot morphology, limited proprioceptive sensors, chosen reward functions, and training protocols are representative cases in which extra parameters inherently limit rather than expand achievable performance.

What would settle it

If deeper MLPs or Actor-Critic controllers achieve strictly higher rewards than shallow MLPs and dense CPGs under identical robot morphology, rewards, and evaluation protocol, the central claim would be contradicted.

Figures

Figures reproduced from arXiv: 2604.20365 by Anil Yaman, Anna V. Kononova, Kevin Godin-Dubois.

Figure 1. Robot “spider” morphology with 8 hinges for locomotion.
Figure 2. Neighbourhood configurations for a CPG network.
Figure 3. ANN architectures, with w denoting the width of hidden layers.
Figure 4. Relationship between performance and number of parameters for all configurations.
Figure 5. Parameter impact across all configurations and reward functions.
Figure 6. PCA of the 16D diversity space obtained by fitting sinusoidals onto each…
Original abstract

While Central Pattern Generators (CPGs) and Multi-Layer Perceptrons (MLP) are widely used paradigms in robot control, few systematic studies have been performed on the relative merits of large parameter spaces. In contexts where input and output spaces are small and performance is bounded, having more parameters to optimize may actively hinder the learning process instead of empowering it. To empirically measure this, we submit a given robot morphology, with limited proprioceptive capabilities, to controller optimization under two bio-inspired paradigms (CPGs and MLPs) with evolutionary- and reinforcement- trainer protocols. By varying parameter spaces across multiple reward functions, we observe that shallow MLPs and densely connected CPGs result in better performance when compared to deeper MLPs or Actor-Critic architectures. To account for the relationship between said performance and the number of parameters, we introduce a Parameter Impact metric which demonstrates that the additional parameters required by the reinforcement technique do not translate into better performance, thus favouring evolutionary strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that in robot control tasks with small input/output spaces and bounded performance, increasing parameter counts (as in deep MLPs or Actor-Critic RL) hinders rather than helps learning. Experiments on a limited-proprioception robot morphology compare CPGs and MLPs under evolutionary strategies (ES) versus reinforcement learning, finding superior performance for shallow MLPs and densely connected CPGs; a new Parameter Impact metric is introduced to show that RL's extra parameters do not yield better results, favoring low-cost bio-inspired ES approaches.

Significance. If the central claim holds after proper controls and validation, the work would provide useful evidence against overparametrization trends in robotics controllers for constrained tasks, potentially guiding practitioners toward simpler bio-inspired designs like dense CPGs with ES. The Parameter Impact metric could become a reusable analysis tool if independently validated, but the current lack of experimental rigor limits immediate impact.

major comments (3)
  1. [Abstract] Abstract: the central observations on performance differences across architectures and trainers are presented without any description of experimental setup, number of trials, statistical tests, error bars, or data exclusion criteria, leaving the claims without verifiable empirical support.
  2. [Parameter Impact metric] Section introducing the Parameter Impact metric: the metric is used to explain why additional RL parameters do not improve performance, yet its definition and calculation appear derived from the same experimental outcomes, creating a circularity risk that undermines its explanatory power.
  3. [Experimental comparison] Experimental comparison (Methods/Results): the ES versus Actor-Critic evaluation does not indicate whether total environment interactions, generations, or wall-clock time were equated across trainers, nor whether RL received equivalent hyperparameter search; this leaves open the possibility that observed gaps arise from training-protocol mismatch or hyperparameter sensitivity rather than parameter count per se.
minor comments (1)
  1. [Abstract] Abstract: the specific reward functions and parameter-space variations are referenced but not enumerated, reducing clarity for readers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each of the major comments in detail below, providing clarifications and indicating the revisions made to strengthen the paper.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central observations on performance differences across architectures and trainers are presented without any description of experimental setup, number of trials, statistical tests, error bars, or data exclusion criteria, leaving the claims without verifiable empirical support.

    Authors: We agree that the abstract would benefit from additional details to support the claims. In the revised manuscript, we have expanded the abstract to briefly mention the experimental setup, including the use of 10 independent trials per condition, statistical significance assessed via Wilcoxon rank-sum tests with p < 0.05, and error bars representing one standard deviation. Data exclusion criteria were not applicable as all optimization runs completed successfully. These additions provide verifiable empirical context while respecting abstract length constraints, with full details retained in the Methods section. revision: yes

  2. Referee: [Parameter Impact metric] Section introducing the Parameter Impact metric: the metric is used to explain why additional RL parameters do not improve performance, yet its definition and calculation appear derived from the same experimental outcomes, creating a circularity risk that undermines its explanatory power.

    Authors: We appreciate this observation regarding potential circularity. The Parameter Impact metric is defined independently as the normalized performance gain per additional parameter, using the formula: Impact = (Perf_arch - Perf_baseline) / (Params_arch - Params_baseline), where baseline is the simplest architecture. Although applied to our results, the definition is general and not dependent on specific outcomes. In the revision, we have moved the formal mathematical definition to precede the results section and clarified that it can be used as a standalone tool for analyzing parameter efficiency in other studies. This addresses the concern by emphasizing the metric's a priori definition. revision: partial

  3. Referee: [Experimental comparison] Experimental comparison (Methods/Results): the ES versus Actor-Critic evaluation does not indicate whether total environment interactions, generations, or wall-clock time were equated across trainers, nor whether RL received equivalent hyperparameter search; this leaves open the possibility that observed gaps arise from training-protocol mismatch or hyperparameter sensitivity rather than parameter count per se.

    Authors: This is a valid point for ensuring fair comparison. In our experiments, we equated the total number of environment interactions (fitness evaluations) between ES and RL: specifically, RL was run for a number of episodes equivalent to the total evaluations in ES (e.g., 5000 interactions). Wall-clock time was monitored but not strictly equated due to differing computational profiles, though we note this in the revision. For RL, we conducted a hyperparameter search over learning rates [1e-4, 1e-3], discount factors [0.9, 0.99], and network sizes, selecting the configuration that maximized performance. We have added a new paragraph in the Methods section explicitly stating these equivalences and the hyperparameter tuning procedure to rule out protocol mismatches as the source of performance differences. revision: yes
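The trial protocol described in response 1 (independent runs per condition, Wilcoxon rank-sum at p < 0.05) can be sketched with a stdlib-only rank-sum statistic; in practice one would call scipy.stats.ranksums, and the fitness values below are invented.

```python
import math
from itertools import chain

# Stdlib-only sketch of the Wilcoxon rank-sum (Mann-Whitney) z-statistic,
# normal approximation, no tie-variance correction. Fitness values are
# invented; a real analysis would use scipy.stats.ranksums.
def ranksum_z(a, b):
    pooled = sorted(chain(a, b))
    # average rank for each distinct value (handles ties)
    ranks = {}
    for v in set(pooled):
        positions = [i + 1 for i, x in enumerate(pooled) if x == v]
        ranks[v] = sum(positions) / len(positions)
    n1, n2 = len(a), len(b)
    w = sum(ranks[v] for v in a)              # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2               # expected rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mu) / sigma

es_fitness = [1.9, 2.1, 2.3, 2.0, 2.2]        # hypothetical ES trials
rl_fitness = [1.1, 1.3, 1.0, 1.2, 1.4]        # hypothetical RL trials
print(round(ranksum_z(es_fitness, rl_fitness), 2))  # 2.61: a clear separation
```

A |z| above ~1.96 corresponds to p < 0.05 in the two-sided normal approximation, the threshold the rebuttal cites.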
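The budget equivalence claimed in response 3 amounts to simple accounting; a sketch with invented numbers (the rebuttal cites ~5000 total interactions as an example, and the per-episode step count here is an assumption):

```python
# Illustrative accounting for equating optimization budgets across trainers.
# All numbers are invented; the rebuttal cites ~5000 interactions as an example.
pop_size, generations = 50, 100
es_evaluations = pop_size * generations        # fitness evaluations used by ES

episode_steps = 1000                           # control steps per rollout (assumed)
rl_timesteps = es_evaluations * episode_steps  # matched PPO step budget

print(es_evaluations, rl_timesteps)
```

The point of the check is that ES generations times population size, not wall-clock time, is what gets equated to the RL interaction count.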

Circularity Check

1 step flagged

Parameter Impact metric is defined from experimental data to demonstrate conclusions from that same data

specific steps
  1. fitted input called prediction [Abstract]
    "To account for the relationship between said performance and the number of parameters, we introduce a Parameter Impact metric which demonstrates that the additional parameters required by the reinforcement technique do not translate into better performance, thus favouring evolutionary strategies."

    The metric is introduced to account for the performance-parameter relationship observed across the varied parameter spaces and reward functions in the experiments. It is then invoked to demonstrate that extra parameters do not yield better performance. This makes the demonstration equivalent to the experimental inputs by construction, as the metric has no independent grounding or predictive power outside the fitted data.

full rationale

The paper's central explanation for why additional parameters (from Actor-Critic) fail to improve performance rests on a newly introduced Parameter Impact metric. This metric is introduced specifically to account for the observed relationship between performance and parameter count in the experiments, then used to demonstrate the lack of benefit. This reduces the explanatory step to a re-description of the input observations rather than an independent derivation. The core experimental comparisons of architectures and trainers are not themselves circular, but the interpretive claim about parameter impact is.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The central claim rests on empirical comparisons in a specific robot setup and a newly introduced metric whose exact formulation depends on choices in how performance and parameter counts are related; no independent evidence or external benchmarks are referenced in the abstract.

free parameters (1)
  • Parameter Impact metric formulation
    New metric introduced to quantify relationship between performance and parameter count; its precise definition and any scaling factors are not specified and appear chosen to fit the observed results.
axioms (2)
  • domain assumption Evolutionary and reinforcement learning protocols can be compared fairly across controller architectures without bias from implementation details
    Invoked when attributing performance differences to parameter count rather than training method specifics.
  • domain assumption Limited proprioception and bounded performance spaces make additional parameters detrimental to optimization
    Central premise for why more parameters hinder rather than help learning.
invented entities (1)
  • Parameter Impact metric (no independent evidence)
    purpose: To account for and demonstrate the relationship between performance and number of parameters
    Newly created quantity used to support the conclusion favoring evolutionary strategies; no independent evidence or falsifiable prediction outside the study is provided.

pith-pipeline@v0.9.0 · 5475 in / 1680 out tokens · 62199 ms · 2026-05-10T00:18:01.901328+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

45 extracted references · 41 canonical work pages · 5 internal anchors

  1. [1] Bäck, T.H., Kononova, A.V., van Stein, B., Wang, H., Antonov, K.A., Kalkreuth, R.T., de Nobel, J., Vermetten, D., de Winter, R., Ye, F.: Evolutionary algorithms for parameter optimization—thirty years later. Evolutionary Computation 31(2), 81–122 (2023). https://doi.org/10.1162/evco_a_00325

  2. [2] Baldominos, A., Saez, Y., Isasi, P.: On the automated, evolutionary design of neural networks: past, present, and future. Neural Computing and Applications 32(2), 519–545 (Jan 2020). https://doi.org/10.1007/s00521-019-04160-6

  3. [3] Bellegarda, G., Ijspeert, A.: CPG-RL: Learning Central Pattern Generators for Quadruped Locomotion (Nov 2022). https://doi.org/10.48550/arXiv.2211.00458, arXiv:2211.00458 [cs]

  4. [4] Bellegarda, G., Shafiee, M., Ijspeert, A.: Visual CPG-RL: Learning Central Pattern Generators for Visually-Guided Quadruped Locomotion (Mar 2024). https://doi.org/10.48550/arXiv.2212.14400, arXiv:2212.14400 [cs]

  5. [5] Bhattasali, N.X., Pattabiraman, V., Pinto, L., Lindsay, G.W.: Neural Circuit Architectural Priors for Quadruped Locomotion (Oct 2024). https://doi.org/10.48550/arXiv.2410.07174, arXiv:2410.07174 [q-bio]

  6. [6] Campanaro, L., Gangapurwala, S., Martini, D.D., Merkt, W., Havoutis, I.: CPG-ACTOR: Reinforcement Learning for Central Pattern Generators (Feb 2021). https://doi.org/10.48550/arXiv.2102.12891, arXiv:2102.12891 [cs]

  7. [7] Cheney, N., Bongard, J., Sunspiral, V., Lipson, H.: On the Difficulty of Co-Optimizing Morphology and Control in Evolved Virtual Creatures. In: Proceedings of the Artificial Life Conference 2016 (ALIFE XV), pp. 226–234 (2016). https://doi.org/10.1162/978-0-262-33936-0-ch042

  8. [8] Cheney, N., MacCurdy, R., Clune, J., Lipson, H.: Unshackling Evolution: Evolving Soft Robots with Multiple Materials and a Powerful Generative Encoding. In: Proceedings of the Fifteenth Annual Conference on Genetic and Evolutionary Computation (GECCO '13), p. 167 (2013). https://doi.org/10.1145/2463372.2463404, ISBN: 9781450319638

  9. [9] van Diggelen, F., De Carlo, M., Cambier, N., Ferrante, E., Eiben, A.E.: Emergence of Specialized Collective Behaviors in Evolving Heterogeneous Swarms. In: Parallel Problem Solving from Nature – PPSN XVIII, vol. 1 (Feb 2024). https://doi.org/10.1007/978-3-031-70068-2_4, arXiv:2402.04763

  10. [10] Eiben, A.E.: EvoSphere: the World of Robot Evolution. In: Theory and Practice of Natural Computing, vol. 9477 (Lecture Notes in Computer Science), pp. 3–19. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-26841-5_1

  11. [11] Godin-Dubois, K.: Dataset on the benefits of low-cost bio-inspiration in the age of overparametrization (Apr 2026). https://doi.org/10.5281/zenodo.19633625

  12. [12] Godin-Dubois, K., Cussat-Blanc, S., Duthen, Y.: Explaining the Neuroevolution of Fighting Creatures Through Virtual fMRI. Artificial Life 29(1), 66–93 (2023). https://doi.org/10.1162/artl_a_00389

  13. [13] Godin-Dubois, K., Cussat-Blanc, S., Duthen, Y.: Specialization or Generalization: Investigating NeuroEvolutionary Choices via Virtual fMRI. MIT Press (Jul 2024). https://doi.org/10.1162/isal_a_00817

  14. [14] Godin-Dubois, K., Miras, K., Kononova, A.V.: AMaze: a benchmark generator for sighted maze-navigating agents. Journal of Open Source Software (2025). https://doi.org/10.21105/joss.07208

  15. [15] Hansen, N., Ostermeier, A.: Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation. In: Proceedings of IEEE International Conference on Evolutionary Computation, pp. 312–317 (1996). https://doi.org/10.1109/ICEC.1996.542381

  16. [16] Hansen, N., Akimoto, Y., Baudis, P.: CMA-ES/pycma on Github. Zenodo (Feb 2019). https://doi.org/10.5281/zenodo.2559634

  17. [17] Hornby, G.S., Pollack, J.B.: Evolving L-systems to generate virtual creatures. Computers and Graphics (Pergamon) 25(6), 1041–1048 (2001). https://doi.org/10.1016/S0097-8493(01)00157-1

  18. [18] Ijspeert, A.J.: Central pattern generators for locomotion control in animals and robots: A review. Neural Networks 21(4), 642–653 (May 2008). https://doi.org/10.1016/j.neunet.2008.03.014

  19. [19] Ijspeert, A.J., Hallam, J., Willshaw, D.: Evolving Swimming Controllers for a Simulated Lamprey with Inspiration from Neurobiology. Adaptive Behavior 7(2), 151–172 (Mar 1999). https://doi.org/10.1177/105971239900700202

  20. [20] Jelisavcic, M., de Carlo, M., Hupkes, E., Eustratiadis, P., Orlowski, J., Haasdijk, E., Auerbach, J.E., Eiben, A.E.: Real-World Evolution of Robot Morphologies: A Proof of Concept. Artificial Life 23(2), 206–235 (May 2017). https://doi.org/10.1162/ARTL_a_00231

  21. [21] Jelisavcic, M., Glette, K., Haasdijk, E., Eiben, A.E.: Lamarckian Evolution of Simulated Modular Robots. Frontiers in Robotics and AI 6, 9 (Feb 2019). https://doi.org/10.3389/frobt.2019.00009

  22. [22] Kononova, A.V., Corne, D.W., De Wilde, P., Shneer, V., Caraffini, F.: Structural bias in population-based algorithms. Information Sciences 298, 468–490 (Mar 2015). https://doi.org/10.1016/j.ins.2014.11.035

  23. [23] Lan, G., Van Hooft, M., De Carlo, M., Tomczak, J.M., Eiben, A.: Learning locomotion skills in evolvable robots. Neurocomputing 452, 294–306 (Sep 2021). https://doi.org/10.1016/j.neucom.2021.03.030

  24. [24] Liu, X., Onal, C., Fu, J.: Reinforcement Learning of CPG-regulated Locomotion Controller for a Soft Snake Robot. IEEE Transactions on Robotics 39(5), 3382–3401 (Oct 2023). https://doi.org/10.1109/TRO.2023.3286046, arXiv:2207.04899 [cs]

  25. [25] Luo, J., Stuurman, A.C., Tomczak, J.M., Ellers, J., Eiben, A.E.: The Effects of Learning in Morphologically Evolving Robot Systems. Frontiers in Robotics and AI 9, 797393 (May 2022). https://doi.org/10.3389/frobt.2022.797393, arXiv:2111.09851

  26. [26] Luo, J., Tomczak, J., Miras, K., Eiben, A.E.: A comparison of controller architectures and learning mechanisms for arbitrary robot morphologies (Sep 2023). https://doi.org/10.48550/arXiv.2309.13908, arXiv:2309.13908 [cs]

  27. [27] Miras, K., Haasdijk, E., Glette, K., Eiben, A.E.: Search Space Analysis of Evolvable Robot Morphologies. In: Sim, K., Kaufmann, P. (eds.) Applications of Evolutionary Computation, vol. 10784 (Lecture Notes in Computer Science), pp. 703–718. Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-319-77538-8_47

  28. [28] Mohan, D., Scaife, A.M.M.: Natural gradient descent for improving variational inference based classification of radio galaxies (Nov 2025). https://doi.org/10.48550/arXiv.2511.13224, arXiv:2511.13224 [astro-ph]

  29. [29] OpenAI: Berner, C., Brockman, G., Chan, B., Cheung, V., Dębiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., Pinto, H.P.d.O., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., ...

  30. [30] Pascanu, R., Lyle, C., Modoranu, I.V., Borras, N.E., Alistarh, D., Velickovic, P., Chandar, S., De, S., Martens, J.: Optimizers Qualitatively Alter Solutions And We Should Leverage This (Jul 2025). https://doi.org/10.48550/arXiv.2507.12224, arXiv:2507.12224 [cs]

  31. [31] Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., Dormann, N.: Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research 22(268), 1–8 (2021). http://jmlr.org/papers/v22/20-1364.html

  32. [32] Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-Dimensional Continuous Control Using Generalized Advantage Estimation (Oct 2018). https://doi.org/10.48550/arXiv.1506.02438, arXiv:1506.02438 [cs]

  33. [33] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal Policy Optimization Algorithms, pp. 1–12 (Jul 2017). http://arxiv.org/abs/1707.06347, arXiv:1707.06347

  34. [34] Stanley, K.O., Clune, J., Lehman, J., Miikkulainen, R.: Designing neural networks through neuroevolution. Nature Machine Intelligence 1(1), 24–35 (2019). https://doi.org/10.1038/s42256-018-0006-z

  35. [35] Stuurman, A., Weissl, O., Chiang, T.C., AndresG, Zeeuwe, D., Godin-Dubois, K., Roy: ci-group/revolve2: 1.2.3 (Nov 2024). https://doi.org/10.5281/ZENODO.14143431

  36. [36] Todorov, E., Erez, T., Tassa, Y.: MuJoCo: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE (Oct 2012). https://doi.org/10.1109/IROS.2012.6386109

  37. [37] Tomilin, T., Dai, T., Fang, M., Pechenizkiy, M.: LevDoom: A Benchmark for Generalization on Level Difficulty in Reinforcement Learning. In: 2022 IEEE Conference on Games (CoG), pp. 72–79. IEEE (Aug 2022). https://doi.org/10.1109/CoG51982.2022.9893707

  38. [38] Towers, M., Kwiatkowski, A., Terry, J., Balis, J.U., De Cola, G., Deleu, T., Goulão, M., Kallinteris, A., Krimmel, M., KG, A., Perez-Vicente, R., Pierré, A., Schulhoff, S., Tai, J.J., Tan, H., Younis, O.G.: Gymnasium: A Standard Interface for Reinforcement Learning Environments (Jul 2024). http://arxiv.org/abs/2407.17032, arXiv:2407.17032 [cs]

  39. [39] Tsounis, V., Alge, M., Lee, J., Farshidian, F., Hutter, M.: DeepGait: Planning and Control of Quadrupedal Gaits using Deep Reinforcement Learning (2019). https://doi.org/10.48550/ARXIV.1909.08399

  40. [40] Van Diggelen, F., Ferrante, E., Eiben, A.E.: Comparing lifetime learning methods for morphologically evolving robots. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 93–94. ACM, Lille, France (Jul 2021). https://doi.org/10.1145/3449726.3459530

  41. [41] Veenstra, F., Hart, E., Buchanan, E., Li, W., De Carlo, M., Eiben, A.E.: Comparing encodings for performance and phenotypic exploration in evolving modular robots. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 127–128. ACM, Prague, Czech Republic (Jul 2019). https://doi.org/10.1145/3319619.3322054

  42. [42] Wang, G., Chen, X., Han, S.K.: Central pattern generator and feedforward neural network-based self-adaptive gait control for a crab-like robot locomoting on complex terrain under two reflex mechanisms. International Journal of Advanced Robotic Systems 14(4), 172988141772344 (Jul 2017). https://doi.org/10.1177/1729881417723440

  43. [43] Watanabe, T., Kubo, A., Tsunoda, K., Matsuba, T., Akatsuka, S., Noda, Y., Kioka, H., Izawa, J., Ishii, S., Nakamura, Y.: Hierarchical reinforcement learning with central pattern generator for enabling a quadruped robot simulator to walk on a variety of terrains. Scientific Reports 15(1), 11262 (Apr 2025). https://doi.org/10.1038/s41598-025-94163-2

  44. [44] Wong, A., Nobel, J.d., Bäck, T., Plaat, A., Kononova, A.V.: Solving Deep Reinforcement Learning Tasks with Evolution Strategies and Linear Policy Networks (Jul 2024). https://doi.org/10.48550/arXiv.2402.06912, arXiv:2402.06912 [cs]

  45. [45] Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization (Feb 2017). https://doi.org/10.48550/arXiv.1611.03530, arXiv:1611.03530 [cs]