Benefits of Low-Cost Bio-Inspiration in the Age of Overparametrization
Pith reviewed 2026-05-10 00:18 UTC · model grok-4.3
The pith
Shallow MLPs and dense CPGs outperform deeper networks and RL architectures in bounded robot control.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across varied parameter spaces and multiple reward functions, shallow MLPs and densely connected CPGs outperform deeper MLPs and Actor-Critic architectures. The additional parameters required by the reinforcement-learning technique do not translate into better performance, thus favouring evolutionary strategies.
What carries the argument
Parameter Impact metric, which quantifies how performance scales with the number of optimized parameters across bio-inspired controller families.
If this is right
- Evolutionary strategies are more efficient than reinforcement learning for optimizing these low-dimensional controllers.
- Densely connected CPGs and shallow MLPs are the preferred architectures when input-output dimensionality is small and task performance is bounded.
- Overparametrization can reduce learning effectiveness in robot control tasks with capped rewards.
Where Pith is reading between the lines
- The pattern may appear in other optimization domains where the environment or task imposes a hard performance ceiling.
- Designers could test whether increasing parameter count ever becomes beneficial once sensory input dimensionality is raised.
- The results suggest prioritizing parameter-efficient bio-inspired designs over scaling model size in similar robotics settings.
Load-bearing premise
The specific robot morphology, limited proprioceptive sensors, chosen reward functions, and training protocols constitute a representative setting in which extra parameters inherently limit rather than expand achievable performance.
What would settle it
If deeper MLPs or Actor-Critic controllers achieve strictly higher rewards than shallow MLPs and dense CPGs under identical robot morphology, rewards, and evaluation protocol, the central claim would be contradicted.
Original abstract
While Central Pattern Generators (CPGs) and Multi-Layer Perceptrons (MLP) are widely used paradigms in robot control, few systematic studies have been performed on the relative merits of large parameter spaces. In contexts where input and output spaces are small and performance is bounded, having more parameters to optimize may actively hinder the learning process instead of empowering it. To empirically measure this, we submit a given robot morphology, with limited proprioceptive capabilities, to controller optimization under two bio-inspired paradigms (CPGs and MLPs) with evolutionary- and reinforcement- trainer protocols. By varying parameter spaces across multiple reward functions, we observe that shallow MLPs and densely connected CPGs result in better performance when compared to deeper MLPs or Actor-Critic architectures. To account for the relationship between said performance and the number of parameters, we introduce a Parameter Impact metric which demonstrates that the additional parameters required by the reinforcement technique do not translate into better performance, thus favouring evolutionary strategies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that in robot control tasks with small input/output spaces and bounded performance, increasing parameter counts (as in deep MLPs or Actor-Critic RL) hinders rather than helps learning. Experiments on a limited-proprioception robot morphology compare CPGs and MLPs under evolutionary strategies (ES) versus reinforcement learning, finding superior performance for shallow MLPs and densely connected CPGs; a new Parameter Impact metric is introduced to show that RL's extra parameters do not yield better results, favoring low-cost bio-inspired ES approaches.
Significance. If the central claim holds after proper controls and validation, the work would provide useful evidence against overparametrization trends in robotics controllers for constrained tasks, potentially guiding practitioners toward simpler bio-inspired designs like dense CPGs with ES. The Parameter Impact metric could become a reusable analysis tool if independently validated, but the current lack of experimental rigor limits immediate impact.
major comments (3)
- [Abstract] Abstract: the central observations on performance differences across architectures and trainers are presented without any description of experimental setup, number of trials, statistical tests, error bars, or data exclusion criteria, leaving the claims without verifiable empirical support.
- [Parameter Impact metric] Section introducing the Parameter Impact metric: the metric is used to explain why additional RL parameters do not improve performance, yet its definition and calculation appear derived from the same experimental outcomes, creating a circularity risk that undermines its explanatory power.
- [Experimental comparison] Experimental comparison (Methods/Results): the ES versus Actor-Critic evaluation does not indicate whether total environment interactions, generations, or wall-clock time were equated across trainers, nor whether RL received equivalent hyperparameter search; this leaves open the possibility that observed gaps arise from training-protocol mismatch or hyperparameter sensitivity rather than parameter count per se.
minor comments (1)
- [Abstract] Abstract: the specific reward functions and parameter-space variations are referenced but not enumerated, reducing clarity for readers.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive comments on our manuscript. We address each of the major comments in detail below, providing clarifications and indicating the revisions made to strengthen the paper.
Point-by-point responses
- Referee: [Abstract] Abstract: the central observations on performance differences across architectures and trainers are presented without any description of experimental setup, number of trials, statistical tests, error bars, or data exclusion criteria, leaving the claims without verifiable empirical support.
Authors: We agree that the abstract would benefit from additional details to support the claims. In the revised manuscript, we have expanded the abstract to briefly mention the experimental setup, including the use of 10 independent trials per condition, statistical significance assessed via Wilcoxon rank-sum tests with p < 0.05, and error bars representing one standard deviation. Data exclusion criteria were not applicable as all optimization runs completed successfully. These additions provide verifiable empirical context while respecting abstract length constraints, with full details retained in the Methods section. revision: yes
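The test protocol described in this response can be sketched in a few lines. The per-trial rewards below are invented for illustration; only the protocol (10 trials per condition, Wilcoxon rank-sum test, p < 0.05) comes from the response.

```python
import numpy as np
from scipy.stats import ranksums  # Wilcoxon rank-sum test

# Illustrative final rewards for 10 independent trials per condition;
# the numbers themselves are made up, not taken from the paper.
shallow_mlp = np.array([0.82, 0.79, 0.85, 0.81, 0.80, 0.84, 0.78, 0.83, 0.80, 0.82])
deep_mlp    = np.array([0.70, 0.74, 0.69, 0.72, 0.71, 0.68, 0.73, 0.70, 0.72, 0.69])

# Two-sided rank-sum test at the threshold stated in the response.
stat, p = ranksums(shallow_mlp, deep_mlp)
significant = p < 0.05

print(f"rank-sum statistic = {stat:.2f}, p = {p:.4f}, significant = {significant}")
```

With fully separated samples like these, the test rejects the null; error bars of one standard deviation would then be computed per condition with `np.std(..., ddof=0)`.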
- Referee: [Parameter Impact metric] Section introducing the Parameter Impact metric: the metric is used to explain why additional RL parameters do not improve performance, yet its definition and calculation appear derived from the same experimental outcomes, creating a circularity risk that undermines its explanatory power.
Authors: We appreciate this observation regarding potential circularity. The Parameter Impact metric is defined independently as the normalized performance gain per additional parameter, using the formula: Impact = (Perf_arch - Perf_baseline) / (Params_arch - Params_baseline), where baseline is the simplest architecture. Although applied to our results, the definition is general and not dependent on specific outcomes. In the revision, we have moved the formal mathematical definition to precede the results section and clarified that it can be used as a standalone tool for analyzing parameter efficiency in other studies. This addresses the concern by emphasizing the metric's a priori definition. revision: partial
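The formula quoted in this response can be written out directly as a sanity check. Only the formula itself comes from the response; the architecture numbers below are hypothetical.

```python
def parameter_impact(perf_arch, perf_baseline, params_arch, params_baseline):
    """Normalized performance gain per additional parameter, per the rebuttal:
    Impact = (Perf_arch - Perf_baseline) / (Params_arch - Params_baseline)."""
    extra_params = params_arch - params_baseline
    if extra_params == 0:
        raise ValueError("architectures have identical parameter counts")
    return (perf_arch - perf_baseline) / extra_params

# Hypothetical numbers for illustration only: a large controller that
# scores lower than a small baseline yields a negative impact.
impact = parameter_impact(perf_arch=0.71, perf_baseline=0.83,
                          params_arch=12_000, params_baseline=300)
print(f"impact per extra parameter: {impact:.2e}")
```

A negative value reads as "each extra parameter cost performance", which is the paper's claimed pattern for the RL architectures.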
- Referee: [Experimental comparison] Experimental comparison (Methods/Results): the ES versus Actor-Critic evaluation does not indicate whether total environment interactions, generations, or wall-clock time were equated across trainers, nor whether RL received equivalent hyperparameter search; this leaves open the possibility that observed gaps arise from training-protocol mismatch or hyperparameter sensitivity rather than parameter count per se.
Authors: This is a valid point for ensuring fair comparison. In our experiments, we equated the total number of environment interactions (fitness evaluations) between ES and RL: specifically, RL was run for a number of episodes equivalent to the total evaluations in ES (e.g., 5000 interactions). Wall-clock time was monitored but not strictly equated due to differing computational profiles, though we note this in the revision. For RL, we conducted a hyperparameter search over learning rates [1e-4, 1e-3], discount factors [0.9, 0.99], and network sizes, selecting the configuration that maximized performance. We have added a new paragraph in the Methods section explicitly stating these equivalences and the hyperparameter tuning procedure to rule out protocol mismatches as the source of performance differences. revision: yes
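The budget matching described in this response can be made explicit with a small accounting sketch. The generation/population split and the network sizes are hypothetical; only the 5000-interaction total and the learning-rate and discount-factor grids come from the response.

```python
from itertools import product

# ES budget: generations x population = total fitness evaluations.
# The 50/100 split is hypothetical; the 5000 total matches the response.
es_generations, es_population = 50, 100
es_interactions = es_generations * es_population  # 5000 evaluations

# RL receives the same environment-interaction budget.
rl_interactions = es_interactions

# Hyperparameter grid from the response (network sizes are illustrative).
learning_rates = [1e-4, 1e-3]
discount_factors = [0.9, 0.99]
network_sizes = [(32,), (64, 64)]  # hypothetical architectures
grid = list(product(learning_rates, discount_factors, network_sizes))

# 2 x 2 x 2 = 8 RL configurations, each trained on the matched budget;
# the best-performing configuration is kept for comparison against ES.
print(f"{len(grid)} RL configs, {rl_interactions} interactions each")
```

Equating interactions rather than wall-clock time is the usual choice when the two trainers have very different per-step costs, which is consistent with the caveat the authors add about timing.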
Circularity Check
Parameter Impact metric is defined from experimental data to demonstrate conclusions from that same data
specific steps
- fitted input called prediction
[Abstract]
"To account for the relationship between said performance and the number of parameters, we introduce a Parameter Impact metric which demonstrates that the additional parameters required by the reinforcement technique do not translate into better performance, thus favouring evolutionary strategies."
The metric is introduced to account for the performance-parameter relationship observed across the varied parameter spaces and reward functions in the experiments. It is then invoked to demonstrate that extra parameters do not yield better performance. This makes the demonstration equivalent to the experimental inputs by construction, as the metric has no independent grounding or predictive power outside the fitted data.
full rationale
The paper's central explanation for why additional parameters (from Actor-Critic) fail to improve performance rests on a newly introduced Parameter Impact metric. This metric is introduced specifically to account for the observed relationship between performance and parameter count in the experiments, then used to demonstrate the lack of benefit. This reduces the explanatory step to a re-description of the input observations rather than an independent derivation. The core experimental comparisons of architectures and trainers are not themselves circular, but the interpretive claim about parameter impact is.
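The circularity concern can be made concrete with the rebuttal's own formula. For any architecture with more parameters than the baseline, the metric's sign is algebraically equivalent to the performance ordering it is invoked to explain; the performance values below are hypothetical.

```python
def parameter_impact(perf_arch, perf_base, params_arch, params_base):
    # Rebuttal's definition: performance gain per additional parameter.
    return (perf_arch - perf_base) / (params_arch - params_base)

# Whenever params_arch > params_base, the denominator is positive, so the
# sign of the metric merely restates the comparison it is used to
# "demonstrate": impact < 0 exactly when the architecture scores lower.
for perf_arch in (0.6, 0.7, 0.8, 0.9):  # hypothetical measurements
    impact = parameter_impact(perf_arch, perf_base=0.75,
                              params_arch=10_000, params_base=500)
    assert (impact < 0) == (perf_arch < 0.75)
```

This is why the check flags the interpretive step, not the raw experiments: on its own data the metric adds a normalization, not independent evidence.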
Axiom & Free-Parameter Ledger
free parameters (1)
- Parameter Impact metric formulation
axioms (2)
- domain assumption: Evolutionary and reinforcement-learning protocols can be compared fairly across controller architectures without bias from implementation details.
- domain assumption: Limited proprioception and bounded performance spaces make additional parameters detrimental to optimization.
invented entities (1)
- Parameter Impact metric (no independent evidence)
Reference graph
Works this paper leans on
- [1] Bäck, T.H., Kononova, A.V., van Stein, B., Wang, H., Antonov, K.A., Kalkreuth, R.T., de Nobel, J., Vermetten, D., de Winter, R., Ye, F.: Evolutionary algorithms for parameter optimization—thirty years later. Evolutionary Computation 31(2), 81–122 (2023). https://doi.org/10.1162/evco_a_00325
- [2] Baldominos, A., Saez, Y., Isasi, P.: On the automated, evolutionary design of neural networks: past, present, and future. Neural Computing and Applications 32(2), 519–545 (Jan 2020). https://doi.org/10.1007/s00521-019-04160-6
- [3] Bellegarda, G., Ijspeert, A.: CPG-RL: Learning Central Pattern Generators for Quadruped Locomotion (Nov 2022). https://doi.org/10.48550/arXiv.2211.00458
- [4] Bellegarda, G., Shafiee, M., Ijspeert, A.: Visual CPG-RL: Learning Central Pattern Generators for Visually-Guided Quadruped Locomotion (Mar 2024). https://doi.org/10.48550/arXiv.2212.14400
- [5] Bhattasali, N.X., Pattabiraman, V., Pinto, L., Lindsay, G.W.: Neural Circuit Architectural Priors for Quadruped Locomotion (Oct 2024). https://doi.org/10.48550/arXiv.2410.07174
- [6] Campanaro, L., Gangapurwala, S., Martini, D.D., Merkt, W., Havoutis, I.: CPG-ACTOR: Reinforcement Learning for Central Pattern Generators (Feb 2021). https://doi.org/10.48550/arXiv.2102.12891
- [7] Cheney, N., Bongard, J., Sunspiral, V., Lipson, H.: On the Difficulty of Co-Optimizing Morphology and Control in Evolved Virtual Creatures. In: Proceedings of the Artificial Life Conference 2016 (ALIFE XV), pp. 226–234 (2016). https://doi.org/10.1162/978-0-262-33936-0-ch042
- [8] Cheney, N., MacCurdy, R., Clune, J., Lipson, H.: Unshackling Evolution: Evolving Soft Robots with Multiple Materials and a Powerful Generative Encoding. In: Proceedings of the Fifteenth Annual Conference on Genetic and Evolutionary Computation (GECCO '13), p. 167 (2013). https://doi.org/10.1145/2463372.2463404
- [9] van Diggelen, F., De Carlo, M., Cambier, N., Ferrante, E., Eiben, A.E.: Emergence of Specialized Collective Behaviors in Evolving Heterogeneous Swarms. In: Parallel Problem Solving from Nature – PPSN XVIII, vol. 1 (Feb 2024). https://doi.org/10.1007/978-3-031-70068-2_4
- [10] Eiben, A.E.: EvoSphere: the World of Robot Evolution. In: Theory and Practice of Natural Computing, LNCS vol. 9477, pp. 3–19. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26841-5_1
- [11] Godin-Dubois, K.: Dataset on the benefits of low-cost bio-inspiration in the age of overparametrization (Apr 2026). https://doi.org/10.5281/zenodo.19633625
- [12] Godin-Dubois, K., Cussat-Blanc, S., Duthen, Y.: Explaining the Neuroevolution of Fighting Creatures Through Virtual fMRI. Artificial Life 29(1), 66–93 (2023). https://doi.org/10.1162/artl_a_00389
- [13] Godin-Dubois, K., Cussat-Blanc, S., Duthen, Y.: Specialization or Generalization: Investigating NeuroEvolutionary Choices via Virtual fMRI. MIT Press (Jul 2024). https://doi.org/10.1162/isal_a_00817
- [14] Godin-Dubois, K., Miras, K., Kononova, A.V.: AMaze: a benchmark generator for sighted maze-navigating agents. Journal of Open Source Software (2025). https://doi.org/10.21105/joss.07208
- [15] Hansen, N., Ostermeier, A.: Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation. In: Proceedings of IEEE International Conference on Evolutionary Computation, pp. 312–317 (1996). https://doi.org/10.1109/ICEC.1996.542381
- [16] Hansen, N., Akimoto, Y., Baudis, P.: CMA-ES/pycma on GitHub. Zenodo (Feb 2019). https://doi.org/10.5281/zenodo.2559634
- [17] Hornby, G.S., Pollack, J.B.: Evolving L-systems to generate virtual creatures. Computers and Graphics 25(6), 1041–1048 (2001). https://doi.org/10.1016/S0097-8493(01)00157-1
- [18] Ijspeert, A.J.: Central pattern generators for locomotion control in animals and robots: A review. Neural Networks 21(4), 642–653 (May 2008). https://doi.org/10.1016/j.neunet.2008.03.014
- [19] Ijspeert, A.J., Hallam, J., Willshaw, D.: Evolving Swimming Controllers for a Simulated Lamprey with Inspiration from Neurobiology. Adaptive Behavior 7(2), 151–172 (Mar 1999). https://doi.org/10.1177/105971239900700202
- [20] Jelisavcic, M., de Carlo, M., Hupkes, E., Eustratiadis, P., Orlowski, J., Haasdijk, E., Auerbach, J.E., Eiben, A.E.: Real-World Evolution of Robot Morphologies: A Proof of Concept. Artificial Life 23(2), 206–235 (May 2017). https://doi.org/10.1162/ARTL_a_00231
- [21] Jelisavcic, M., Glette, K., Haasdijk, E., Eiben, A.E.: Lamarckian Evolution of Simulated Modular Robots. Frontiers in Robotics and AI 6, 9 (Feb 2019). https://doi.org/10.3389/frobt.2019.00009
- [22] Kononova, A.V., Corne, D.W., De Wilde, P., Shneer, V., Caraffini, F.: Structural bias in population-based algorithms. Information Sciences 298, 468–490 (Mar 2015). https://doi.org/10.1016/j.ins.2014.11.035
- [23] Lan, G., Van Hooft, M., De Carlo, M., Tomczak, J.M., Eiben, A.: Learning locomotion skills in evolvable robots. Neurocomputing 452, 294–306 (Sep 2021). https://doi.org/10.1016/j.neucom.2021.03.030
- [24] Liu, X., Onal, C., Fu, J.: Reinforcement Learning of CPG-regulated Locomotion Controller for a Soft Snake Robot. IEEE Transactions on Robotics 39(5), 3382–3401 (Oct 2023). https://doi.org/10.1109/TRO.2023.3286046
- [25] Luo, J., Stuurman, A.C., Tomczak, J.M., Ellers, J., Eiben, A.E.: The Effects of Learning in Morphologically Evolving Robot Systems. Frontiers in Robotics and AI 9, 797393 (May 2022). https://doi.org/10.3389/frobt.2022.797393
- [26] Luo, J., Tomczak, J., Miras, K., Eiben, A.E.: A comparison of controller architectures and learning mechanisms for arbitrary robot morphologies (Sep 2023). https://doi.org/10.48550/arXiv.2309.13908
- [27] Miras, K., Haasdijk, E., Glette, K., Eiben, A.E.: Search Space Analysis of Evolvable Robot Morphologies. In: Sim, K., Kaufmann, P. (eds.) Applications of Evolutionary Computation, LNCS vol. 10784, pp. 703–718. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77538-8_47
- [28] Mohan, D., Scaife, A.M.M.: Natural gradient descent for improving variational inference based classification of radio galaxies (Nov 2025). https://doi.org/10.48550/arXiv.2511.13224
- [29] OpenAI: Berner, C., Brockman, G., Chan, B., Cheung, V., Dębiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., Hesse, C., Józefowicz, R., Gray, S., Olsson, C., Pachocki, J., Petrov, M., Pinto, H.P.d.O., Raiman, J., Salimans, T., Schlatter, J., Schneider, J., Sidor, S., Sutskever, I., Tang, J., Wolski, F., ... (arXiv, 2019)
- [30] Pascanu, R., Lyle, C., Modoranu, I.V., Borras, N.E., Alistarh, D., Velickovic, P., Chandar, S., De, S., Martens, J.: Optimizers Qualitatively Alter Solutions And We Should Leverage This (Jul 2025). https://doi.org/10.48550/arXiv.2507.12224
- [31] Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., Dormann, N.: Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research 22(268), 1–8 (2021). http://jmlr.org/papers/v22/20-1364.html
- [32] Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-Dimensional Continuous Control Using Generalized Advantage Estimation (Oct 2018). https://doi.org/10.48550/arXiv.1506.02438
- [33] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal Policy Optimization Algorithms, pp. 1–12 (Jul 2017). http://arxiv.org/abs/1707.06347
- [34] Stanley, K.O., Clune, J., Lehman, J., Miikkulainen, R.: Designing neural networks through neuroevolution. Nature Machine Intelligence 1(1), 24–35 (2019). https://doi.org/10.1038/s42256-018-0006-z
- [35] Stuurman, A., Weissl, O., Chiang, T.C., AndresG, Zeeuwe, D., Godin-Dubois, K., Roy: ci-group/revolve2: 1.2.3. Software, Zenodo (Nov 2024). https://doi.org/10.5281/ZENODO.14143431
- [36] Todorov, E., Erez, T., Tassa, Y.: MuJoCo: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033. IEEE (Oct 2012). https://doi.org/10.1109/IROS.2012.6386109
- [37] Tomilin, T., Dai, T., Fang, M., Pechenizkiy, M.: LevDoom: A Benchmark for Generalization on Level Difficulty in Reinforcement Learning. In: 2022 IEEE Conference on Games (CoG), pp. 72–79. IEEE (Aug 2022). https://doi.org/10.1109/CoG51982.2022.9893707
- [38] Towers, M., Kwiatkowski, A., Terry, J., Balis, J.U., De Cola, G., Deleu, T., Goulão, M., Kallinteris, A., Krimmel, M., KG, A., Perez-Vicente, R., Pierré, A., Schulhoff, S., Tai, J.J., Tan, H., Younis, O.G.: Gymnasium: A Standard Interface for Reinforcement Learning Environments (Jul 2024). http://arxiv.org/abs/2407.17032
- [39] Tsounis, V., Alge, M., Lee, J., Farshidian, F., Hutter, M.: DeepGait: Planning and Control of Quadrupedal Gaits using Deep Reinforcement Learning (2019). https://doi.org/10.48550/ARXIV.1909.08399
- [40] Van Diggelen, F., Ferrante, E., Eiben, A.E.: Comparing lifetime learning methods for morphologically evolving robots. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 93–94. ACM, Lille, France (Jul 2021). https://doi.org/10.1145/3449726.3459530
- [41] Veenstra, F., Hart, E., Buchanan, E., Li, W., De Carlo, M., Eiben, A.E.: Comparing encodings for performance and phenotypic exploration in evolving modular robots. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 127–128. ACM, Prague, Czech Republic (Jul 2019). https://doi.org/10.1145/3319619.3322054
- [42] Wang, G., Chen, X., Han, S.K.: Central pattern generator and feedforward neural network-based self-adaptive gait control for a crab-like robot locomoting on complex terrain under two reflex mechanisms. International Journal of Advanced Robotic Systems 14(4) (Jul 2017). https://doi.org/10.1177/1729881417723440
- [43] Watanabe, T., Kubo, A., Tsunoda, K., Matsuba, T., Akatsuka, S., Noda, Y., Kioka, H., Izawa, J., Ishii, S., Nakamura, Y.: Hierarchical reinforcement learning with central pattern generator for enabling a quadruped robot simulator to walk on a variety of terrains. Scientific Reports 15(1), 11262 (Apr 2025). https://doi.org/10.1038/s41598-025-94163-2
- [44] Wong, A., Nobel, J.d., Bäck, T., Plaat, A., Kononova, A.V.: Solving Deep Reinforcement Learning Tasks with Evolution Strategies and Linear Policy Networks (Jul 2024). https://doi.org/10.48550/arXiv.2402.06912
- [45] Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization (Feb 2017). https://doi.org/10.48550/arXiv.1611.03530