Evolvability ES: Scalable and Direct Optimization of Evolvability
Pith reviewed 2026-05-24 21:54 UTC · model grok-4.3
The pith
Evolvability ES derives an objective that directly maximizes behavioral diversity under random mutations to optimize for evolvability.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Evolvability ES derives a novel objective in the spirit of natural evolution strategies that maximizes the diversity of behaviors exhibited when an individual is subject to random mutations, and that efficiently scales with computation. Experiments in 2-D and 3-D locomotion tasks highlight the potential of evolvability ES to generate solutions with tens of thousands of parameters that can quickly be adapted to solve different tasks and that can productively seed further evolution. Results also show that evolvability ES can perform competitively with MAML while discovering solutions with distinct properties.
What carries the argument
The evolvability objective, derived from natural evolution strategies, that rewards diversity of behaviors produced by random mutations of an individual.
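A minimal sketch of how such an objective could be estimated, assuming isotropic Gaussian mutations with standard deviation `sigma` and a score-function (NES-style) gradient estimator. The function names, the batch-style `behavior_fn` interface, and the toy behavior are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def maxvar_gradient(theta, behavior_fn, sigma, n_samples, rng):
    """Monte Carlo estimate of the gradient of behavioral variance under mutation.

    Assumed setup (hypothetical): mutations are theta + sigma * noise with
    standard normal noise, and behavior_fn maps a batch of parameter vectors
    of shape (n, d) to behavior vectors of shape (n, k).
    """
    noise = rng.standard_normal((n_samples, theta.size))
    candidates = theta + sigma * noise            # mutated offspring
    behaviors = behavior_fn(candidates)           # shape (n, k)
    centered = behaviors - behaviors.mean(axis=0)
    sq_dev = (centered ** 2).sum(axis=1)          # squared deviation per offspring
    # Score-function estimator: for a Gaussian, grad log-density = noise / sigma.
    grad = (sq_dev[:, None] * noise).mean(axis=0) / sigma
    diversity = sq_dev.mean()                     # estimated total behavioral variance
    return grad, diversity


def toy_behavior(candidates):
    """Toy 1-D behavior (an assumption for illustration): square of the first parameter."""
    return candidates[:, :1] ** 2
```

For the toy behavior the variance is 4·θ₀²·σ² + 2·σ⁴, so the exact gradient with respect to θ₀ is 8·θ₀·σ². The estimate is noisy and needs many samples to approach that value; a practical implementation would likely add variance reduction such as antithetic sampling or fitness normalization.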
If this is right
- Solutions with tens of thousands of parameters adapt quickly to different locomotion tasks.
- Optimized individuals can productively initialize further evolutionary runs.
- The method scales computationally to deep networks.
- Performance is competitive with gradient-based meta-learning (MAML), while the solutions found have distinct properties.
Where Pith is reading between the lines
- The same mutation-diversity objective could be tested on control problems outside locomotion to check whether the adaptation benefit generalizes.
- Hybrid algorithms that combine the evolutionary objective with gradient updates might improve sample efficiency in meta-learning settings.
- Representations found this way may reduce the need for task-specific retraining when environments change.
Load-bearing premise
Maximizing behavioral diversity under random mutations serves as a valid and sufficient proxy for evolvability that enables fast adaptation rather than merely varied but non-adaptive behaviors.
What would settle it
An experiment in which solutions produced by evolvability ES adapt no faster than those from a standard evolution strategy when transferred to new locomotion tasks.
Original abstract
Designing evolutionary algorithms capable of uncovering highly evolvable representations is an open challenge; such evolvability is important because it accelerates evolution and enables fast adaptation to changing circumstances. This paper introduces evolvability ES, an evolutionary algorithm designed to explicitly and efficiently optimize for evolvability, i.e. the ability to further adapt. The insight is that it is possible to derive a novel objective in the spirit of natural evolution strategies that maximizes the diversity of behaviors exhibited when an individual is subject to random mutations, and that efficiently scales with computation. Experiments in 2-D and 3-D locomotion tasks highlight the potential of evolvability ES to generate solutions with tens of thousands of parameters that can quickly be adapted to solve different tasks and that can productively seed further evolution. We further highlight a connection between evolvability and a recent and popular gradient-based meta-learning algorithm called MAML; results show that evolvability ES can perform competitively with MAML and that it discovers solutions with distinct properties. The conclusion is that evolvability ES opens up novel research directions for studying and exploiting the potential of evolvable representations for deep neural networks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Evolvability ES, an evolutionary algorithm that derives a novel objective in the style of natural evolution strategies to directly maximize the behavioral diversity (variance) exhibited by an individual under random mutations. Experiments on 2-D and 3-D locomotion tasks with neural networks of tens of thousands of parameters show that the resulting solutions adapt rapidly to new tasks, productively seed further evolution, and perform competitively with MAML while exhibiting distinct properties.
Significance. If the central claim holds, the work supplies a computationally scalable, direct method for optimizing evolvability in high-dimensional neural network parameter spaces—an open challenge in evolutionary computation. The explicit link to MAML and the empirical results on complex locomotion domains are strengths that could influence representation design for adaptability in RL and evolutionary algorithms.
major comments (2)
- [Experiments (locomotion tasks)] Locomotion experiments (results section): the reported post-mutation adaptation performance is shown, yet the manuscript contains no ablation that removes or disables the behavioral-diversity term while retaining the base ES dynamics and initialization; without this control it remains unclear whether the observed adaptation gains are attributable to the evolvability objective rather than other algorithmic factors.
- [Method (objective derivation)] Objective derivation (method section): the claim that maximizing behavioral variance under mutations yields representations whose variation is useful for task adaptation (rather than unstructured or orthogonal noise) is load-bearing for the central thesis, but the paper provides no formal analysis, gradient-alignment test, or control experiment demonstrating that the induced diversity lies along task-relevant directions.
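One way the gradient-alignment test requested here could be set up, as a sketch only: estimate the diversity gradient and an ordinary ES task gradient from the same mutation samples, then compare their directions by cosine similarity. All names, the batch-style callables, and the choice of cosine similarity are assumptions for illustration; the manuscript does not specify such a test.

```python
import numpy as np


def cosine_alignment(g_a, g_b, eps=1e-12):
    """Cosine similarity between two gradient estimates (0.0 if either is zero)."""
    return float(g_a @ g_b / (np.linalg.norm(g_a) * np.linalg.norm(g_b) + eps))


def paired_es_gradients(theta, behavior_fn, fitness_fn, sigma, n_samples, rng):
    """Estimate diversity and task gradients from one shared batch of mutations.

    Hypothetical interface: behavior_fn maps (n, d) parameters to (n, k)
    behaviors, and fitness_fn maps (n, d) parameters to (n,) task scores.
    """
    noise = rng.standard_normal((n_samples, theta.size))
    candidates = theta + sigma * noise
    behaviors = behavior_fn(candidates)
    fitness = fitness_fn(candidates)
    # Diversity signal: squared deviation of each offspring's behavior from the mean.
    sq_dev = ((behaviors - behaviors.mean(axis=0)) ** 2).sum(axis=1)
    g_div = (sq_dev[:, None] * noise).mean(axis=0) / sigma
    # Task signal: mean-centered fitness, as in a plain ES update.
    g_task = ((fitness - fitness.mean())[:, None] * noise).mean(axis=0) / sigma
    return g_div, g_task
```

A cosine near 1 would suggest that the induced diversity points along task-improving directions, while a value near 0 would suggest variation that is orthogonal to the task. The task-only gradient is also the update a diversity-free ES baseline would follow, so the same helper could serve the ablation raised in the first major comment.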
minor comments (2)
- [Experiments] The precise network architectures, mutation variances, and population sizes used in the 3-D experiments could be stated explicitly in the main text rather than deferred entirely to supplementary material.
- [Method] Notation for the evolvability objective (e.g., the precise definition of behavioral variance) is introduced clearly but could be cross-referenced more explicitly when results are discussed.
Simulated Author's Rebuttal
We appreciate the referee's detailed review and the opportunity to clarify and strengthen our work on Evolvability ES. Below we respond to each major comment, agreeing where revisions are needed to address valid concerns about controls and analysis.
- Referee: [Experiments (locomotion tasks)] Locomotion experiments (results section): the reported post-mutation adaptation performance is shown, yet the manuscript contains no ablation that removes or disables the behavioral-diversity term while retaining the base ES dynamics and initialization; without this control it remains unclear whether the observed adaptation gains are attributable to the evolvability objective rather than other algorithmic factors.
  Authors: We agree that the absence of this ablation leaves open the possibility that other factors contribute to the observed adaptation. To address this, the revised manuscript will include an ablation study comparing Evolvability ES to a baseline that retains the ES dynamics and initialization but removes the behavioral diversity objective. This will isolate the contribution of the evolvability term. Revision: yes.
- Referee: [Method (objective derivation)] Objective derivation (method section): the claim that maximizing behavioral variance under mutations yields representations whose variation is useful for task adaptation (rather than unstructured or orthogonal noise) is load-bearing for the central thesis, but the paper provides no formal analysis, gradient-alignment test, or control experiment demonstrating that the induced diversity lies along task-relevant directions.
  Authors: While the empirical results on task adaptation and the comparison to MAML provide evidence that the induced variations are useful, we acknowledge the lack of formal analysis or specific controls for task-relevance of the diversity. In the revision, we will incorporate a gradient alignment test or additional control experiment to demonstrate that the behavioral diversity aligns with directions that improve task performance. Revision: yes.
Circularity Check
Derivation of evolvability objective is self-contained with no reductions to inputs
The paper presents the evolvability ES objective as a novel derivation in the style of natural evolution strategies, explicitly maximizing behavioral diversity under random mutations. This is introduced as an independent insight without equations or definitions that reduce to fitted parameters or prior self-citations by construction. Comparisons to MAML are external benchmarks rather than load-bearing foundations. Experiments on locomotion tasks provide independent validation. No self-definitional, fitted-prediction, or uniqueness-imported patterns appear in the provided abstract or description.