Quantitative Performance Analysis of Stopping Criteria for CMA-ES
Pith reviewed 2026-06-27 14:27 UTC · model grok-4.3
The pith
tolfunhist and the full portfolio of stopping criteria deliver the highest stopping accuracy for CMA-ES in most cases on the noiseless BBOB functions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that tolflatfitness and tolfun are frequently the first criteria triggered among the eleven, that tolfunhist and the full portfolio achieve the highest stopping accuracy in most cases, and that tolfun and tolfunhist are often triggered before CMA-ES reaches complete stagnation.
What carries the argument
The portfolio of eleven stopping criteria inside CMA-ES, scored against an optimal stopping point defined by number of function evaluations on BBOB.
If this is right
- tolfunhist alone matches the optimal stopping point more closely than most single criteria.
- The combined portfolio improves accuracy over many individual criteria.
- tolfun and tolfunhist frequently halt the search before complete stagnation occurs.
- Which criterion fires first changes with population size lambda and dimension n.
- Stopping accuracy is evaluated on the noiseless BBOB set using function-evaluation counts.
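The dependence of the first-triggered criterion on λ and n can be illustrated with a minimal sketch. It assumes simplified versions of tolfun and tolfunhist that loosely follow the descriptions in Hansen's CMA-ES tutorial, with a history window of 10 + ⌈30n/λ⌉ generations. The function names, the per-generation best/worst inputs, and the thresholds 1e-11 and 1e-12 are illustrative assumptions, not the paper's experimental code and not pycma's exact logic.

```python
import math


def history_window(n, lam):
    # Assumed window length: 10 + ceil(30 n / lambda) generations.
    return 10 + math.ceil(30 * n / lam)


def first_trigger(best_per_gen, worst_per_gen, n, lam,
                  tolfun=1e-11, tolfunhist=1e-12):
    """Return (generation, [criteria]) for the first generation in which
    a simplified tolfun or tolfunhist check fires, or None if none fires.

    best_per_gen / worst_per_gen: best and worst objective value of each
    generation (hypothetical inputs recorded from one run).
    """
    w = history_window(n, lam)
    for g in range(w - 1, len(best_per_gen)):
        recent = best_per_gen[g + 1 - w: g + 1]
        # Range of the best values over the history window.
        hist_range = max(recent) - min(recent)
        # Same range, widened by the worst value of the current generation.
        gen_range = max(max(recent), worst_per_gen[g]) - min(recent)
        fired = []
        if gen_range < tolfun:
            fired.append("tolfun")
        if hist_range < tolfunhist:
            fired.append("tolfunhist")
        if fired:
            return g, fired
    return None


if __name__ == "__main__":
    # Synthetic geometric convergence, not BBOB data.
    best = [0.5 ** g for g in range(80)]
    worst = [2 * b for b in best]
    print(first_trigger(best, worst, n=2, lam=6))
    print(first_trigger(best, worst, n=2, lam=60))
```

In this sketch a larger λ shortens the history window, so the same trajectory triggers in an earlier generation; real runs additionally differ in how fast they converge per generation.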
Where Pith is reading between the lines
- Restart strategies that rely on these criteria may save evaluations more reliably when tolfunhist is emphasized.
- The same criteria could behave differently on noisy or constrained problems not covered by the current BBOB tests.
- Alternative reference points, such as target precision or gradient norms, might change which criterion appears best.
Load-bearing premise
An optimal stopping point defined by the number of function evaluations in one CMA-ES run provides a stable reference for judging criterion quality across the BBOB suite.
What would settle it
Measure the actual objective value and convergence state at the moment each criterion fires, then compare those states directly to the state at the optimal evaluation count on the same BBOB runs.
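The comparison described above can be sketched as follows. The sketch assumes a hypothetical definition of the optimal stopping point (the first evaluation count at which the best-so-far value is within `eps` of the run's final best-so-far value); the paper's exact definition may differ. All function names, the `eps` parameter, and the input format are assumptions made for illustration.

```python
from bisect import bisect_right


def optimal_stop(evals, best_so_far, eps=1e-8):
    """First evaluation count at which best-so-far is within eps of the
    final best-so-far value of the same run (assumed definition)."""
    final = best_so_far[-1]
    for fe, f in zip(evals, best_so_far):
        if f - final <= eps:
            return fe
    return evals[-1]


def value_at(evals, best_so_far, fe):
    """Best-so-far objective value at evaluation count fe."""
    i = bisect_right(evals, fe) - 1
    if i < 0:
        raise ValueError("fe precedes the first recorded evaluation count")
    return best_so_far[i]


def stopping_report(evals, best_so_far, triggers, eps=1e-8):
    """For each criterion, report the signed gap to the optimal point
    (negative: stopped early, positive: wasted evaluations) and the
    objective value held when the criterion fired."""
    opt = optimal_stop(evals, best_so_far, eps)
    return {
        name: {"gap": fe - opt, "value": value_at(evals, best_so_far, fe)}
        for name, fe in triggers.items()
    }


if __name__ == "__main__":
    # Synthetic run log, not BBOB data.
    evals = [10, 20, 30, 40, 50]
    best_so_far = [5.0, 1.0, 0.1, 0.1, 0.1]
    triggers = {"tolfun": 20, "tolfunhist": 30, "maxiter": 50}
    print(stopping_report(evals, best_so_far, triggers))
```

Reporting the objective value next to the gap makes it visible when a criterion with a small gap in evaluations still stops at a noticeably worse solution.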
Original abstract
Covariance matrix adaptation evolution strategy (CMA-ES) is a state-of-the-art black-box optimization algorithm. In general, CMA-ES uses a portfolio of multiple stopping criteria to automatically determine when to stop the search. This mechanism aims to avoid unnecessary consumption of the function evaluation budget during stagnation. Stopping criteria play an important role in CMA-ES, particularly when restart strategies are employed. However, the effectiveness of stopping criteria in CMA-ES remains poorly understood. To address this issue, this paper investigates how the 11 stopping criteria in CMA-ES behave on the noiseless BBOB function set. The performance of the stopping criteria is quantitatively evaluated based on the optimal stopping point in terms of the number of function evaluations in a single run of CMA-ES. Our results show that, although which stopping criterion is triggered first depends significantly on the sample size λ and the dimension n, tolflatfitness and tolfun are frequently the first criteria to be triggered among the portfolio of 11 stopping criteria. We also demonstrate that tolfunhist and the portfolio achieve the highest stopping accuracy in most cases. In addition, our results show that the tolfun and tolfunhist criteria are frequently triggered before CMA-ES reaches complete stagnation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript empirically analyzes the behavior of 11 stopping criteria in the CMA-ES algorithm on the noiseless BBOB test suite. It reports that which criterion is triggered first depends on the population size λ and dimension n, with tolflatfitness and tolfun often triggered first. The study finds that tolfunhist and the full portfolio of criteria achieve the highest stopping accuracy relative to an optimal stopping point defined by function evaluations in a single CMA-ES run, and that tolfun and tolfunhist are frequently triggered before complete stagnation.
Significance. Should the evaluation methodology prove robust to stochastic variation, this work provides quantitative empirical guidance on the relative effectiveness of individual stopping criteria versus portfolios in CMA-ES. Such data is useful for designing restart strategies that balance convergence detection against unnecessary function evaluations on standard black-box benchmarks.
Major comments (2)
- [Abstract] The accuracy metric judges each stopping criterion by proximity of its trigger time to an 'optimal stopping point' defined via the number of function evaluations in a single CMA-ES run. CMA-ES is stochastic, so the evaluation count at which a given target quality is first reached varies across independent runs on the same BBOB instance. The manuscript supplies no evidence that this reference was averaged over restarts, accompanied by variance estimates, or replaced by a run-independent proxy (e.g., fixed target precision). This assumption is load-bearing for the claim that tolfunhist and the portfolio achieve the highest stopping accuracy in most cases.
- [Abstract] The statement that tolfunhist and the portfolio achieve highest accuracy 'in most cases' is not accompanied by a precise quantification (fraction of functions, dimensions, or λ values) or by any indication that accuracy differences were assessed with statistical tests across the BBOB suite.
Simulated Author's Rebuttal
We thank the referee for the constructive comments regarding the evaluation of stopping criteria in CMA-ES. We address each major comment below, indicating planned revisions where appropriate.
- Referee: [Abstract] The accuracy metric judges each stopping criterion by proximity of its trigger time to an 'optimal stopping point' defined via the number of function evaluations in a single CMA-ES run. CMA-ES is stochastic, so the evaluation count at which a given target quality is first reached varies across independent runs on the same BBOB instance. The manuscript supplies no evidence that this reference was averaged over restarts, accompanied by variance estimates, or replaced by a run-independent proxy (e.g., fixed target precision). This assumption is load-bearing for the claim that tolfunhist and the portfolio achieve the highest stopping accuracy in most cases.
Authors: The optimal stopping point is explicitly defined per individual CMA-ES run as the evaluation count at which the target precision is first reached within that same run. This intra-run construction ensures direct comparability between each stopping criterion's trigger and the reference point under identical stochastic conditions, without cross-run averaging. We will revise the manuscript to state this definition more explicitly in the abstract and methods, and we will add a brief discussion of run-to-run variability in the optimal point for a subset of instances to address robustness concerns.
Revision: partial
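A run-to-run variability summary of the kind promised here could look like the following minimal sketch. It assumes the optimal stopping points of several independent runs on one instance are already available as a list of evaluation counts; the function name and the choice of the coefficient of variation as a summary statistic are illustrative assumptions.

```python
from statistics import mean, stdev


def variability(optimal_points):
    """Summarize the spread of per-run optimal stopping points
    (evaluation counts) across independent runs on one instance."""
    m = mean(optimal_points)
    s = stdev(optimal_points)  # sample standard deviation, needs >= 2 runs
    return {"mean": m, "stdev": s, "cv": s / m}


if __name__ == "__main__":
    # Hypothetical evaluation counts from three runs, not measured data.
    print(variability([100, 120, 80]))
```

A large coefficient of variation would indicate that the per-run reference point moves substantially between runs, which is the robustness concern raised by the referee.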
- Referee: [Abstract] The statement that tolfunhist and the portfolio achieve highest accuracy 'in most cases' is not accompanied by a precise quantification (fraction of functions, dimensions, or λ values) or by any indication that accuracy differences were assessed with statistical tests across the BBOB suite.
Authors: We agree that the abstract would be strengthened by quantification. The underlying results already tabulate accuracy for every BBOB function, dimension, and λ setting; from these data, tolfunhist and the portfolio are highest in the majority of configurations. In revision we will replace 'in most cases' with a specific fraction (derived from the existing tables) and note the consistency of the ranking across the test suite. Formal statistical tests were not applied in the original study but can be added as a supplementary note if space allows.
Revision: yes
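Both the requested fraction and a simple statistical check can be sketched from a per-configuration table of stopping errors. The sketch below assumes each configuration (function, dimension, λ) is a dictionary mapping a criterion name to its absolute stopping error, and it uses an exact two-sided sign test as one possible choice of test; the data layout, the function names, and the choice of test are assumptions, not the authors' method.

```python
from math import comb


def win_fraction(errors_by_config, name):
    """Fraction of configurations in which `name` attains the smallest
    absolute stopping error (ties count as wins)."""
    wins = sum(1 for errs in errors_by_config
               if errs[name] == min(errs.values()))
    return wins / len(errors_by_config)


def sign_test_p(errors_a, errors_b):
    """Exact two-sided sign test on paired errors; ties are dropped."""
    wins = sum(1 for a, b in zip(errors_a, errors_b) if a < b)
    losses = sum(1 for a, b in zip(errors_a, errors_b) if a > b)
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical error table, not results from the paper.
    configs = [
        {"tolfun": 5, "tolfunhist": 1, "portfolio": 2},
        {"tolfun": 3, "tolfunhist": 3, "portfolio": 4},
        {"tolfun": 1, "tolfunhist": 2, "portfolio": 2},
    ]
    print(win_fraction(configs, "tolfunhist"))
    print(sign_test_p([c["tolfunhist"] for c in configs],
                      [c["tolfun"] for c in configs]))
```

The sign test only uses which criterion is closer in each configuration, so it makes no assumption about the distribution of error magnitudes; a rank-based test would be an alternative if magnitudes should count.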
Circularity Check
No circularity; direct empirical counts on external BBOB benchmarks
The paper conducts a quantitative empirical study that counts trigger frequencies and accuracies of 11 stopping criteria across BBOB functions, using a single-run reference point for 'optimal stopping'. No derivations, equations, fitted parameters, or self-citations appear in the provided text. The central claims rest on direct observation against fixed external benchmarks rather than any reduction to inputs by construction. This matches the default case of a self-contained empirical analysis.