
arxiv: 2606.09220 · v1 · pith:KAXACRLEnew · submitted 2026-06-08 · 💻 cs.NE

Quantitative Performance Analysis of Stopping Criteria for CMA-ES

Pith reviewed 2026-06-27 14:27 UTC · model grok-4.3

classification 💻 cs.NE
keywords CMA-ES · stopping criteria · BBOB · black-box optimization · evolution strategy · tolfunhist · performance analysis · restarts

The pith

tolfunhist and the full portfolio of stopping criteria stop CMA-ES closest to the optimal stopping point in most cases on the noiseless BBOB functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests the eleven stopping criteria built into CMA-ES to decide when a run should end. It scores each criterion by how closely its trigger point matches an optimal stopping point measured in function evaluations on the noiseless BBOB suite. Results show that the first criterion to fire varies with population size and dimension, yet tolfunhist and the combined portfolio match the optimal point most reliably. tolfun and tolfunhist also tend to activate before the search has fully stagnated. Readers care because better stopping rules reduce wasted evaluations when CMA-ES is used inside restart schemes.

Core claim

The paper claims that tolflatfitness and tolfun are frequently the first criteria triggered among the eleven, that tolfunhist and the full portfolio achieve the highest stopping accuracy in most cases, and that tolfun and tolfunhist are often activated before CMA-ES reaches complete stagnation.

What carries the argument

The portfolio of eleven stopping criteria inside CMA-ES, scored against an optimal stopping point defined by number of function evaluations on BBOB.
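
Going by the text accompanying Figure 1, FE∗ is the number of function evaluations at which the best-so-far objective value was last updated in a run, and FE_stop is the evaluation count at which a criterion fires. A minimal Python sketch of that bookkeeping follows. The function names and the signed relative gap are illustrative assumptions; in particular, the gap is not the paper's POSE measure, whose definition is not reproduced on this page.

```python
from typing import Sequence


def optimal_stopping_point(f_values: Sequence[float]) -> int:
    """Return FE*: the 1-based evaluation count of the last best-so-far update.

    f_values[i] is the objective value of evaluation i + 1 (minimization).
    """
    if not f_values:
        raise ValueError("need at least one evaluation")
    best = f_values[0]
    fe_star = 1
    for count, value in enumerate(f_values[1:], start=2):
        if value < best:  # only a strict improvement updates the best-so-far
            best = value
            fe_star = count
    return fe_star


def relative_gap(fe_stop: int, fe_star: int) -> float:
    """Hypothetical score: signed distance of FE_stop from FE*, relative to FE*.

    Negative: the criterion fired before the last improvement.
    Positive: evaluations were spent after the last improvement.
    """
    return (fe_stop - fe_star) / fe_star


# Toy run in which the last improvement happens at evaluation 5.
trace = [9.0, 7.5, 7.5, 8.0, 3.0, 3.0, 4.0, 3.0]
fe_star = optimal_stopping_point(trace)
gap_if_run_to_end = relative_gap(len(trace), fe_star)
```

A criterion that fires exactly at FE∗ gets a gap of zero under this toy score; the sign separates premature stops from wasted evaluations.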

If this is right

  • tolfunhist alone matches the optimal stopping point more closely than most single criteria.
  • The combined portfolio improves accuracy over many individual criteria.
  • tolfun and tolfunhist frequently halt the search before complete stagnation occurs.
  • Which criterion fires first changes with population size lambda and dimension n.
  • Stopping accuracy is evaluated on the noiseless BBOB set using function-evaluation counts.
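
The "first to fire" counts and the portfolio can be read as simple operations on per-criterion trigger times. The sketch below assumes each run is summarized as a mapping from criterion name to the evaluation count at which it fired (None if it never fired), and that the portfolio stops a run as soon as any member fires. The data layout and the toy numbers are hypothetical.

```python
from collections import Counter
from typing import Mapping, Optional, Sequence, Tuple

Triggers = Mapping[str, Optional[int]]


def first_triggered(triggers: Triggers) -> Optional[Tuple[str, int]]:
    """Return (criterion, FE_stop) for the earliest trigger, or None if none fired.

    This is also where a portfolio of all listed criteria would stop the run.
    """
    fired = [(fe, name) for name, fe in triggers.items() if fe is not None]
    if not fired:
        return None
    fe, name = min(fired)  # ties are broken alphabetically by criterion name
    return name, fe


def count_first(runs: Sequence[Triggers]) -> Counter:
    """Count how often each criterion is the first to fire across runs."""
    counts: Counter = Counter()
    for triggers in runs:
        winner = first_triggered(triggers)
        if winner is not None:
            counts[winner[0]] += 1
    return counts


# Toy trigger times (function evaluations) for three runs.
runs = [
    {"tolfun": 1200, "tolfunhist": 1500, "tolflatfitness": None},
    {"tolfun": 900, "tolfunhist": 800, "tolflatfitness": 2000},
    {"tolfun": 700, "tolfunhist": None, "tolflatfitness": 1100},
]
counts = count_first(runs)
portfolio_stop = first_triggered(runs[1])
```

Grouping such counts by λ and n would give the kind of breakdown the fourth bullet describes.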

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Restart strategies that rely on these criteria may save evaluations more reliably when tolfunhist is emphasized.
  • The same criteria could behave differently on noisy or constrained problems not covered by the current BBOB tests.
  • Alternative reference points, such as target precision or gradient norms, might change which criterion appears best.

Load-bearing premise

An optimal stopping point defined by the number of function evaluations in one CMA-ES run provides a stable reference for judging criterion quality across the BBOB suite.

What would settle it

Measure the actual objective value and convergence state at the moment each criterion fires, then compare those states directly to the state at the optimal evaluation count on the same BBOB runs.
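
One way to run such a check, sketched under assumptions: record the best-so-far error after every evaluation, then read it off at FE_stop and at FE∗. The helpers and the toy trace below are hypothetical and only illustrate the comparison; they are not the paper's experimental code.

```python
from typing import Sequence


def error_at(bsf_error: Sequence[float], fe: int) -> float:
    """Best-so-far error after `fe` evaluations (1-based, clamped to run length)."""
    if fe < 1:
        raise ValueError("fe must be at least 1")
    return bsf_error[min(fe, len(bsf_error)) - 1]


def lost_improvement(bsf_error: Sequence[float], fe_stop: int, fe_star: int) -> float:
    """Error reduction forfeited by stopping at fe_stop instead of fe_star.

    Zero when the criterion fires at or after FE*, because the best-so-far
    error is non-increasing and is not updated again after FE*.
    """
    return error_at(bsf_error, fe_stop) - error_at(bsf_error, fe_star)


# Best-so-far error |f(x_bsf) - f(x*)| per evaluation; last update at evaluation 6.
bsf_error = [4.0, 4.0, 1.0, 1.0, 1.0, 0.25, 0.25, 0.25]
early = lost_improvement(bsf_error, fe_stop=4, fe_star=6)
late = lost_improvement(bsf_error, fe_stop=8, fe_star=6)
```

A positive value flags a stop that gave up solution quality, while a zero value paired with FE_stop greater than FE∗ flags evaluations spent without any gain.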

Figures

Figures reproduced from arXiv: 2606.09220 by Ryoji Tanabe.

Figure 1: FE∗ (⋆) and FE_stop of tolfun (•) and tolstagnation (♦) in a single run of CMA-ES. The black line shows the error value |f(x_bsf) − f(x∗)|.
Surrounding text captured with the caption: "as the number of function evaluations at which the best-so-far hypervolume [27] value is last updated in a single run of an EMO algorithm. In this work, FE∗ is simply defined as the number of function evaluations at which the best-so-far objective value was last upda…"

Figure 2: Number of times each stopping criterion was triggered first among the 11…

Figure 3: Average POSE values over the 360 function instances. The horizontal axis shows the dimension, n ∈ {2, 3, 5, 10, 20, 40}, and the vertical axis shows the average POSE value.

Figure 4: Number of times each stopping criterion stops the search before…
Original abstract

Covariance matrix adaptation evolution strategy (CMA-ES) is a state-of-the-art black-box optimization algorithm. In general, CMA-ES uses a portfolio of multiple stopping criteria to automatically determine when to stop the search. This mechanism aims to avoid unnecessary consumption of the function evaluation budget during stagnation. Stopping criteria play an important role in CMA-ES, particularly when restart strategies are employed. However, the effectiveness of stopping criteria in CMA-ES remains poorly understood. To address this issue, this paper investigates how the 11 stopping criteria in CMA-ES behave on the noiseless BBOB function set. The performance of the stopping criteria is quantitatively evaluated based on the optimal stopping point in terms of the number of function evaluations in a single run of CMA-ES. Our results show that, although which stopping criterion is triggered first depends significantly on the sample size λ and the dimension n, tolflatfitness and tolfun are frequently the first criteria to be triggered among the portfolio of 11 stopping criteria. We also demonstrate that tolfunhist and the portfolio achieve the highest stopping accuracy in most cases. In addition, our results show that the tolfun and tolfunhist criteria are frequently triggered before CMA-ES reaches complete stagnation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript empirically analyzes the behavior of 11 stopping criteria in the CMA-ES algorithm on the noiseless BBOB test suite. It reports that which criterion is triggered first depends on the population size λ and dimension n, with tolflatfitness and tolfun often triggered first. The study finds that tolfunhist and the full portfolio of criteria achieve the highest stopping accuracy relative to an optimal stopping point defined by function evaluations in a single CMA-ES run, and that tolfun and tolfunhist are frequently triggered before complete stagnation.

Significance. Should the evaluation methodology prove robust to stochastic variation, this work provides quantitative empirical guidance on the relative effectiveness of individual stopping criteria versus portfolios in CMA-ES. Such data is useful for designing restart strategies that balance convergence detection against unnecessary function evaluations on standard black-box benchmarks.

major comments (2)
  1. [Abstract] Abstract: The accuracy metric judges each stopping criterion by proximity of its trigger time to an 'optimal stopping point' defined via the number of function evaluations in a single CMA-ES run. CMA-ES is stochastic, so the evaluation count at which a given target quality is first reached varies across independent runs on the same BBOB instance. The manuscript supplies no evidence that this reference was averaged over restarts, accompanied by variance estimates, or replaced by a run-independent proxy (e.g., fixed target precision). This assumption is load-bearing for the claim that tolfunhist and the portfolio achieve the highest stopping accuracy in most cases.
  2. [Abstract] Abstract: The statement that tolfunhist and the portfolio achieve highest accuracy 'in most cases' is not accompanied by a precise quantification (fraction of functions, dimensions, or λ values) or by any indication that accuracy differences were assessed with statistical tests across the BBOB suite.
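
The quantification asked for in comment 2 could take the form of a win fraction over configurations. The sketch below assumes a table keyed by (function, dimension, λ) that holds one accuracy score per criterion, with lower scores taken to be better. That convention, the names, and the numbers are illustrative assumptions, not values from the paper.

```python
from collections import Counter
from typing import Dict, Mapping, Tuple

Config = Tuple[int, int, int]  # (function id, dimension n, sample size lambda)


def win_fractions(scores: Mapping[Config, Mapping[str, float]]) -> Dict[str, float]:
    """Fraction of configurations in which each criterion has the best score.

    Lower is assumed better; criteria tied for best all count as winners.
    """
    if not scores:
        raise ValueError("need at least one configuration")
    wins: Counter = Counter()
    for per_criterion in scores.values():
        best = min(per_criterion.values())
        for name, score in per_criterion.items():
            if score == best:
                wins[name] += 1
    total = len(scores)
    return {name: count / total for name, count in wins.items()}


# Toy table with four configurations and three stopping rules.
scores = {
    (1, 2, 6): {"tolfun": 0.4, "tolfunhist": 0.1, "portfolio": 0.1},
    (1, 5, 8): {"tolfun": 0.3, "tolfunhist": 0.2, "portfolio": 0.25},
    (2, 2, 6): {"tolfun": 0.05, "tolfunhist": 0.2, "portfolio": 0.05},
    (2, 5, 8): {"tolfun": 0.5, "tolfunhist": 0.3, "portfolio": 0.3},
}
fractions = win_fractions(scores)
```

A win fraction makes "in most cases" concrete, but it does not replace the statistical tests the comment also asks about, since it ignores how large the score differences are.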

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments regarding the evaluation of stopping criteria in CMA-ES. We address each major comment below, indicating planned revisions where appropriate.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The accuracy metric judges each stopping criterion by proximity of its trigger time to an 'optimal stopping point' defined via the number of function evaluations in a single CMA-ES run. CMA-ES is stochastic, so the evaluation count at which a given target quality is first reached varies across independent runs on the same BBOB instance. The manuscript supplies no evidence that this reference was averaged over restarts, accompanied by variance estimates, or replaced by a run-independent proxy (e.g., fixed target precision). This assumption is load-bearing for the claim that tolfunhist and the portfolio achieve the highest stopping accuracy in most cases.

    Authors: The optimal stopping point is defined per individual CMA-ES run as the number of function evaluations at which the best-so-far objective value was last updated within that same run. This intra-run construction ensures direct comparability between each stopping criterion's trigger and the reference point under identical stochastic conditions, without cross-run averaging. We will revise the manuscript to state this definition more explicitly in the abstract and methods, and we will add a brief discussion of run-to-run variability in the optimal point for a subset of instances to address robustness concerns. revision: partial

  2. Referee: [Abstract] Abstract: The statement that tolfunhist and the portfolio achieve highest accuracy 'in most cases' is not accompanied by a precise quantification (fraction of functions, dimensions, or λ values) or by any indication that accuracy differences were assessed with statistical tests across the BBOB suite.

    Authors: We agree that the abstract would be strengthened by quantification. The underlying results already tabulate accuracy for every BBOB function, dimension, and λ setting; from these data, tolfunhist and the portfolio are highest in the majority of configurations. In revision we will replace 'in most cases' with a specific fraction (derived from the existing tables) and note the consistency of the ranking across the test suite. Formal statistical tests were not applied in the original study but can be added as a supplementary note if space allows. revision: yes

Circularity Check

0 steps flagged

No circularity; direct empirical counts on external BBOB benchmarks

Full rationale

The paper conducts a quantitative empirical study that counts trigger frequencies and accuracies of 11 stopping criteria across BBOB functions, using a single-run reference point for 'optimal stopping'. No derivations, equations, fitted parameters, or self-citations appear in the provided text. The central claims rest on direct observation against fixed external benchmarks rather than any reduction to inputs by construction. This matches the default case of a self-contained empirical analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. The 11 criteria and BBOB functions are treated as given from prior literature.



Reference graph

Works this paper leans on

29 extracted references · 24 canonical work pages

  [1] Auger, A., Hansen, N.: Performance evaluation of an advanced local search evolutionary algorithm. In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2005, 2-4 September 2005, Edinburgh, UK. pp. 1777–1784. IEEE (2005). https://doi.org/10.1109/CEC.2005.1554903

  [2] Auger, A., Hansen, N.: A restart CMA evolution strategy with increasing population size. In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2005, 2-4 September 2005, Edinburgh, UK. pp. 1769–1776. IEEE (2005). https://doi.org/10.1109/CEC.2005.1554902

  [3] Cuccu, G., Gomez, F.J., Glasmachers, T.: Novelty-based restarts for evolution strategies. In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2011, New Orleans, LA, USA, 5-8 June, 2011. pp. 158–163. IEEE (2011). https://doi.org/10.1109/CEC.2011.5949613

  [4] Fukunaga, A.S.: Restart scheduling for genetic algorithms. In: Eiben, A.E., Bäck, T., Schoenauer, M., Schwefel, H. (eds.) Parallel Problem Solving from Nature - PPSN V, 5th International Conference, Amsterdam, The Netherlands, September 27-30, 1998, Proceedings. Lecture Notes in Computer Science, vol. 1498, pp. 357–… Springer (1998). https://doi.org/10.1007/BFb0056878

  [5] Hansen, N.: Benchmarking a bi-population CMA-ES on the BBOB-2009 function testbed. In: Rothlauf, F. (ed.) Genetic and Evolutionary Computation Conference, GECCO 2009, Proceedings, Montreal, Québec, Canada, July 8-12, 2009, Companion Material. pp. 2389–2396. ACM (2009). https://doi.org/10.1145/1570256.1570333

  [6] Hansen, N.: The CMA evolution strategy: A tutorial. CoRR abs/1604.00772 (2016), http://arxiv.org/abs/1604.00772

  [7] Hansen, N., Akimoto, Y., Baudis, P.: CMA-ES/pycma on Github. Zenodo, DOI:10.5281/zenodo.2559634 (2019)

  [8] Hansen, N., Auger, A., Ros, R., Mersmann, O., Tusar, T., Brockhoff, D.: COCO: a platform for comparing continuous optimizers in a black-box setting. Optim. Methods Softw. 36(1), 114–144 (2021). https://doi.org/10.1080/10556788.2020.1808977

  [9] Hansen, N., Finck, S., Ros, R., Auger, A.: Real-parameter black-box optimization benchmarking 2009: Noiseless functions definitions. Tech. rep., INRIA (2009)

  [10] Hansen, N., Kern, S.: Evaluating the CMA evolution strategy on multimodal test functions. In: Yao, X., Burke, E.K., Lozano, J.A., Smith, J., Guervós, J.J.M., Bullinaria, J.A., Rowe, J.E., Tiño, P., Kabán, A., Schwefel, H. (eds.) Parallel Problem Solving from Nature - PPSN VIII, 8th International Conference, Birmingham, UK, September 18-22, 2004, Proceed...

  [11] Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. Evol. Comput. 9(2), 159–195 (2001). https://doi.org/10.1162/106365601750190398

  [12] Huberman, B., Lukose, R., Hogg, T.: An Economics Approach to Hard Computational Problems. Science 275(5296), 51–54 (1997)

  [13] Jastrebski, G.A., Arnold, D.V.: Improving evolution strategies through active covariance matrix adaptation. In: IEEE International Conference on Evolutionary Computation, CEC 2006, part of WCCI 2006, Vancouver, BC, Canada, 16-21 July 2006. pp. 2814–2821. IEEE (2006). https://doi.org/10.1109/CEC.2006.1688662

  [14] Kitamura, K., Tanabe, R.: Benchmarking stopping criteria for evolutionary multi-objective optimization. In: Genetic and Evolutionary Computation Conference, GECCO ’26. ACM (2026 (in press)). https://doi.org/10.1145/3795095.3805068

  [15] Liu, Y., Zhou, A., Zhang, H.: Termination detection strategies in evolutionary algorithms: a survey. In: GECCO. pp. 1063–1070 (2018). https://doi.org/10.1145/3205455.3205466

  [16] López-Ibáñez, M., Dubois-Lacoste, J., Cáceres, L.P., Birattari, M., Stützle, T.: The irace package: Iterated racing for automatic algorithm configuration. Operations Research Perspectives 3, 43–58 (2016). https://doi.org/10.1016/j.orp.2016.09.002

  [17] López-Ibáñez, M., Liao, T., Stützle, T.: On the anytime behavior of IPOP-CMA-ES. In: Coello, C.A.C., Cutello, V., Deb, K., Forrest, S., Nicosia, G., Pavone, M. (eds.) Parallel Problem Solving from Nature - PPSN XII - 12th International Conference, Taormina, Italy, September 1-5, 2012, Proceedings, Part I. Lecture Notes in Computer Science, vol. 7491, p...

  [18] Marty, T., Hansen, N., Auger, A., Semet, Y., Héron, S.: LB+IC-CMA-ES: two simple modifications of CMA-ES to handle mixed-integer problems. In: Affenzeller, M., Winkler, S.M., Kononova, A.V., Trautmann, H., Tusar, T., Machado, P., Bäck, T. (eds.) Parallel Problem Solving from Nature - PPSN XVIII - 18th International Conference, PPSN 2024, Hagenberg, Au... Springer (2024). https://doi.org/10.1007/978-3-031-70068-2_18

  [19] de Nobel, J., Vermetten, D., Kononova, A.V., Shir, O.M., Bäck, T.: Avoiding redundant restarts in multimodal global optimization. In: Affenzeller, M., Winkler, S.M., Kononova, A.V., Trautmann, H., Tusar, T., Machado, P., Bäck, T. (eds.) Parallel Problem Solving from Nature - PPSN XVIII - 18th International Conference, PPSN 2024, Hagenberg, Austria, Se...

  [20] de Nobel, J., Wang, H., Bäck, T.: Explorative data analysis of time series based algorithm features of CMA-ES variants. In: Chicano, F., Krawiec, K. (eds.) GECCO ’21: Genetic and Evolutionary Computation Conference, Lille, France, July 10-14, 2021. pp. 510–518. ACM (2021). https://doi.org/10.1145/3449639.3459399

  [21] van Rijn, S., Doerr, C., Bäck, T.: Towards an adaptive CMA-ES configurator. In: Auger, A., Fonseca, C.M., Lourenço, N., Machado, P., Paquete, L., Whitley, L.D. (eds.) Parallel Problem Solving from Nature - PPSN XV - 15th International Conference, Coimbra, Portugal, September 8-12, 2018, Proceedings, Part I. Lecture Notes in Computer Science, vol. 1110...

  [22] Safe, M.D., Carballido, J.A., Ponzoni, I., Brignole, N.B.: On stopping criteria for genetic algorithms. In: Bazzan, A.L.C., Labidi, S. (eds.) Advances in Artificial Intelligence - SBIA 2004, 17th Brazilian Symposium on Artificial Intelligence, São Luis, Maranhão, Brazil, September 29 - October 1, 2004, Proceedings. Lecture Notes in Computer Science,...

  [23] Schäpermeier, L.: Greedy restart schedules: A baseline for dynamic algorithm selection on numerical black-box optimization problems. In: Filipic, B. (ed.) Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2025, NH Malaga Hotel, Malaga, Spain, July 14-18, 2025. pp. 1199–1207. ACM (2025). https://doi.org/10.1145/3712256.3726408

  [24] Smit, S.K., Eiben, A.E.: Beating the ’world champion’ evolutionary algorithm via REVAC tuning. In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2010, Barcelona, Spain, 18-23 July 2010. pp. 1–8. IEEE (2010). https://doi.org/10.1109/CEC.2010.5586026

  [25] Storn, R., Price, K.V.: Differential evolution - A simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11(4), 341–359 (1997). https://doi.org/10.1023/A:1008202821328

  [26] Tusar, T., Brockhoff, D., Hansen, N.: Mixed-integer benchmark problems for single- and bi-objective optimization. In: Auger, A., Stützle, T. (eds.) Proceedings of the Genetic and Evolutionary Computation Conference, GECCO 2019, Prague, Czech Republic, July 13-17, 2019. pp. 718–726. ACM (2019). https://doi.org/10.1145/3321707.3321868

  [27] Zitzler, E., Thiele, L.: Multiobjective optimization using evolutionary algorithms - A comparative case study. In: Eiben, A.E., Bäck, T., Schoenauer, M., Schwefel, H. (eds.) Parallel Problem Solving from Nature - PPSN V, 5th International Conference, Amsterdam, The Netherlands, September 27-30, 1998, Proceedings. Lecture Notes in Computer Science, vol. ...