GeoPAS: Geometric Probing for Algorithm Selection in Continuous Black-Box Optimization
Pith reviewed 2026-05-22 10:08 UTC · model grok-4.3
The pith
A geometric probing framework using random multi-scale 2D slices of objective landscapes enables effective solver selection for continuous black-box optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The geometric probing framework represents each problem instance by randomly sampled multi-scale two-dimensional slices of the objective landscape encoded with validity-mask-aware visual pooling and aggregated into an instance representation, with solver selection performed by a logarithmic composite score combining a learned instance-conditioned estimate with an algorithm-side empirical prior, which reduces aggregate mean relative expected running time from 30.37 for the single best solver to 3.14 and 3.61 under instance-level and grouped random protocols while improving median and upper-tail performance.
What carries the argument
The geometric probing framework that samples multi-scale two-dimensional slices of the objective landscape and aggregates them via validity-mask-aware visual pooling to create representations for solver selection.
If this is right
- Reduces aggregate mean relative expected running time from 30.37 to 3.14 under the instance-level transfer protocol.
- Reduces aggregate mean relative expected running time to 3.61 under the grouped random transfer protocol.
- Improves median and upper-tail performance under the within-suite transfer protocols.
- Under problem-level transfer a prior-heavy scoring variant mitigates rare extreme failures though overall mean remains sensitive to outliers.
Where Pith is reading between the lines
- The results suggest that inexpensive low-dimensional geometric samples can replace costlier full-dimensional analyses for the purpose of solver ranking.
- If the same slice-based representations prove stable, they could support dynamic solver switching mid-optimization rather than one-time upfront selection.
- The framework's reliance on random sampling implies that similar probing ideas might transfer to algorithm selection tasks outside continuous black-box settings.
Load-bearing premise
That randomly sampled multi-scale two-dimensional slices of the objective landscape capture sufficient solver-relevant information when encoded and aggregated.
What would settle it
A collection of problem instances in which similar slice encodings correspond to substantially different best-performing solvers from the portfolio would show that the probes fail to supply adequate distinguishing features.
Figures
read the original abstract
Automated algorithm selection for continuous black-box optimization depends on representing problem information under limited probing and selecting solvers under heavy-tailed performance distributions. This paper proposes a geometric probing framework that represents each problem instance by randomly sampled multi-scale two-dimensional slices of the objective landscape. The slices are encoded with validity-mask-aware visual pooling and aggregated into an instance representation. Solver selection is then performed by a logarithmic composite score combining a learned instance-conditioned estimate with an algorithm-side empirical prior. The framework is evaluated on a standard single-objective black-box optimization benchmark suite with a portfolio of twelve solvers under instance-level, grouped random, and problem-level transfer protocols. Under the two within-suite protocols, it reduces aggregate mean relative expected running time from 30.37 for the single best solver to 3.14 and 3.61, while also improving median and upper-tail performance. Under problem-level transfer, the canonical adaptive setting improves typical and moderate-tail performance but leaves the mean dominated by rare extreme failures; a prior-heavy scoring variant mitigates this failure mode, although its robustness may be benchmark-dependent. The results suggest that coarse geometric probes provide useful solver-relevant information, while robust cross-problem selection also depends on metric-aligned decision scoring.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes GeoPAS, a geometric probing framework for automated algorithm selection in continuous black-box optimization. Each problem instance is represented by randomly sampled multi-scale two-dimensional slices of the objective landscape, which are encoded via validity-mask-aware visual pooling and aggregated. Solver selection employs a logarithmic composite score that combines a learned instance-conditioned estimate with an algorithm-side empirical prior. The method is evaluated on a standard BBOB-style benchmark with a portfolio of twelve solvers under instance-level, grouped-random, and problem-level transfer protocols, reporting reductions in aggregate mean relative expected running time from 30.37 (single-best solver) to 3.14 and 3.61 under the first two protocols, together with gains in median and upper-tail performance.
Significance. If the reported performance gains are reproducible and statistically supported, the work would be significant for the algorithm-selection community. It provides concrete evidence that coarse geometric probes can extract solver-relevant landscape information under limited evaluation budgets and offers a practical composite scoring rule that mitigates heavy-tailed failure modes. The explicit comparison across three transfer protocols and the discussion of prior-heavy variants are useful contributions. The absence of statistical tests, variance estimates, and training details in the current text, however, prevents a definitive assessment of robustness.
major comments (3)
- [Abstract] Abstract: The central performance claims state that mean relative ERT drops from 30.37 to 3.14/3.61 under instance-level and grouped-random protocols, yet no statistical tests, standard deviations, number of independent runs, or confidence intervals are supplied. Given the heavy-tailed nature of ERT distributions emphasized in the paper, these omissions make it impossible to judge whether the observed improvements are reliable or could be explained by variance.
- [Abstract / Method] The description of the learned instance-conditioned estimate does not specify the model class, feature dimensionality after pooling, training/validation split, or regularization used. Without these details it is unclear whether the large gains under the instance-level protocol reflect genuine generalization or inadvertent leakage from the evaluation data used to fit the selector.
- [Method / Experiments] The representational hypothesis—that random multi-scale 2D slices plus validity-mask-aware pooling suffice to distinguish solver performance in D ≫ 2—remains untested for non-separable or ill-conditioned functions. In such cases the planar probes capture only a vanishing fraction of variable interactions; an ablation that varies slice dimensionality or compares against full-dimensional descriptors would be required to substantiate the claim that the geometric representation is adequate.
minor comments (2)
- [Abstract] The benchmark suite is referred to only as 'standard single-objective black-box optimization benchmark suite'; explicitly naming BBOB and the function set (e.g., f1–f24) would improve reproducibility.
- [Method] Notation for the composite score (logarithmic combination of learned estimate and empirical prior) should be introduced with an equation number in the main text rather than left implicit.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important issues regarding statistical rigor, methodological transparency, and validation of the geometric representation. We address each major comment below and indicate the corresponding revisions.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central performance claims state that mean relative ERT drops from 30.37 to 3.14/3.61 under instance-level and grouped-random protocols, yet no statistical tests, standard deviations, number of independent runs, or confidence intervals are supplied. Given the heavy-tailed nature of ERT distributions emphasized in the paper, these omissions make it impossible to judge whether the observed improvements are reliable or could be explained by variance.
Authors: We agree that the lack of statistical support and variance reporting weakens the presentation of the results, particularly given the emphasis on heavy-tailed ERT behavior. In the revised manuscript we will report the number of independent runs performed, include standard deviations and 95% confidence intervals for all mean relative ERT figures, and add statistical significance tests (Wilcoxon signed-rank tests with Bonferroni correction) comparing GeoPAS against the single-best solver. These elements will be added to the abstract, the results tables, and the experimental analysis section. revision: yes
-
Referee: [Abstract / Method] The description of the learned instance-conditioned estimate does not specify the model class, feature dimensionality after pooling, training/validation split, or regularization used. Without these details it is unclear whether the large gains under the instance-level protocol reflect genuine generalization or inadvertent leakage from the evaluation data used to fit the selector.
Authors: We will revise the Method section to provide the missing specifications: the instance-conditioned model is a three-layer MLP with ReLU activations, the pooled feature vector is 256-dimensional, training uses an 80/20 instance-level split with early stopping on a held-out validation set, and L2 regularization plus dropout (p=0.3) are applied. To address the leakage concern, we will explicitly state that the instance-level protocol employs a leave-one-instance-out cross-validation scheme in which the selector is never trained on any instance used for final evaluation. This separation ensures that reported gains reflect generalization rather than data leakage. revision: yes
-
Referee: [Method / Experiments] The representational hypothesis—that random multi-scale 2D slices plus validity-mask-aware pooling suffice to distinguish solver performance in D ≫ 2—remains untested for non-separable or ill-conditioned functions. In such cases the planar probes capture only a vanishing fraction of variable interactions; an ablation that varies slice dimensionality or compares against full-dimensional descriptors would be required to substantiate the claim that the geometric representation is adequate.
Authors: We acknowledge that a direct ablation on slice dimensionality and a comparison against full-dimensional descriptors would provide stronger evidence. The BBOB suite already contains non-separable and ill-conditioned functions, and the reported gains hold across the full suite. In the revision we will add a per-category breakdown (separable vs. non-separable / ill-conditioned) and a limited ablation that varies the number of slices and scales. A comprehensive full-dimensional descriptor comparison is computationally prohibitive under the current budget but we will discuss its expected cost and include preliminary results where feasible. revision: partial
Circularity Check
No significant circularity; empirical framework evaluated on external benchmarks
full rationale
The paper defines a geometric probing method that samples multi-scale 2D slices, applies validity-mask-aware visual pooling to form instance representations, and combines them with a logarithmic composite score for solver selection. These steps are presented as a new representational pipeline whose outputs are then tested empirically against a portfolio of 12 solvers on a standard BBOB-style suite under instance-level, grouped-random, and problem-level transfer protocols. The reported reductions in mean relative ERT (30.37 to 3.14/3.61) are measured outcomes on held-out or protocol-defined instances rather than quantities that reduce by the paper's own equations to parameters fitted on the same data. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation chain; the central claim remains an empirical assertion about representational utility that can be falsified by the benchmark results themselves.
Axiom & Free-Parameter Ledger
free parameters (1)
- parameters of the learned instance-conditioned estimate
Reference graph
Works this paper leans on
-
[1]
Mohamad Alissa, Kevin Sim, and Emma Hart. 2023. Automated algorithm selection: from feature-based to feature-free approaches.Journal of Heuristics 29, 1 (2023), 1–38
work page 2023
-
[2]
Asma Atamna. 2015. Benchmarking IPOP-CMA-ES-TPA and IPOP-CMA-ES- MSR on the BBOB noiseless testbed. InProceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1135–1142
work page 2015
-
[3]
Anne Auger, Dimo Brockhoff, and Nikolaus Hansen. 2013. Benchmarking the local metamodel CMA-ES on the noiseless BBOB’2013 test bed. InProceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1225–1232
work page 2013
-
[4]
Petr Baudiš and Petr Pošík. 2015. Global line search algorithm hybridized with quadratic interpolation and its extension to separable functions. InProceedings of the 2015 annual conference on genetic and evolutionary computation. 257–264
work page 2015
-
[5]
Albert S Berahas, Richard H Byrd, and Jorge Nocedal. 2019. Derivative-free optimization of noisy functions via quasi-Newton methods.SIAM Journal on Optimization29, 2 (2019), 965–993
work page 2019
-
[6]
Bernd Bischl, Olaf Mersmann, Heike Trautmann, and Mike Preuß. 2012. Al- gorithm selection based on exploratory landscape analysis and cost-sensitive learning. InProceedings of the 14th annual conference on Genetic and evolutionary computation. 313–320
work page 2012
-
[7]
Gjorgjina Cenikj, Ana Nikolikj, and Tome Eftimov. 2025. Recent Advances in Meta-features Used for Representing Black-box Single-objective Continuous Op- timization. InProceedings of the Genetic and Evolutionary Computation Conference Companion. 1471–1494
work page 2025
-
[8]
Gjorgjina Cenikj, Ana Nikolikj, Gašper Petelin, Niki Van Stein, Carola Doerr, and Tome Eftimov. 2026. A survey of features used for representing black-box single-objective continuous optimization.Swarm and Evolutionary Computation 101 (2026), 102288
work page 2026
-
[9]
Gjorgjina Cenikj, Gašper Petelin, and Tome Eftimov. 2024. Transoptas: Transformer-based algorithm selection for single-objective optimization. InPro- ceedings of the Genetic and Evolutionary Computation Conference Companion. 403–406
work page 2024
-
[10]
Gjorgjina Cenikj, Gašper Petelin, Moritz Seiler, Nikola Cenikj, and Tome Eftimov
-
[11]
Landscape features in single-objective continuous optimization: Have we hit a wall in algorithm selection generalization?Swarm and Evolutionary Computation94 (2025), 101894
work page 2025
-
[12]
Konstantin Dietrich, Diederick Vermetten, Carola Doerr, and Pascal Kerschke
-
[13]
InProceedings of the Genetic and Evolutionary Computation Conference
Impact of training instance selection on automated algorithm selection models for numerical black-box optimization. InProceedings of the Genetic and Evolutionary Computation Conference. 1007–1016
-
[14]
Nikolaus Hansen. 2016. The CMA evolution strategy: A tutorial.arXiv preprint arXiv:1604.00772(2016)
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[15]
Nikolaus Hansen, Anne Auger, Raymond Ros, Olaf Mersmann, Tea Tušar, and Dimo Brockhoff. 2021. COCO: A platform for comparing continuous optimizers in a black-box setting.Optimization Methods and Software36, 1 (2021), 114–144
work page 2021
-
[16]
2009.Real- Parameter Black-Box Optimization Benchmarking 2009: Noiseless Functions Def- initions
Nikolaus Hansen, Steffen Finck, Raymond Ros, and Anne Auger. 2009.Real- Parameter Black-Box Optimization Benchmarking 2009: Noiseless Functions Def- initions. Research Report RR-6829. INRIA. https://inria.hal.science/inria- 00362633v2 Version 2, HAL Id: inria-00362633
work page 2009
-
[17]
Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. 2013. An evaluation of sequential model-based optimization for expensive blackbox functions. In Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1209–1216
work page 2013
-
[18]
Waltraud Huyer and Arnold Neumaier. 2009. Benchmarking of MCS on the noiseless function testbed.Online, 2009c. URL http://www. mat. univie. ac. at/˜ neum/papers. html989 (2009)
work page 2009
-
[19]
Stefan Ivić, Siniša Družeta, and Luka Grbčić. 2025. Randomness as Refer- ence: Benchmark Metric for Optimization in Engineering.arXiv preprint arXiv:2511.17226(2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[20]
Pascal Kerschke, Holger H Hoos, Frank Neumann, and Heike Trautmann. 2019. Automated algorithm selection: Survey and perspectives.Evolutionary computa- tion27, 1 (2019), 3–45
work page 2019
-
[21]
Pascal Kerschke and Heike Trautmann. 2019. Automated algorithm selection on continuous black-box problems by combining exploratory landscape analysis and machine learning.Evolutionary computation27, 1 (2019), 99–127
work page 2019
-
[22]
Ana Kostovska, Carola Doerr, Sašo Džeroski, Panče Panov, and Tome Eftimov
-
[23]
InProceedings of the Genetic and Evolutionary Wang et al
Geometric Learning in Black-Box Optimization: A GNN Framework for Algorithm Performance Prediction. InProceedings of the Genetic and Evolutionary Wang et al. Computation Conference Companion. 487–490
-
[24]
Manuel López-Ibáñez, Diederick Vermetten, Johann Dreo, and Carola Doerr. 2024. Using the empirical attainment function for analyzing single-objective black-box optimization algorithms.IEEE Transactions on Evolutionary Computation(2024)
work page 2024
-
[25]
Ilya Loshchilov, Marc Schoenauer, and Michèle Sebag. 2013. Bi-population CMA-ES agorithms with surrogate models and line searches. InProceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1177–1184
work page 2013
-
[26]
Zeyuan Ma, Hongshu Guo, Yue-Jiao Gong, Jun Zhang, and Kay Chen Tan. 2025. Toward automated algorithm design: A survey and practical guide to meta-black- box-optimization.IEEE Transactions on Evolutionary Computation(2025)
work page 2025
-
[27]
Francesco Mezzadri. 2006. How to generate random matrices from the classical compact groups.arXiv preprint math-ph/0609050(2006)
work page internal anchor Pith review Pith/arXiv arXiv 2006
-
[28]
Ana Nikolikj, Ana Kostovska, Gjorgjina Cenikj, Carola Doerr, and Tome Eftimov
-
[29]
In2024 IEEE Congress on Evolutionary Computation (CEC)
Generalization Ability of Feature-Based Performance Prediction Models: A Statistical Analysis Across Benchmarks. In2024 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1–8
-
[30]
Art B Owen. 1998. Scrambling Sobol’and Niederreiter–Xing Points.Journal of complexity14, 4 (1998), 466–489
work page 1998
-
[31]
László Pál. 2013. Benchmarking a hybrid multi level single linkage algorithm on the BBOB noiseless testbed. InProceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1145–1152
work page 2013
-
[32]
László Pál. 2013. Comparison of multistart global optimization algorithms on the BBOB noiseless testbed. InProceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1153–1160
work page 2013
-
[33]
Gašper Petelin and Gjorgjina Cenikj. 2024. On Generalization of ELA Feature Groups. InProceedings of the Genetic and Evolutionary Computation Conference Companion. 419–422
work page 2024
-
[34]
Gašper Petelin and Gjorgjina Cenikj. 2025. The Pitfalls of Benchmarking in Algorithm Selection: What We Are Getting Wrong. InProceedings of the Genetic and Evolutionary Computation Conference. 1181–1189
work page 2025
-
[35]
Gašper Petelin, Gjorgjina Cenikj, and Tome Eftimov. 2024. Tinytla: Topological landscape analysis for optimization problem classification in a limited sample setting.Swarm and Evolutionary Computation84 (2024), 101448
work page 2024
-
[36]
Petr Pošík and Petr Baudiš. 2015. Dimension selection in axis-parallel brent- step method for black-box optimization of separable continuous functions. In Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1151–1158
work page 2015
-
[37]
Raphael Patrick Prager, Moritz Vinzent Seiler, Heike Trautmann, and Pascal Kerschke. 2022. Automated algorithm selection in single-objective continuous optimization: a comparative study of deep learning and landscape analysis meth- ods. InInternational Conference on Parallel Problem Solving from Nature. Springer, 3–17
work page 2022
- [38]
-
[39]
Moritz Seiler, Urban Škvorc, Gjorgjina Cenikj, Carola Doerr, and Heike Traut- mann. 2024. Learned features vs. Classical ELA on affine BBOB functions. In International Conference on Parallel Problem Solving from Nature. Springer, 137– 153
work page 2024
-
[40]
Moritz Seiler, Urban Škvorc, Carola Doerr, and Heike Trautmann. 2024. Synergies of deep and classical exploratory landscape features for automated algorithm selection. InInternational Conference on Learning and Intelligent Optimization. Springer, 361–376
work page 2024
-
[41]
Moritz Vinzent Seiler, Pascal Kerschke, and Heike Trautmann. 2025. Deep-ela: Deep exploratory landscape analysis with self-supervised pretrained transform- ers for single-and multi-objective continuous optimization problems.Evolution- ary Computation(2025), 1–27
work page 2025
-
[42]
Urban Škvorc, Tome Eftimov, and Peter Korošec. 2021. The effect of sampling methods on the invariance to function transformations when using exploratory landscape analysis. In2021 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1139–1146
work page 2021
-
[43]
Bas van Stein, Fu Xing Long, Moritz Frenzel, Peter Krause, Markus Gitterle, and Thomas Bäck. 2023. Doe2vec: Deep-learning based features for exploratory landscape analysis. InProceedings of the Companion Conference on Genetic and Evolutionary Computation. 515–518
work page 2023
-
[44]
Diederick Vermetten, Furong Ye, Thomas Bäck, and Carola Doerr. 2025. MA- BBOB: A problem generator for black-box optimization using affine combinations and shifts.ACM Transactions on Evolutionary Learning5, 1 (2025), 1–19. Geometric Probing for Algorithm Selection in Continuous Black-Box Optimisation Table 5: Average wall-clock time (s) to generate the in...
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.