pith. sign in

arxiv: 2604.09095 · v3 · pith:AVZGGTR2new · submitted 2026-04-10 · 💻 cs.LG · math.OC

GeoPAS: Geometric Probing for Algorithm Selection in Continuous Black-Box Optimization

Pith reviewed 2026-05-22 10:08 UTC · model grok-4.3

classification 💻 cs.LG math.OC
keywords algorithm selectionblack-box optimizationgeometric probingcontinuous optimizationsolver portfoliolandscape analysistransfer protocols
0
0 comments X

The pith

A geometric probing framework using random multi-scale 2D slices of objective landscapes enables effective solver selection for continuous black-box optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a method to represent each optimization problem instance through randomly sampled two-dimensional slices taken at multiple scales from the objective function landscape. These slices undergo encoding with validity-mask-aware visual pooling before aggregation into a compact instance descriptor. Solver selection then relies on a logarithmic composite score that blends a learned prediction from the descriptor with an empirical performance prior for each solver. A reader would care because black-box problems exhibit heavy-tailed performance variation across solvers, so reliable selection can avoid wasteful runs of mismatched algorithms and thereby lower overall computational cost. Experiments on a standard benchmark suite with twelve solvers show that the approach lowers aggregate mean relative expected running time from 30.37 for the single best solver to 3.14 and 3.61 under instance-level and grouped random transfer protocols while also lifting median and upper-tail results.

Core claim

The geometric probing framework represents each problem instance by randomly sampled multi-scale two-dimensional slices of the objective landscape encoded with validity-mask-aware visual pooling and aggregated into an instance representation, with solver selection performed by a logarithmic composite score combining a learned instance-conditioned estimate with an algorithm-side empirical prior, which reduces aggregate mean relative expected running time from 30.37 for the single best solver to 3.14 and 3.61 under instance-level and grouped random protocols while improving median and upper-tail performance.

What carries the argument

The geometric probing framework that samples multi-scale two-dimensional slices of the objective landscape and aggregates them via validity-mask-aware visual pooling to create representations for solver selection.

If this is right

  • Reduces aggregate mean relative expected running time from 30.37 to 3.14 under the instance-level transfer protocol.
  • Reduces aggregate mean relative expected running time to 3.61 under the grouped random transfer protocol.
  • Improves median and upper-tail performance under the within-suite transfer protocols.
  • Under problem-level transfer a prior-heavy scoring variant mitigates rare extreme failures though overall mean remains sensitive to outliers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The results suggest that inexpensive low-dimensional geometric samples can replace costlier full-dimensional analyses for the purpose of solver ranking.
  • If the same slice-based representations prove stable, they could support dynamic solver switching mid-optimization rather than one-time upfront selection.
  • The framework's reliance on random sampling implies that similar probing ideas might transfer to algorithm selection tasks outside continuous black-box settings.

Load-bearing premise

That randomly sampled multi-scale two-dimensional slices of the objective landscape capture sufficient solver-relevant information when encoded and aggregated.

What would settle it

A collection of problem instances in which similar slice encodings correspond to substantially different best-performing solvers from the portfolio would show that the probes fail to supply adequate distinguishing features.

Figures

Figures reproduced from arXiv: 2604.09095 by Jiabao Brad Wang, Mustafa Misir, Xiang Shi, Yiliang Yuan.

Figure 1
Figure 1. Figure 1: Illustration of multi-scale geometric probing. Oriented square slices with varying relative side length [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the GeoPAS visual encoder. Each sampled slice contributes a value map [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Histogram of relERT distribution of GeoPAS vs. SBS under LPO. The relERT axis is displayed in log-scale [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Selection frequencies of GeoPAS vs. VBS under LPO on (left) all datapoints, (middle) datapoints from SBS-non [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Heatmaps of mean (top), median (middle), and 90th [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Twenty sampled, normalised slices (value channel) of resolution [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Selection distributions and pick frequencies (GeoPAS vs. VBS) under LIO, Random, and LPO. [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Survival curves P(relERT > 𝑡) for GeoPAS and SBS under LIO, Random, and LPO [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Heatmaps of mean (top), median (middle), and 90th percentile (bottom) [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗
read the original abstract

Automated algorithm selection for continuous black-box optimization depends on representing problem information under limited probing and selecting solvers under heavy-tailed performance distributions. This paper proposes a geometric probing framework that represents each problem instance by randomly sampled multi-scale two-dimensional slices of the objective landscape. The slices are encoded with validity-mask-aware visual pooling and aggregated into an instance representation. Solver selection is then performed by a logarithmic composite score combining a learned instance-conditioned estimate with an algorithm-side empirical prior. The framework is evaluated on a standard single-objective black-box optimization benchmark suite with a portfolio of twelve solvers under instance-level, grouped random, and problem-level transfer protocols. Under the two within-suite protocols, it reduces aggregate mean relative expected running time from 30.37 for the single best solver to 3.14 and 3.61, while also improving median and upper-tail performance. Under problem-level transfer, the canonical adaptive setting improves typical and moderate-tail performance but leaves the mean dominated by rare extreme failures; a prior-heavy scoring variant mitigates this failure mode, although its robustness may be benchmark-dependent. The results suggest that coarse geometric probes provide useful solver-relevant information, while robust cross-problem selection also depends on metric-aligned decision scoring.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes GeoPAS, a geometric probing framework for automated algorithm selection in continuous black-box optimization. Each problem instance is represented by randomly sampled multi-scale two-dimensional slices of the objective landscape, which are encoded via validity-mask-aware visual pooling and aggregated. Solver selection employs a logarithmic composite score that combines a learned instance-conditioned estimate with an algorithm-side empirical prior. The method is evaluated on a standard BBOB-style benchmark with a portfolio of twelve solvers under instance-level, grouped-random, and problem-level transfer protocols, reporting reductions in aggregate mean relative expected running time from 30.37 (single-best solver) to 3.14 and 3.61 under the first two protocols, together with gains in median and upper-tail performance.

Significance. If the reported performance gains are reproducible and statistically supported, the work would be significant for the algorithm-selection community. It provides concrete evidence that coarse geometric probes can extract solver-relevant landscape information under limited evaluation budgets and offers a practical composite scoring rule that mitigates heavy-tailed failure modes. The explicit comparison across three transfer protocols and the discussion of prior-heavy variants are useful contributions. The absence of statistical tests, variance estimates, and training details in the current text, however, prevents a definitive assessment of robustness.

major comments (3)
  1. [Abstract] Abstract: The central performance claims state that mean relative ERT drops from 30.37 to 3.14/3.61 under instance-level and grouped-random protocols, yet no statistical tests, standard deviations, number of independent runs, or confidence intervals are supplied. Given the heavy-tailed nature of ERT distributions emphasized in the paper, these omissions make it impossible to judge whether the observed improvements are reliable or could be explained by variance.
  2. [Abstract / Method] The description of the learned instance-conditioned estimate does not specify the model class, feature dimensionality after pooling, training/validation split, or regularization used. Without these details it is unclear whether the large gains under the instance-level protocol reflect genuine generalization or inadvertent leakage from the evaluation data used to fit the selector.
  3. [Method / Experiments] The representational hypothesis—that random multi-scale 2D slices plus validity-mask-aware pooling suffice to distinguish solver performance in D ≫ 2—remains untested for non-separable or ill-conditioned functions. In such cases the planar probes capture only a vanishing fraction of variable interactions; an ablation that varies slice dimensionality or compares against full-dimensional descriptors would be required to substantiate the claim that the geometric representation is adequate.
minor comments (2)
  1. [Abstract] The benchmark suite is referred to only as 'standard single-objective black-box optimization benchmark suite'; explicitly naming BBOB and the function set (e.g., f1–f24) would improve reproducibility.
  2. [Method] Notation for the composite score (logarithmic combination of learned estimate and empirical prior) should be introduced with an equation number in the main text rather than left implicit.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important issues regarding statistical rigor, methodological transparency, and validation of the geometric representation. We address each major comment below and indicate the corresponding revisions.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central performance claims state that mean relative ERT drops from 30.37 to 3.14/3.61 under instance-level and grouped-random protocols, yet no statistical tests, standard deviations, number of independent runs, or confidence intervals are supplied. Given the heavy-tailed nature of ERT distributions emphasized in the paper, these omissions make it impossible to judge whether the observed improvements are reliable or could be explained by variance.

    Authors: We agree that the lack of statistical support and variance reporting weakens the presentation of the results, particularly given the emphasis on heavy-tailed ERT behavior. In the revised manuscript we will report the number of independent runs performed, include standard deviations and 95% confidence intervals for all mean relative ERT figures, and add statistical significance tests (Wilcoxon signed-rank tests with Bonferroni correction) comparing GeoPAS against the single-best solver. These elements will be added to the abstract, the results tables, and the experimental analysis section. revision: yes

  2. Referee: [Abstract / Method] The description of the learned instance-conditioned estimate does not specify the model class, feature dimensionality after pooling, training/validation split, or regularization used. Without these details it is unclear whether the large gains under the instance-level protocol reflect genuine generalization or inadvertent leakage from the evaluation data used to fit the selector.

    Authors: We will revise the Method section to provide the missing specifications: the instance-conditioned model is a three-layer MLP with ReLU activations, the pooled feature vector is 256-dimensional, training uses an 80/20 instance-level split with early stopping on a held-out validation set, and L2 regularization plus dropout (p=0.3) are applied. To address the leakage concern, we will explicitly state that the instance-level protocol employs a leave-one-instance-out cross-validation scheme in which the selector is never trained on any instance used for final evaluation. This separation ensures that reported gains reflect generalization rather than data leakage. revision: yes

  3. Referee: [Method / Experiments] The representational hypothesis—that random multi-scale 2D slices plus validity-mask-aware pooling suffice to distinguish solver performance in D ≫ 2—remains untested for non-separable or ill-conditioned functions. In such cases the planar probes capture only a vanishing fraction of variable interactions; an ablation that varies slice dimensionality or compares against full-dimensional descriptors would be required to substantiate the claim that the geometric representation is adequate.

    Authors: We acknowledge that a direct ablation on slice dimensionality and a comparison against full-dimensional descriptors would provide stronger evidence. The BBOB suite already contains non-separable and ill-conditioned functions, and the reported gains hold across the full suite. In the revision we will add a per-category breakdown (separable vs. non-separable / ill-conditioned) and a limited ablation that varies the number of slices and scales. A comprehensive full-dimensional descriptor comparison is computationally prohibitive under the current budget but we will discuss its expected cost and include preliminary results where feasible. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical framework evaluated on external benchmarks

full rationale

The paper defines a geometric probing method that samples multi-scale 2D slices, applies validity-mask-aware visual pooling to form instance representations, and combines them with a logarithmic composite score for solver selection. These steps are presented as a new representational pipeline whose outputs are then tested empirically against a portfolio of 12 solvers on a standard BBOB-style suite under instance-level, grouped-random, and problem-level transfer protocols. The reported reductions in mean relative ERT (30.37 to 3.14/3.61) are measured outcomes on held-out or protocol-defined instances rather than quantities that reduce by the paper's own equations to parameters fitted on the same data. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation chain; the central claim remains an empirical assertion about representational utility that can be falsified by the benchmark results themselves.

Axiom & Free-Parameter Ledger

1 free parameters · 0 axioms · 0 invented entities

The abstract indicates a learned instance-conditioned estimate, which implies fitted parameters whose exact count and training procedure are not specified; no explicit axioms or invented entities are stated.

free parameters (1)
  • parameters of the learned instance-conditioned estimate
    The composite score relies on a learned predictor whose internal parameters must be fitted to training instances.

pith-pipeline@v0.9.0 · 5753 in / 1223 out tokens · 34159 ms · 2026-05-22T10:08:36.214284+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 3 internal anchors

  1. [1]

    Mohamad Alissa, Kevin Sim, and Emma Hart. 2023. Automated algorithm selection: from feature-based to feature-free approaches.Journal of Heuristics 29, 1 (2023), 1–38

  2. [2]

    Asma Atamna. 2015. Benchmarking IPOP-CMA-ES-TPA and IPOP-CMA-ES- MSR on the BBOB noiseless testbed. InProceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1135–1142

  3. [3]

    Anne Auger, Dimo Brockhoff, and Nikolaus Hansen. 2013. Benchmarking the local metamodel CMA-ES on the noiseless BBOB’2013 test bed. InProceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1225–1232

  4. [4]

    Petr Baudiš and Petr Pošík. 2015. Global line search algorithm hybridized with quadratic interpolation and its extension to separable functions. InProceedings of the 2015 annual conference on genetic and evolutionary computation. 257–264

  5. [5]

    Albert S Berahas, Richard H Byrd, and Jorge Nocedal. 2019. Derivative-free optimization of noisy functions via quasi-Newton methods.SIAM Journal on Optimization29, 2 (2019), 965–993

  6. [6]

    Bernd Bischl, Olaf Mersmann, Heike Trautmann, and Mike Preuß. 2012. Al- gorithm selection based on exploratory landscape analysis and cost-sensitive learning. InProceedings of the 14th annual conference on Genetic and evolutionary computation. 313–320

  7. [7]

    Gjorgjina Cenikj, Ana Nikolikj, and Tome Eftimov. 2025. Recent Advances in Meta-features Used for Representing Black-box Single-objective Continuous Op- timization. InProceedings of the Genetic and Evolutionary Computation Conference Companion. 1471–1494

  8. [8]

    Gjorgjina Cenikj, Ana Nikolikj, Gašper Petelin, Niki Van Stein, Carola Doerr, and Tome Eftimov. 2026. A survey of features used for representing black-box single-objective continuous optimization.Swarm and Evolutionary Computation 101 (2026), 102288

  9. [9]

    Gjorgjina Cenikj, Gašper Petelin, and Tome Eftimov. 2024. Transoptas: Transformer-based algorithm selection for single-objective optimization. InPro- ceedings of the Genetic and Evolutionary Computation Conference Companion. 403–406

  10. [10]

    Gjorgjina Cenikj, Gašper Petelin, Moritz Seiler, Nikola Cenikj, and Tome Eftimov

  11. [11]

    Landscape features in single-objective continuous optimization: Have we hit a wall in algorithm selection generalization?Swarm and Evolutionary Computation94 (2025), 101894

  12. [12]

    Konstantin Dietrich, Diederick Vermetten, Carola Doerr, and Pascal Kerschke

  13. [13]

    InProceedings of the Genetic and Evolutionary Computation Conference

    Impact of training instance selection on automated algorithm selection models for numerical black-box optimization. InProceedings of the Genetic and Evolutionary Computation Conference. 1007–1016

  14. [14]

    Nikolaus Hansen. 2016. The CMA evolution strategy: A tutorial.arXiv preprint arXiv:1604.00772(2016)

  15. [15]

    Nikolaus Hansen, Anne Auger, Raymond Ros, Olaf Mersmann, Tea Tušar, and Dimo Brockhoff. 2021. COCO: A platform for comparing continuous optimizers in a black-box setting.Optimization Methods and Software36, 1 (2021), 114–144

  16. [16]

    2009.Real- Parameter Black-Box Optimization Benchmarking 2009: Noiseless Functions Def- initions

    Nikolaus Hansen, Steffen Finck, Raymond Ros, and Anne Auger. 2009.Real- Parameter Black-Box Optimization Benchmarking 2009: Noiseless Functions Def- initions. Research Report RR-6829. INRIA. https://inria.hal.science/inria- 00362633v2 Version 2, HAL Id: inria-00362633

  17. [17]

    Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. 2013. An evaluation of sequential model-based optimization for expensive blackbox functions. In Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1209–1216

  18. [18]

    Waltraud Huyer and Arnold Neumaier. 2009. Benchmarking of MCS on the noiseless function testbed.Online, 2009c. URL http://www. mat. univie. ac. at/˜ neum/papers. html989 (2009)

  19. [19]

    Stefan Ivić, Siniša Družeta, and Luka Grbčić. 2025. Randomness as Refer- ence: Benchmark Metric for Optimization in Engineering.arXiv preprint arXiv:2511.17226(2025)

  20. [20]

    Pascal Kerschke, Holger H Hoos, Frank Neumann, and Heike Trautmann. 2019. Automated algorithm selection: Survey and perspectives.Evolutionary computa- tion27, 1 (2019), 3–45

  21. [21]

    Pascal Kerschke and Heike Trautmann. 2019. Automated algorithm selection on continuous black-box problems by combining exploratory landscape analysis and machine learning.Evolutionary computation27, 1 (2019), 99–127

  22. [22]

    Ana Kostovska, Carola Doerr, Sašo Džeroski, Panče Panov, and Tome Eftimov

  23. [23]

    InProceedings of the Genetic and Evolutionary Wang et al

    Geometric Learning in Black-Box Optimization: A GNN Framework for Algorithm Performance Prediction. InProceedings of the Genetic and Evolutionary Wang et al. Computation Conference Companion. 487–490

  24. [24]

    Manuel López-Ibáñez, Diederick Vermetten, Johann Dreo, and Carola Doerr. 2024. Using the empirical attainment function for analyzing single-objective black-box optimization algorithms.IEEE Transactions on Evolutionary Computation(2024)

  25. [25]

    Ilya Loshchilov, Marc Schoenauer, and Michèle Sebag. 2013. Bi-population CMA-ES agorithms with surrogate models and line searches. InProceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1177–1184

  26. [26]

    Zeyuan Ma, Hongshu Guo, Yue-Jiao Gong, Jun Zhang, and Kay Chen Tan. 2025. Toward automated algorithm design: A survey and practical guide to meta-black- box-optimization.IEEE Transactions on Evolutionary Computation(2025)

  27. [27]

    Francesco Mezzadri. 2006. How to generate random matrices from the classical compact groups.arXiv preprint math-ph/0609050(2006)

  28. [28]

    Ana Nikolikj, Ana Kostovska, Gjorgjina Cenikj, Carola Doerr, and Tome Eftimov

  29. [29]

    In2024 IEEE Congress on Evolutionary Computation (CEC)

    Generalization Ability of Feature-Based Performance Prediction Models: A Statistical Analysis Across Benchmarks. In2024 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1–8

  30. [30]

    Art B Owen. 1998. Scrambling Sobol’and Niederreiter–Xing Points.Journal of complexity14, 4 (1998), 466–489

  31. [31]

    László Pál. 2013. Benchmarking a hybrid multi level single linkage algorithm on the BBOB noiseless testbed. InProceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1145–1152

  32. [32]

    László Pál. 2013. Comparison of multistart global optimization algorithms on the BBOB noiseless testbed. InProceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1153–1160

  33. [33]

    Gašper Petelin and Gjorgjina Cenikj. 2024. On Generalization of ELA Feature Groups. InProceedings of the Genetic and Evolutionary Computation Conference Companion. 419–422

  34. [34]

    Gašper Petelin and Gjorgjina Cenikj. 2025. The Pitfalls of Benchmarking in Algorithm Selection: What We Are Getting Wrong. InProceedings of the Genetic and Evolutionary Computation Conference. 1181–1189

  35. [35]

    Gašper Petelin, Gjorgjina Cenikj, and Tome Eftimov. 2024. Tinytla: Topological landscape analysis for optimization problem classification in a limited sample setting.Swarm and Evolutionary Computation84 (2024), 101448

  36. [36]

    Petr Pošík and Petr Baudiš. 2015. Dimension selection in axis-parallel brent- step method for black-box optimization of separable continuous functions. In Proceedings of the Annual Conference on Genetic and Evolutionary Computation (GECCO) Companion. 1151–1158

  37. [37]

    Raphael Patrick Prager, Moritz Vinzent Seiler, Heike Trautmann, and Pascal Kerschke. 2022. Automated algorithm selection in single-objective continuous optimization: a comparative study of deep learning and landscape analysis meth- ods. InInternational Conference on Parallel Problem Solving from Nature. Springer, 3–17

  38. [38]

    Iván Olarte Rodríguez, Maria Laura Santoni, Fabian Duddeck, Carola Doerr, Thomas Bäck, and Elena Raponi. 2025. MECHBench: A Set of Black-Box Op- timization Benchmarks originated from Structural Mechanics.arXiv preprint arXiv:2511.10821(2025)

  39. [39]

    Moritz Seiler, Urban Škvorc, Gjorgjina Cenikj, Carola Doerr, and Heike Traut- mann. 2024. Learned features vs. Classical ELA on affine BBOB functions. In International Conference on Parallel Problem Solving from Nature. Springer, 137– 153

  40. [40]

    Moritz Seiler, Urban Škvorc, Carola Doerr, and Heike Trautmann. 2024. Synergies of deep and classical exploratory landscape features for automated algorithm selection. InInternational Conference on Learning and Intelligent Optimization. Springer, 361–376

  41. [41]

    Moritz Vinzent Seiler, Pascal Kerschke, and Heike Trautmann. 2025. Deep-ela: Deep exploratory landscape analysis with self-supervised pretrained transform- ers for single-and multi-objective continuous optimization problems.Evolution- ary Computation(2025), 1–27

  42. [42]

    Urban Škvorc, Tome Eftimov, and Peter Korošec. 2021. The effect of sampling methods on the invariance to function transformations when using exploratory landscape analysis. In2021 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1139–1146

  43. [43]

    Bas van Stein, Fu Xing Long, Moritz Frenzel, Peter Krause, Markus Gitterle, and Thomas Bäck. 2023. Doe2vec: Deep-learning based features for exploratory landscape analysis. InProceedings of the Companion Conference on Genetic and Evolutionary Computation. 515–518

  44. [44]

    Diederick Vermetten, Furong Ye, Thomas Bäck, and Carola Doerr. 2025. MA- BBOB: A problem generator for black-box optimization using affine combinations and shifts.ACM Transactions on Evolutionary Learning5, 1 (2025), 1–19. Geometric Probing for Algorithm Selection in Continuous Black-Box Optimisation Table 5: Average wall-clock time (s) to generate the in...