Exploring Exploration in Bayesian Optimization
Pith reviewed 2026-05-23 03:25 UTC · model grok-4.3
The pith
Acquisition functions can have their exploration quantified using only the points they select via observation traveling salesman distance and observation entropy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Observation traveling salesman distance and observation entropy quantify the exploration characteristics of acquisition functions based solely on their selected observations. Applying these to several well-known acquisition functions across diverse black-box problems reveals links between exploration and empirical performance as well as new relationships among the functions themselves.
What carries the argument
Observation traveling salesman distance and observation entropy, which compute exploration directly from the sequence or set of points chosen by the acquisition function.
If this is right
- Acquisition functions can be compared and ranked by exploration level using only their chosen observations.
- Empirical performance differences among acquisition functions can be partly explained by differences in their measured exploration.
- New relationships and groupings among existing acquisition functions become visible through these exploration metrics.
- The measures supply a systematic basis for designing new acquisition functions with targeted exploration properties.
Where Pith is reading between the lines
- These metrics could be used to automatically select or switch acquisition functions mid-optimization based on desired exploration levels.
- If the metrics prove stable across problem classes, they might reduce the need for expensive full benchmarking when introducing new acquisition functions.
- Extending the metrics to track exploration over time rather than total might expose phases where different functions shift from exploration to exploitation.
Load-bearing premise
Exploration characteristics of an acquisition function can be isolated and quantified solely from the set of points it selects, without reference to the surrogate model or the acquisition function's internal formulation.
What would settle it
Finding two acquisition functions that select identical point sequences on a problem yet produce measurably different exploration behavior under these metrics, or showing that the metrics do not predict performance differences on new test problems.
Figures
read the original abstract
A well-balanced exploration-exploitation trade-off is crucial for successful acquisition functions in Bayesian optimization. However, there is a lack of quantitative measures for exploration, making it difficult to analyze and compare different acquisition functions. This work introduces two novel approaches - observation traveling salesman distance and observation entropy - to quantify the exploration characteristics of acquisition functions based on their selected observations. Using these measures, we examine the explorative nature of several well-known acquisition functions across a diverse set of black-box problems, uncover links between exploration and empirical performance, and reveal new relationships among existing acquisition functions. Beyond enabling a deeper understanding of acquisition functions, these measures also provide a foundation for guiding their design in a more principled and systematic manner.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces two novel measures—observation traveling salesman distance and observation entropy—to quantify the exploration characteristics of acquisition functions in Bayesian optimization based on their selected observation sets. It applies these to several well-known acquisition functions across a diverse set of black-box problems, examines links between exploration and empirical performance, and reveals new relationships among existing acquisition functions.
Significance. If the measures validly attribute differences in selected points to acquisition-function-specific exploration-exploitation logic (rather than surrogate updates or problem geometry), the work would supply a useful quantitative lens for comparing and designing acquisition functions. The empirical scope across multiple problems is a strength, but the central attribution claim requires verification that is not indicated in the provided description.
major comments (2)
- [Abstract] Abstract: the claim that the two measures 'quantify the exploration characteristics of acquisition functions' is load-bearing for the entire contribution, yet the abstract provides no indication that the metrics control for Gaussian-process surrogate dynamics, sequential decision effects, or black-box landscape geometry; differences in TSP distance or entropy could arise identically from these confounders regardless of the acquisition function.
- [Approach (definitions)] Approach section (definitions of observation traveling salesman distance and observation entropy): these quantities are computed solely from the final observation sets without reference to the surrogate model or the acquisition function's internal formulation; this directly risks the circularity and confounding concern that the measures do not isolate AF-specific exploration behavior.
minor comments (1)
- [Abstract] The abstract would benefit from explicitly naming the acquisition functions and benchmark problems examined so readers can immediately assess the scope of the empirical study.
Simulated Author's Rebuttal
We thank the referee for the detailed feedback. The two major comments both concern whether the proposed measures can validly attribute exploration differences to acquisition-function logic rather than surrogate dynamics or problem geometry. We address each point below and indicate where revisions will be made to clarify scope and limitations.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that the two measures 'quantify the exploration characteristics of acquisition functions' is load-bearing for the entire contribution, yet the abstract provides no indication that the metrics control for Gaussian-process surrogate dynamics, sequential decision effects, or black-box landscape geometry; differences in TSP distance or entropy could arise identically from these confounders regardless of the acquisition function.
Authors: We agree that the abstract claim is strong and does not qualify the measures' dependence on the selected observation sets alone. The measures are intentionally post-hoc descriptors of the point sets produced by each acquisition function; they do not attempt to partial out surrogate updates or landscape effects. Our empirical results show consistent patterns across multiple problems and random seeds, but we accept that this does not constitute a controlled isolation. We will revise the abstract to state that the measures characterize the spatial properties of the observation sets generated by each acquisition function, without claiming full isolation from confounding factors. revision: yes
-
Referee: [Approach (definitions)] Approach section (definitions of observation traveling salesman distance and observation entropy): these quantities are computed solely from the final observation sets without reference to the surrogate model or the acquisition function's internal formulation; this directly risks the circularity and confounding concern that the measures do not isolate AF-specific exploration behavior.
Authors: The definitions are deliberately formulated on the observation sets only, because the exploration we wish to quantify is precisely the spatial distribution of points that the acquisition function ultimately selects. Including surrogate or internal AF terms would change the measures into something other than direct descriptors of exploration realized in the data. We recognize that this design choice leaves open the possibility that observed differences partly reflect surrogate dynamics or geometry rather than acquisition-function logic alone. We will add an explicit limitations paragraph in the Approach section acknowledging this and noting that future work could design controlled experiments that hold the surrogate and problem fixed while varying only the acquisition function. revision: yes
Circularity Check
No circularity: new metrics defined from observation sets and applied externally to existing AFs
full rationale
The paper introduces observation TSP distance and observation entropy as direct functions of the final selected point sets. These definitions are then used to measure and compare existing acquisition functions on black-box problems. No step reduces a claimed result to its own inputs by construction, renames a fit as a prediction, or relies on a load-bearing self-citation chain. The derivation chain consists of independent metric definitions followed by empirical application; the central claims therefore remain self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Standard mathematical definitions of the traveling salesman problem distance and of entropy can be applied directly to finite sets of selected observation points.
invented entities (2)
-
observation traveling salesman distance
no independent evidence
-
observation entropy
no independent evidence
Forward citations
Cited by 1 Pith paper
-
Understanding High-Dimensional Bayesian Optimization
Vanishing gradients from GP initialization explain HDBO failures; MLE of length scales suffices for SOTA performance, enabling the MSR variant on real-world tasks.
Reference graph
Works this paper leans on
-
[1]
PI is back! Switching Acquisition Functions in Bayesian Optimization
Carolin Benjamins, Elena Raponi, Anja Jankovic, Koen van der Blom, Maria Laura Santoni, Marius Lindauer, and Carola Doerr. PI is back! Switching Acquisition Functions in Bayesian Optimization. In 2022 NeurIPS Workshop on Gaussian Processes, Spatiotemporal Mod- eling, and Decision-making Systems,
work page 2022
-
[2]
A sequential Monte Carlo approach to Thompson sampling for Bayesian optimization
Hildo Bijl, Thomas B Schön, Jan-Willem van Wingerden, and Michel Verhaegen. A sequential Monte Carlo ap- proach to Thompson sampling for Bayesian optimization. arXiv preprint arXiv:1604.00169,
work page internal anchor Pith review Pith/arXiv arXiv
-
[3]
Advanced population diversity measures in genetic programming
Edmund Burke, Steven Gustafson, Graham Kendall, and Na- talio Krasnogor. Advanced population diversity measures in genetic programming. In Parallel Problem Solving from Nature—PPSN VII: 7th International Conference Granada, Spain, September 7–11, 2002 Proceedings 7, pages 341–350. Springer,
work page 2002
-
[4]
Cambridge. Cambridge online dictionary. https://dictionary.cambridge.org/us/ dictionary/english/exploration. Accessed: 2025-02-08. Nachol Chaiyaratana, Theera Piroonratana, and Nuntapon Sangkawelert. Effects of diversity control in single- objective and multi-objective genetic algorithms. Journal of Heuristics, 13:1–34,
work page 2025
-
[5]
A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization
Nuojin Cheng, Leonard Papenmeier, Stephen Becker, and Luigi Nardi. A unified framework for entropy search and expected improvement in Bayesian optimization. arXiv preprint arXiv:2501.18756,
work page internal anchor Pith review Pith/arXiv arXiv
-
[6]
URL https: //botorch.org/docs/tutorials/turbo_1/. Accessed: 2025-02-06. David Eriksson and Martin Jankowiak. High-Dimensional Bayesian Optimization with Sparse Axis-Aligned Sub- spaces. In Uncertainty in Artificial Intelligence, pages 493–503. PMLR,
work page 2025
-
[7]
Towards Gaus- sian process-based optimization with finite time horizon
David Ginsbourger and Rodolphe Le Riche. Towards Gaus- sian process-based optimization with finite time horizon. In mODa 9–Advances in Model-Oriented Design and Analysis: Proceedings of the 9th International Workshop in Model-Oriented Design and Analysis held in Bertinoro, Italy, June 14-18, 2010, pages 89–96. Springer,
work page 2010
-
[8]
The CMA Evolution Strategy: A Tutorial
Nikolaus Hansen. The CMA evolution strategy: A tutorial. arXiv preprint arXiv:1604.00772,
work page internal anchor Pith review Pith/arXiv arXiv
-
[9]
Nikolaus Hansen, Youhei Akimoto, and Petr Baudis. CMA-ES/pycma on Github. Zenodo, DOI:10.5281/zenodo.2559634, February
-
[10]
URL https://doi.org/10.5281/zenodo. 2559634. Philipp Hennig and Christian J Schuler. Entropy Search for Information-Efficient Global Optimization. Journal of Machine Learning Research, 13(6),
-
[11]
Kazuyuki Inoue, Taku Hasegawa, Naoki Mori, and Keinosuke Matsumoto. Analyzing exploration exploita- tion trade-off by means of PI similarity index and dic- tyostelium based genetic algorithm. In 2015 IEEE Congress on Evolutionary Computation (CEC) , pages 2548–2555. IEEE,
work page 2015
-
[12]
Algorithms for multi-armed bandit problems
V olodymyr Kuleshov and Doina Precup. Algorithms for multi-armed bandit problems. arXiv preprint arXiv:1402.6028,
work page internal anchor Pith review Pith/arXiv arXiv
-
[13]
Approximate algorithms for the traveling sales- person problem
Daniel J Rosenkrantz, Richard Edwin Stearns, and Philip M Lewis. Approximate algorithms for the traveling sales- person problem. In 15th Annual Symposium on Switching and Automata Theory (swat 1974), pages 33–42. IEEE,
work page 1974
-
[14]
Plan- ning to be surprised: Optimal Bayesian exploration in dynamic environments
Yi Sun, Faustino Gomez, and Jürgen Schmidhuber. Plan- ning to be surprised: Optimal Bayesian exploration in dynamic environments. In Artificial General Intelligence: 4th International Conference, AGI 2011, Mountain View, CA, USA, August 3-6,
work page 2011
-
[15]
Virtual library of simulation experiments: Test functions and datasets, optimization test problems
Sonja Surjanovic and Derek Bingham. Virtual library of simulation experiments: Test functions and datasets, optimization test problems. https://www.sfu. ca/~ssurjano/optimization.html. Accessed: 2024-09-01. Richard S Sutton. Reinforcement learning: An introduction. A Bradford Book,
work page 2024
-
[16]
Our proof follows from Balogh et al
11 Appendix (Supplementary Material) Leonard Papenmeier*1 Nuojin Cheng*2 Stephen Becker2 Luigi Nardi1, 3 1Department of Computer Science, Lund University, Lund, Sweden 2Department of Applied Mathematics, University of Colorado Boulder, Boulder, Colorado, USA 3DBtune A PROOFS A.1 OTSD UPPER BOUND Proof. Our proof follows from Balogh et al. [2024, Theorem 1...
work page 2024
-
[17]
extended these results to non-compactly supported densities, providing an upper bound for the bias of order O(t−2/d). A more recent study [Devroye and Györfi, 2021] presents a consistency result for the KL estimator even when the density function is not smooth. However, in our study the sample points in Xt are generally not independent and identically dis...
work page 2021
-
[18]
E RUNTIME ANALYSIS We analyze the runtimes for OTSD and OE
Figure 33: Optimization performance of the different UCB variations on the benchmarks. E RUNTIME ANALYSIS We analyze the runtimes for OTSD and OE. We measure OTSD and OE on random sequences of length 1000 in different dimensionalities, ranging from 10 – 1000 for OTSD and from 2 – 20 for OE. We repeat each experiment 20 times and observe the runtimes for c...
work page 2000
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.