pith. sign in

arxiv: 2105.04332 · v1 · submitted 2021-05-10 · 💻 cs.LG · stat.ML

Bayesian Optimistic Optimisation with Exponentially Decaying Regret

Pith reviewed 2026-05-24 12:38 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords Bayesian optimizationregret boundsGaussian processesMatérn kerneloptimistic optimizationblack-box optimization
0
0 comments X

The pith

The BOO algorithm achieves a regret bound of order O(N^{-√N}) for Bayesian optimization of black-box functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes the BOO algorithm as a practical method to find global optima of expensive black-box functions more efficiently than existing Bayesian optimization techniques. It achieves this by combining Bayesian optimization with tree-based optimistic optimization through space partitioning. Under the assumption that the objective is drawn from a Gaussian process with a Matérn kernel of sufficient smoothness, the algorithm attains an exponentially decaying regret bound O(N^{-√N}). Experiments on synthetic functions and machine learning tasks demonstrate superior performance over baselines.

Core claim

BOO is the first practical approach that can achieve an exponential regret bound with order O(N^{-√N}) in the noiseless setting for functions sampled from a Gaussian process with Matérn kernel where the smoothness parameter ν exceeds 4 plus D over 2.

What carries the argument

The BOO algorithm, which intertwines Bayesian optimization concepts with tree-based optimistic optimization based on partitioning the search space.

If this is right

  • The regret decreases exponentially faster than previous bounds ranging from O(log N / sqrt(N)) to O(e^{-sqrt(N)}).
  • It enables more efficient optimization with fewer function evaluations in the noiseless case.
  • It outperforms standard BO methods in practice on synthetic functions and hyperparameter tuning tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Hybrid algorithms merging probabilistic and deterministic optimization ideas may yield stronger theoretical guarantees in related search problems.
  • This approach could be tested for extensions to noisy observations or other kernel families beyond Matérn.
  • Practitioners might achieve better results in hyperparameter tuning by adopting this method when functions meet the smoothness condition.

Load-bearing premise

The objective function is sampled from a Gaussian process with a Matérn kernel having smoothness parameter ν greater than 4 plus half the number of dimensions.

What would settle it

A numerical simulation or theoretical construction showing that the regret under BOO exceeds O(N^{-√N}) for some function satisfying the Gaussian process and Matérn kernel conditions.

Figures

Figures reproduced from arXiv: 2105.04332 by Hung Tran-The, Santu Rana, Sunil Gupta, Svetha Venkatesh.

Figure 1
Figure 1. Figure 1: An illustration of partitioning procedure for m = 8: (a) SOO methods partition a cell by dividing the longest side into m equal parts; (b) our method sets m = a b and partitions a cell by dividing b = 3 longest sides into a = 2 equal parts. increases, the number of tree expansions decreases. We call this phenomenal the strict negative correlation of tree-based optimistic optimisation algorithms. Using the … view at source ↗
Figure 2
Figure 2. Figure 2: SOO samples the function at all m children nodes while our sampling strategy samples the function only at the parent node (the node selected for expansion). As a result, our strategy requires only one function evaluation irrespective of the value of m. . the function at its children (see [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Comparison of methods for Hartmann3 (D = 3), Schwefel (D = 3), and Shekel (D = 4) functions. 0 25 50 75 100 125 150 175 200 Iterations 2.45 2.40 2.35 2.30 2.25 2.20 Log Prediction Error ElasticNet GP-EI GP-UCB SOO BaMSOO IMGPO BOO 0 25 50 75 100 125 150 175 200 Iterations 4.0 3.9 3.8 3.7 3.6 3.5 Log Prediction Error MLP GP-EI GP-UCB SOO BaMSOO IMGPO BOO 0 200 400 600 800 1000 Iterations 7.70 7.65 7.60 7.55… view at source ↗
Figure 4
Figure 4. Figure 4: Log prediction error on MNIST dataset for different algorithms ElasticNet, MLP and XGBoost [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
read the original abstract

Bayesian optimisation (BO) is a well-known efficient algorithm for finding the global optimum of expensive, black-box functions. The current practical BO algorithms have regret bounds ranging from $\mathcal{O}(\frac{logN}{\sqrt{N}})$ to $\mathcal O(e^{-\sqrt{N}})$, where $N$ is the number of evaluations. This paper explores the possibility of improving the regret bound in the noiseless setting by intertwining concepts from BO and tree-based optimistic optimisation which are based on partitioning the search space. We propose the BOO algorithm, a first practical approach which can achieve an exponential regret bound with order $\mathcal O(N^{-\sqrt{N}})$ under the assumption that the objective function is sampled from a Gaussian process with a Mat\'ern kernel with smoothness parameter $\nu > 4 +\frac{D}{2}$, where $D$ is the number of dimensions. We perform experiments on optimisation of various synthetic functions and machine learning hyperparameter tuning tasks and show that our algorithm outperforms baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes the BOO algorithm, which combines Bayesian optimisation with tree-based optimistic optimisation via space partitioning. It claims to be the first practical method achieving an exponentially decaying regret bound of order O(N^{-√N}) in the noiseless setting, under the assumption that the objective function is sampled from a Gaussian process with Matérn kernel of smoothness ν > 4 + D/2. Experiments on synthetic functions and machine learning hyperparameter tuning tasks are reported to show outperformance over baselines.

Significance. If the claimed regret bound holds under the stated assumptions, the result would represent a meaningful theoretical advance for Bayesian optimisation by delivering a substantially faster rate than the O(log N / √N) to O(e^{-√N}) bounds of existing practical algorithms. The explicit conditioning on a high-smoothness GP prior is clearly stated, and the experimental component provides initial empirical grounding for the approach.

major comments (1)
  1. [Abstract] Abstract: the O(N^{-√N}) regret claim is load-bearing on the premise that the objective is drawn from a GP with Matérn kernel satisfying ν > 4 + D/2; the manuscript must explicitly verify that this smoothness threshold is both necessary for the derivation and sufficient to guarantee the stated rate without additional hidden regularity conditions.
minor comments (1)
  1. The abstract refers to 'various synthetic functions' without naming them or describing the experimental protocol (e.g., number of runs, noise model, or comparison metrics), which hinders reproducibility assessment.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thoughtful review and positive assessment of the significance. We address the single major comment below regarding the abstract and the smoothness threshold.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the O(N^{-√N}) regret claim is load-bearing on the premise that the objective is drawn from a GP with Matérn kernel satisfying ν > 4 + D/2; the manuscript must explicitly verify that this smoothness threshold is both necessary for the derivation and sufficient to guarantee the stated rate without additional hidden regularity conditions.

    Authors: The threshold ν > 4 + D/2 is the precise condition under which the sample-path regularity of the Matérn GP (Hölder continuity of order α = ν - D/2 > 2) enables the key concentration and optimistic partitioning lemmas in our analysis (Section 4) to yield the exponential regret without further regularity assumptions. Sufficiency is established directly by the proof; the bound does not hold for lower ν under the same partitioning scheme, though we do not provide a formal necessity proof. We will revise the abstract to restate the assumption more explicitly and add a short clarifying remark after Theorem 1 explaining the origin of the threshold from the GP regularity requirements. revision: yes

Circularity Check

0 steps flagged

No significant circularity; bound is conditional on explicit GP assumption

full rationale

The central result is a regret bound O(N^{-√N}) derived for the BOO algorithm under the stated assumption that the objective is drawn from a GP with Matérn kernel (ν > 4 + D/2). This is a standard conditional theoretical guarantee; the derivation chain does not reduce any prediction or uniqueness claim to a fitted parameter, self-definition, or load-bearing self-citation. No equations or steps in the provided claims exhibit the enumerated circular patterns. The result remains falsifiable outside the paper's own fitted values and is presented as holding only under the given function-class premise.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim rests primarily on the domain assumption of the function being drawn from a sufficiently smooth GP; no free parameters or new entities are mentioned in the abstract.

axioms (1)
  • domain assumption The objective function is sampled from a Gaussian process with Matérn kernel of smoothness ν > 4 + D/2 in the noiseless setting.
    This is the key assumption stated for achieving the O(N^{-√N}) regret bound.

pith-pipeline@v0.9.0 · 5709 in / 1268 out tokens · 48040 ms · 2026-05-24T12:38:38.063890+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

  1. [1]

    URL http://proceedings.mlr

    PMLR. URL http://proceedings.mlr. press/v70/chowdhury17a.html. De Freitas, N., Smola, A. J., and Zoghi, M. Exponential regret bounds for gaussian process bandits with determin- istic observations. In Proceedings of the 29th Interna- tional Coference on International Conference on Machine Learning, ICML’12, pp. 955–962, Madison, WI, USA,

  2. [2]

    Bayesian Optimization with Exponential Convergence

    Omnipress. ISBN 9781450312851. Floudas, C. A. Deterministic Global Optimization: Theory, Methods and (NONCONVEX OPTIMIZATION AND ITS APPLICATIONS Volume 37) (Nonconvex Optimization and Its Applications). Springer-Verlag, Berlin, Heidel- berg, 2005. ISBN 0792360141. Grill, J.-B., Valko, M., Munos, R., and Munos, R. Black-box optimization of noisy functions...

  3. [3]

    Srinivas, A

    PMLR. URL http://proceedings.mlr. press/v65/scarlett17a.html. Sen, R., Kandasamy, K., and Shakkottai, S. Multi-fidelity black-box optimization with hierarchical partitions. vol- ume 80 of Proceedings of Machine Learning Research, pp. 4538–4547, Stockholmsm¨assan, Stockholm Sweden, 10–15 Jul 2018. PMLR. Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M...

  4. [4]

    doi: 10.1007/ 978-1-4612-1494-6

    ISBN 0-387-98629-4. doi: 10.1007/ 978-1-4612-1494-6. URL http://dx.doi.org/ 10.1007/978-1-4612-1494-6 . Some theory for Kriging. Tran-The, H., Gupta, S., Rana, S., Ha, H., and Venkatesh, S. Sub-linear regret bounds for bayesian optimisation in unknown search spaces. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., and Lin, H. (eds.), Advances i...

  5. [5]

    The main difference between our proposed BOO algorithm and these algorithms are in the following blue color lines

    Review of SOO, BaMSOO, IMGPO algorithms In the first section of the Supplementary Material, we provide the details of SOO (Munos, 2011) and BamSOO (Wang et al., 2014). The main difference between our proposed BOO algorithm and these algorithms are in the following blue color lines. As we can see, SOO and BaMSOO select a node to be expanded at line 4 in eac...

  6. [6]

    We here generalize their proof to anym

    IMGPO uses a fixedm = 3. We here generalize their proof to anym. The Table 3 shows the strict negative correlation of tree-based optimistic optimisation algorithms like SOO, BamSOO, Bayesian Optimistic Optimisation with Exponentially Decaying Regret Algorithm 3 The BaMSOO Algorithm (Wang et al., 2014) Input: Parameterm Initialisation: Setg0,0 =f(c0,0),f + ...

  7. [7]

    Given any (a,b )∈M(m) and a partitioning procedureP (m;a,b ), then

    Proof of Lemma 1 Lemma 7 (Lemma 1 in the main paper). Given any (a,b )∈M(m) and a partitioning procedureP (m;a,b ), then

  8. [8]

    the longest side of a cell at depth h is at mosta−⌊bh D⌋, and

  9. [9]

    the smallest side of a cell at depth h is at leasta−⌈bh D⌉. Proof. We prove the statement by induction. At depthh = 1, we partition the search spaceX intom =ab cells using the partitioning procedureP (m;a,b ). There are two cases onb. • b =D. Then the longest side of a cell at depth h = 1 is 1/a =a−⌊ b D⌋. Also, the smallest side of a cell at depth h = 1 ...

  10. [10]

    Given a set of pointsDp−1 , we define the fill distanceFD(Dp−1,X ) as the largest distance from any point inX to the points inDp−1, as FD(Dp−1,X ) = supx∈X infci∈Dp−1||x−ci||

    Proof of Lemma 4 To derive an upper bound on variance functionσp as in Lemma 4, we use a concept, called the fill distance. Given a set of pointsDp−1 , we define the fill distanceFD(Dp−1,X ) as the largest distance from any point inX to the points inDp−1, as FD(Dp−1,X ) = supx∈X infci∈Dp−1||x−ci||. The following result, which is proven by Wu & Schaback (1992...

  11. [11]

    We prove this in the following Lemma 11

    Proof of Theorem 1 To prove Theorem 1, we will involve two stages: • Stage 1: we first prove that ifN is large enough, then under some assumptions, all the centers of nodes of expandable nodes will fall into the ballB(x∗,θ ) which is centered atx∗ with radiusθ as defined in Property 1. We prove this in the following Lemma 11. • Stage 2: when a set of expand...

  12. [12]

    We define a node (h,i ) at depth h as δ(h;a,b )-optimal ifU(ch,i)≥f(ch,i)−δ(h;a,b )

    Proof of Lemma 5 Let (h∗ p + 1,i∗) be an optimal node of depth h∗ p + 1 (i.e., x∗ ∈ Ah∗p+1,i∗). We define a node (h,i ) at depth h as δ(h;a,b )-optimal ifU(ch,i)≥f(ch,i)−δ(h;a,b ). We obtains the following result. Bayesian Optimistic Optimisation with Exponentially Decaying Regret Lemma 12. Assume thatf(ch∗p+1,i∗)≤U (ch∗p+1,i∗). Then any node (h∗ p + 1,i )...

  13. [13]

    Lemma 14

    Proof of Lemma 6 We useAp to denote the set of all points evaluated by the algorithm and all centers of optimal nodes of the treeTp afterp evaluations. Lemma 14. Pick aη∈ (0, 1). Setβp = 2log(π2p3/3η) andLp(c) =µp(c)−β1/2 p σp(c). With probability 1−η, we have Lp(c)≤f(c)≤U p(c), for everyp≥ 1 and for everyc∈Ap. Proof. Afterp evaluations, there are at most...

  14. [14]

    Assume that there is a partitioning procedureP (m;a,b ) wherea =O(N 1/D),b =D and 2≤ m < √ N− 1

    Proof of Theorem 2 Theorem 4 (Regret Bound). Assume that there is a partitioning procedureP (m;a,b ) wherea =O(N 1/D),b =D and 2≤ m < √ N− 1. Let the depth function hmax(p) =√p. We consider m2 < p≤ N, and defineh(p) as the smallest integerh such that h≥ √p−m− 1 C + 2, whereC is the constant defined by Theorem 1. Pick aη∈ (0, 1). Then for everyN≥N1, the loss...

  15. [15]

    We let node (h∗ p + 1,o∗) be the optimal node at depthh∗ p + 1

    Thus, since the node (h,j ) has been expanded, its GP-UCB value is at least as high as that of the some node (h∗ p + 1,o ) at depthh∗ p + 1, such that • (1) node (h∗ p + 1,o ) has been evaluated at somep′-th expansion before node (h,j ) and • (2) (h∗ p + 1,o )∈ argmax(h∗p+1,i)∈LUp′(ch∗p+1,i) (see Line 4 of Algorithm 1). We let node (h∗ p + 1,o∗) be the op...

  16. [16]

    Pick aη∈ (0, 1)

    Proof of Corollary 1 Corollary 2. Pick aη∈ (0, 1). There exists a constantN2 > 0 such that for everyN≥N2 we have that the simple regret of the proposed BOO with the partitioning procedureP (m;a,b ) wherea =⌊( √ N 2 ) 1 D⌋,b =D, is bounded as rN≤O (N− √ N), with probability 1−η. Proof. Witha =⌊( √ N 2 ) 1 D⌋ andb = D, m = ab≤ √ N/2. These conditions satisf...

  17. [17]

    The left plot shows different partitioning procedures while function sampling is fixed to our proposed scheme

    Ablation study between function sampling and partitioning procedure in the proposed BOO algorithm To show this point, we have performed additional experiments withm = 64 (see Figure below). The left plot shows different partitioning procedures while function sampling is fixed to our proposed scheme. We can see good improvement when b = D compared tob = 1 c...