Bayesian Optimistic Optimisation with Exponentially Decaying Regret

Hung Tran-The; Santu Rana; Sunil Gupta; Svetha Venkatesh

arxiv: 2105.04332 · v1 · submitted 2021-05-10 · 💻 cs.LG · stat.ML

Bayesian Optimistic Optimisation with Exponentially Decaying Regret

Hung Tran-The , Sunil Gupta , Santu Rana , Svetha Venkatesh This is my paper

Pith reviewed 2026-05-24 12:38 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords Bayesian optimizationregret boundsGaussian processesMatérn kerneloptimistic optimizationblack-box optimization

0 comments

The pith

The BOO algorithm achieves a regret bound of order O(N^{-√N}) for Bayesian optimization of black-box functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes the BOO algorithm as a practical method to find global optima of expensive black-box functions more efficiently than existing Bayesian optimization techniques. It achieves this by combining Bayesian optimization with tree-based optimistic optimization through space partitioning. Under the assumption that the objective is drawn from a Gaussian process with a Matérn kernel of sufficient smoothness, the algorithm attains an exponentially decaying regret bound O(N^{-√N}). Experiments on synthetic functions and machine learning tasks demonstrate superior performance over baselines.

Core claim

BOO is the first practical approach that can achieve an exponential regret bound with order O(N^{-√N}) in the noiseless setting for functions sampled from a Gaussian process with Matérn kernel where the smoothness parameter ν exceeds 4 plus D over 2.

What carries the argument

The BOO algorithm, which intertwines Bayesian optimization concepts with tree-based optimistic optimization based on partitioning the search space.

If this is right

The regret decreases exponentially faster than previous bounds ranging from O(log N / sqrt(N)) to O(e^{-sqrt(N)}).
It enables more efficient optimization with fewer function evaluations in the noiseless case.
It outperforms standard BO methods in practice on synthetic functions and hyperparameter tuning tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hybrid algorithms merging probabilistic and deterministic optimization ideas may yield stronger theoretical guarantees in related search problems.
This approach could be tested for extensions to noisy observations or other kernel families beyond Matérn.
Practitioners might achieve better results in hyperparameter tuning by adopting this method when functions meet the smoothness condition.

Load-bearing premise

The objective function is sampled from a Gaussian process with a Matérn kernel having smoothness parameter ν greater than 4 plus half the number of dimensions.

What would settle it

A numerical simulation or theoretical construction showing that the regret under BOO exceeds O(N^{-√N}) for some function satisfying the Gaussian process and Matérn kernel conditions.

Figures

Figures reproduced from arXiv: 2105.04332 by Hung Tran-The, Santu Rana, Sunil Gupta, Svetha Venkatesh.

**Figure 1.** Figure 1: An illustration of partitioning procedure for m = 8: (a) SOO methods partition a cell by dividing the longest side into m equal parts; (b) our method sets m = a b and partitions a cell by dividing b = 3 longest sides into a = 2 equal parts. increases, the number of tree expansions decreases. We call this phenomenal the strict negative correlation of tree-based optimistic optimisation algorithms. Using the … view at source ↗

**Figure 2.** Figure 2: SOO samples the function at all m children nodes while our sampling strategy samples the function only at the parent node (the node selected for expansion). As a result, our strategy requires only one function evaluation irrespective of the value of m. . the function at its children (see [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Comparison of methods for Hartmann3 (D = 3), Schwefel (D = 3), and Shekel (D = 4) functions. 0 25 50 75 100 125 150 175 200 Iterations 2.45 2.40 2.35 2.30 2.25 2.20 Log Prediction Error ElasticNet GP-EI GP-UCB SOO BaMSOO IMGPO BOO 0 25 50 75 100 125 150 175 200 Iterations 4.0 3.9 3.8 3.7 3.6 3.5 Log Prediction Error MLP GP-EI GP-UCB SOO BaMSOO IMGPO BOO 0 200 400 600 800 1000 Iterations 7.70 7.65 7.60 7.55… view at source ↗

**Figure 4.** Figure 4: Log prediction error on MNIST dataset for different algorithms ElasticNet, MLP and XGBoost [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

Bayesian optimisation (BO) is a well-known efficient algorithm for finding the global optimum of expensive, black-box functions. The current practical BO algorithms have regret bounds ranging from $\mathcal{O}(\frac{logN}{\sqrt{N}})$ to $\mathcal O(e^{-\sqrt{N}})$, where $N$ is the number of evaluations. This paper explores the possibility of improving the regret bound in the noiseless setting by intertwining concepts from BO and tree-based optimistic optimisation which are based on partitioning the search space. We propose the BOO algorithm, a first practical approach which can achieve an exponential regret bound with order $\mathcal O(N^{-\sqrt{N}})$ under the assumption that the objective function is sampled from a Gaussian process with a Mat\'ern kernel with smoothness parameter $\nu > 4 +\frac{D}{2}$, where $D$ is the number of dimensions. We perform experiments on optimisation of various synthetic functions and machine learning hyperparameter tuning tasks and show that our algorithm outperforms baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

BOO claims an exponentially decaying regret bound but only under a very strong smoothness assumption that limits its reach.

read the letter

The main takeaway is that this paper introduces BOO, which combines Bayesian optimization with tree-based optimistic optimization to reach a regret bound of order O(N^{-√N}) in the noiseless setting. That is a sharper rate than the typical practical BO bounds, and they present it as the first method to hit this under an explicit assumption on the function class. They also include experiments on synthetic functions and hyperparameter tuning that show it beating some baselines. The assumption is stated up front: the objective must be drawn from a GP with Matérn kernel where ν > 4 + D/2. That is the central condition for the bound to hold, and the paper does not hide it. The combination of ideas is the concrete new element, and the experiments give at least initial evidence that the algorithm runs and performs reasonably on the tested cases. The derivation itself is not visible in the abstract, but the stress-test note confirms no obvious circularity or hidden premise in the claim structure. The soft spot is the assumption. Requiring that level of smoothness becomes restrictive quickly as dimension grows, and many black-box problems will not satisfy it. The noiseless restriction further narrows the scope. The experiments are limited in scope and do not directly verify the asymptotic rate, so they mainly show competitive behavior rather than confirming the theoretical improvement. This work is aimed at researchers who follow regret analysis in Bayesian optimization and black-box methods. A reader focused on theoretical rates would find the combination worth examining, while someone looking for immediately applicable algorithms might see less value. The paper makes a clear enough claim with stated conditions that it deserves a serious referee to check the proof details and experimental design. I would recommend sending it to peer review rather than desk rejection.

Referee Report

1 major / 1 minor

Summary. The paper proposes the BOO algorithm, which combines Bayesian optimisation with tree-based optimistic optimisation via space partitioning. It claims to be the first practical method achieving an exponentially decaying regret bound of order O(N^{-√N}) in the noiseless setting, under the assumption that the objective function is sampled from a Gaussian process with Matérn kernel of smoothness ν > 4 + D/2. Experiments on synthetic functions and machine learning hyperparameter tuning tasks are reported to show outperformance over baselines.

Significance. If the claimed regret bound holds under the stated assumptions, the result would represent a meaningful theoretical advance for Bayesian optimisation by delivering a substantially faster rate than the O(log N / √N) to O(e^{-√N}) bounds of existing practical algorithms. The explicit conditioning on a high-smoothness GP prior is clearly stated, and the experimental component provides initial empirical grounding for the approach.

major comments (1)

[Abstract] Abstract: the O(N^{-√N}) regret claim is load-bearing on the premise that the objective is drawn from a GP with Matérn kernel satisfying ν > 4 + D/2; the manuscript must explicitly verify that this smoothness threshold is both necessary for the derivation and sufficient to guarantee the stated rate without additional hidden regularity conditions.

minor comments (1)

The abstract refers to 'various synthetic functions' without naming them or describing the experimental protocol (e.g., number of runs, noise model, or comparison metrics), which hinders reproducibility assessment.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their thoughtful review and positive assessment of the significance. We address the single major comment below regarding the abstract and the smoothness threshold.

read point-by-point responses

Referee: [Abstract] Abstract: the O(N^{-√N}) regret claim is load-bearing on the premise that the objective is drawn from a GP with Matérn kernel satisfying ν > 4 + D/2; the manuscript must explicitly verify that this smoothness threshold is both necessary for the derivation and sufficient to guarantee the stated rate without additional hidden regularity conditions.

Authors: The threshold ν > 4 + D/2 is the precise condition under which the sample-path regularity of the Matérn GP (Hölder continuity of order α = ν - D/2 > 2) enables the key concentration and optimistic partitioning lemmas in our analysis (Section 4) to yield the exponential regret without further regularity assumptions. Sufficiency is established directly by the proof; the bound does not hold for lower ν under the same partitioning scheme, though we do not provide a formal necessity proof. We will revise the abstract to restate the assumption more explicitly and add a short clarifying remark after Theorem 1 explaining the origin of the threshold from the GP regularity requirements. revision: yes

Circularity Check

0 steps flagged

No significant circularity; bound is conditional on explicit GP assumption

full rationale

The central result is a regret bound O(N^{-√N}) derived for the BOO algorithm under the stated assumption that the objective is drawn from a GP with Matérn kernel (ν > 4 + D/2). This is a standard conditional theoretical guarantee; the derivation chain does not reduce any prediction or uniqueness claim to a fitted parameter, self-definition, or load-bearing self-citation. No equations or steps in the provided claims exhibit the enumerated circular patterns. The result remains falsifiable outside the paper's own fitted values and is presented as holding only under the given function-class premise.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim rests primarily on the domain assumption of the function being drawn from a sufficiently smooth GP; no free parameters or new entities are mentioned in the abstract.

axioms (1)

domain assumption The objective function is sampled from a Gaussian process with Matérn kernel of smoothness ν > 4 + D/2 in the noiseless setting.
This is the key assumption stated for achieving the O(N^{-√N}) regret bound.

pith-pipeline@v0.9.0 · 5709 in / 1268 out tokens · 48040 ms · 2026-05-24T12:38:38.063890+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

under the assumption that the objective function is sampled from a Gaussian process with a Matérn kernel with smoothness parameter ν > 4 + D/2
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

hierarchical partition ... m-ary tree ... partitioning procedure P(m;a,b)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

[1]

URL http://proceedings.mlr

PMLR. URL http://proceedings.mlr. press/v70/chowdhury17a.html. De Freitas, N., Smola, A. J., and Zoghi, M. Exponential regret bounds for gaussian process bandits with determin- istic observations. In Proceedings of the 29th Interna- tional Coference on International Conference on Machine Learning, ICML’12, pp. 955–962, Madison, WI, USA,

work page
[2]

Bayesian Optimization with Exponential Convergence

Omnipress. ISBN 9781450312851. Floudas, C. A. Deterministic Global Optimization: Theory, Methods and (NONCONVEX OPTIMIZATION AND ITS APPLICATIONS Volume 37) (Nonconvex Optimization and Its Applications). Springer-Verlag, Berlin, Heidel- berg, 2005. ISBN 0792360141. Grill, J.-B., Valko, M., Munos, R., and Munos, R. Black-box optimization of noisy functions...

work page internal anchor Pith review Pith/arXiv arXiv 2005
[3]

Srinivas, A

PMLR. URL http://proceedings.mlr. press/v65/scarlett17a.html. Sen, R., Kandasamy, K., and Shakkottai, S. Multi-ﬁdelity black-box optimization with hierarchical partitions. vol- ume 80 of Proceedings of Machine Learning Research, pp. 4538–4547, Stockholmsm¨assan, Stockholm Sweden, 10–15 Jul 2018. PMLR. Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M...

work page doi:10.1109/tit.2011.2182033 2018
[4]

doi: 10.1007/ 978-1-4612-1494-6

ISBN 0-387-98629-4. doi: 10.1007/ 978-1-4612-1494-6. URL http://dx.doi.org/ 10.1007/978-1-4612-1494-6 . Some theory for Kriging. Tran-The, H., Gupta, S., Rana, S., Ha, H., and Venkatesh, S. Sub-linear regret bounds for bayesian optimisation in unknown search spaces. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., and Lin, H. (eds.), Advances i...

work page doi:10.1007/978-1-4612-1494-6 2020
[5]

The main difference between our proposed BOO algorithm and these algorithms are in the following blue color lines

Review of SOO, BaMSOO, IMGPO algorithms In the ﬁrst section of the Supplementary Material, we provide the details of SOO (Munos, 2011) and BamSOO (Wang et al., 2014). The main difference between our proposed BOO algorithm and these algorithms are in the following blue color lines. As we can see, SOO and BaMSOO select a node to be expanded at line 4 in eac...

work page 2011
[6]

We here generalize their proof to anym

IMGPO uses a ﬁxedm = 3. We here generalize their proof to anym. The Table 3 shows the strict negative correlation of tree-based optimistic optimisation algorithms like SOO, BamSOO, Bayesian Optimistic Optimisation with Exponentially Decaying Regret Algorithm 3 The BaMSOO Algorithm (Wang et al., 2014) Input: Parameterm Initialisation: Setg0,0 =f(c0,0),f + ...

work page 2014
[7]

Given any (a,b )∈M(m) and a partitioning procedureP (m;a,b ), then

Proof of Lemma 1 Lemma 7 (Lemma 1 in the main paper). Given any (a,b )∈M(m) and a partitioning procedureP (m;a,b ), then

work page
[8]

the longest side of a cell at depth h is at mosta−⌊bh D⌋, and

work page
[9]

the smallest side of a cell at depth h is at leasta−⌈bh D⌉. Proof. We prove the statement by induction. At depthh = 1, we partition the search spaceX intom =ab cells using the partitioning procedureP (m;a,b ). There are two cases onb. • b =D. Then the longest side of a cell at depth h = 1 is 1/a =a−⌊ b D⌋. Also, the smallest side of a cell at depth h = 1 ...

work page
[10]

Given a set of pointsDp−1 , we deﬁne the ﬁll distanceFD(Dp−1,X ) as the largest distance from any point inX to the points inDp−1, as FD(Dp−1,X ) = supx∈X infci∈Dp−1||x−ci||

Proof of Lemma 4 To derive an upper bound on variance functionσp as in Lemma 4, we use a concept, called the ﬁll distance. Given a set of pointsDp−1 , we deﬁne the ﬁll distanceFD(Dp−1,X ) as the largest distance from any point inX to the points inDp−1, as FD(Dp−1,X ) = supx∈X infci∈Dp−1||x−ci||. The following result, which is proven by Wu & Schaback (1992...

work page 1992
[11]

We prove this in the following Lemma 11

Proof of Theorem 1 To prove Theorem 1, we will involve two stages: • Stage 1: we ﬁrst prove that ifN is large enough, then under some assumptions, all the centers of nodes of expandable nodes will fall into the ballB(x∗,θ ) which is centered atx∗ with radiusθ as deﬁned in Property 1. We prove this in the following Lemma 11. • Stage 2: when a set of expand...

work page
[12]

We deﬁne a node (h,i ) at depth h as δ(h;a,b )-optimal ifU(ch,i)≥f(ch,i)−δ(h;a,b )

Proof of Lemma 5 Let (h∗ p + 1,i∗) be an optimal node of depth h∗ p + 1 (i.e., x∗ ∈ Ah∗p+1,i∗). We deﬁne a node (h,i ) at depth h as δ(h;a,b )-optimal ifU(ch,i)≥f(ch,i)−δ(h;a,b ). We obtains the following result. Bayesian Optimistic Optimisation with Exponentially Decaying Regret Lemma 12. Assume thatf(ch∗p+1,i∗)≤U (ch∗p+1,i∗). Then any node (h∗ p + 1,i )...

work page
[13]

Lemma 14

Proof of Lemma 6 We useAp to denote the set of all points evaluated by the algorithm and all centers of optimal nodes of the treeTp afterp evaluations. Lemma 14. Pick aη∈ (0, 1). Setβp = 2log(π2p3/3η) andLp(c) =µp(c)−β1/2 p σp(c). With probability 1−η, we have Lp(c)≤f(c)≤U p(c), for everyp≥ 1 and for everyc∈Ap. Proof. Afterp evaluations, there are at most...

work page 2012
[14]

Assume that there is a partitioning procedureP (m;a,b ) wherea =O(N 1/D),b =D and 2≤ m < √ N− 1

Proof of Theorem 2 Theorem 4 (Regret Bound). Assume that there is a partitioning procedureP (m;a,b ) wherea =O(N 1/D),b =D and 2≤ m < √ N− 1. Let the depth function hmax(p) =√p. We consider m2 < p≤ N, and deﬁneh(p) as the smallest integerh such that h≥ √p−m− 1 C + 2, whereC is the constant deﬁned by Theorem 1. Pick aη∈ (0, 1). Then for everyN≥N1, the loss...

work page
[15]

We let node (h∗ p + 1,o∗) be the optimal node at depthh∗ p + 1

Thus, since the node (h,j ) has been expanded, its GP-UCB value is at least as high as that of the some node (h∗ p + 1,o ) at depthh∗ p + 1, such that • (1) node (h∗ p + 1,o ) has been evaluated at somep′-th expansion before node (h,j ) and • (2) (h∗ p + 1,o )∈ argmax(h∗p+1,i)∈LUp′(ch∗p+1,i) (see Line 4 of Algorithm 1). We let node (h∗ p + 1,o∗) be the op...

work page
[16]

Pick aη∈ (0, 1)

Proof of Corollary 1 Corollary 2. Pick aη∈ (0, 1). There exists a constantN2 > 0 such that for everyN≥N2 we have that the simple regret of the proposed BOO with the partitioning procedureP (m;a,b ) wherea =⌊( √ N 2 ) 1 D⌋,b =D, is bounded as rN≤O (N− √ N), with probability 1−η. Proof. Witha =⌊( √ N 2 ) 1 D⌋ andb = D, m = ab≤ √ N/2. These conditions satisf...

work page
[17]

The left plot shows different partitioning procedures while function sampling is ﬁxed to our proposed scheme

Ablation study between function sampling and partitioning procedure in the proposed BOO algorithm To show this point, we have performed additional experiments withm = 64 (see Figure below). The left plot shows different partitioning procedures while function sampling is ﬁxed to our proposed scheme. We can see good improvement when b = D compared tob = 1 c...

work page

[1] [1]

URL http://proceedings.mlr

PMLR. URL http://proceedings.mlr. press/v70/chowdhury17a.html. De Freitas, N., Smola, A. J., and Zoghi, M. Exponential regret bounds for gaussian process bandits with determin- istic observations. In Proceedings of the 29th Interna- tional Coference on International Conference on Machine Learning, ICML’12, pp. 955–962, Madison, WI, USA,

work page

[2] [2]

Bayesian Optimization with Exponential Convergence

Omnipress. ISBN 9781450312851. Floudas, C. A. Deterministic Global Optimization: Theory, Methods and (NONCONVEX OPTIMIZATION AND ITS APPLICATIONS Volume 37) (Nonconvex Optimization and Its Applications). Springer-Verlag, Berlin, Heidel- berg, 2005. ISBN 0792360141. Grill, J.-B., Valko, M., Munos, R., and Munos, R. Black-box optimization of noisy functions...

work page internal anchor Pith review Pith/arXiv arXiv 2005

[3] [3]

Srinivas, A

PMLR. URL http://proceedings.mlr. press/v65/scarlett17a.html. Sen, R., Kandasamy, K., and Shakkottai, S. Multi-ﬁdelity black-box optimization with hierarchical partitions. vol- ume 80 of Proceedings of Machine Learning Research, pp. 4538–4547, Stockholmsm¨assan, Stockholm Sweden, 10–15 Jul 2018. PMLR. Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M...

work page doi:10.1109/tit.2011.2182033 2018

[4] [4]

doi: 10.1007/ 978-1-4612-1494-6

ISBN 0-387-98629-4. doi: 10.1007/ 978-1-4612-1494-6. URL http://dx.doi.org/ 10.1007/978-1-4612-1494-6 . Some theory for Kriging. Tran-The, H., Gupta, S., Rana, S., Ha, H., and Venkatesh, S. Sub-linear regret bounds for bayesian optimisation in unknown search spaces. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., and Lin, H. (eds.), Advances i...

work page doi:10.1007/978-1-4612-1494-6 2020

[5] [5]

The main difference between our proposed BOO algorithm and these algorithms are in the following blue color lines

Review of SOO, BaMSOO, IMGPO algorithms In the ﬁrst section of the Supplementary Material, we provide the details of SOO (Munos, 2011) and BamSOO (Wang et al., 2014). The main difference between our proposed BOO algorithm and these algorithms are in the following blue color lines. As we can see, SOO and BaMSOO select a node to be expanded at line 4 in eac...

work page 2011

[6] [6]

We here generalize their proof to anym

IMGPO uses a ﬁxedm = 3. We here generalize their proof to anym. The Table 3 shows the strict negative correlation of tree-based optimistic optimisation algorithms like SOO, BamSOO, Bayesian Optimistic Optimisation with Exponentially Decaying Regret Algorithm 3 The BaMSOO Algorithm (Wang et al., 2014) Input: Parameterm Initialisation: Setg0,0 =f(c0,0),f + ...

work page 2014

[7] [7]

Given any (a,b )∈M(m) and a partitioning procedureP (m;a,b ), then

Proof of Lemma 1 Lemma 7 (Lemma 1 in the main paper). Given any (a,b )∈M(m) and a partitioning procedureP (m;a,b ), then

work page

[8] [8]

the longest side of a cell at depth h is at mosta−⌊bh D⌋, and

work page

[9] [9]

the smallest side of a cell at depth h is at leasta−⌈bh D⌉. Proof. We prove the statement by induction. At depthh = 1, we partition the search spaceX intom =ab cells using the partitioning procedureP (m;a,b ). There are two cases onb. • b =D. Then the longest side of a cell at depth h = 1 is 1/a =a−⌊ b D⌋. Also, the smallest side of a cell at depth h = 1 ...

work page

[10] [10]

Given a set of pointsDp−1 , we deﬁne the ﬁll distanceFD(Dp−1,X ) as the largest distance from any point inX to the points inDp−1, as FD(Dp−1,X ) = supx∈X infci∈Dp−1||x−ci||

Proof of Lemma 4 To derive an upper bound on variance functionσp as in Lemma 4, we use a concept, called the ﬁll distance. Given a set of pointsDp−1 , we deﬁne the ﬁll distanceFD(Dp−1,X ) as the largest distance from any point inX to the points inDp−1, as FD(Dp−1,X ) = supx∈X infci∈Dp−1||x−ci||. The following result, which is proven by Wu & Schaback (1992...

work page 1992

[11] [11]

We prove this in the following Lemma 11

Proof of Theorem 1 To prove Theorem 1, we will involve two stages: • Stage 1: we ﬁrst prove that ifN is large enough, then under some assumptions, all the centers of nodes of expandable nodes will fall into the ballB(x∗,θ ) which is centered atx∗ with radiusθ as deﬁned in Property 1. We prove this in the following Lemma 11. • Stage 2: when a set of expand...

work page

[12] [12]

We deﬁne a node (h,i ) at depth h as δ(h;a,b )-optimal ifU(ch,i)≥f(ch,i)−δ(h;a,b )

Proof of Lemma 5 Let (h∗ p + 1,i∗) be an optimal node of depth h∗ p + 1 (i.e., x∗ ∈ Ah∗p+1,i∗). We deﬁne a node (h,i ) at depth h as δ(h;a,b )-optimal ifU(ch,i)≥f(ch,i)−δ(h;a,b ). We obtains the following result. Bayesian Optimistic Optimisation with Exponentially Decaying Regret Lemma 12. Assume thatf(ch∗p+1,i∗)≤U (ch∗p+1,i∗). Then any node (h∗ p + 1,i )...

work page

[13] [13]

Lemma 14

Proof of Lemma 6 We useAp to denote the set of all points evaluated by the algorithm and all centers of optimal nodes of the treeTp afterp evaluations. Lemma 14. Pick aη∈ (0, 1). Setβp = 2log(π2p3/3η) andLp(c) =µp(c)−β1/2 p σp(c). With probability 1−η, we have Lp(c)≤f(c)≤U p(c), for everyp≥ 1 and for everyc∈Ap. Proof. Afterp evaluations, there are at most...

work page 2012

[14] [14]

Assume that there is a partitioning procedureP (m;a,b ) wherea =O(N 1/D),b =D and 2≤ m < √ N− 1

Proof of Theorem 2 Theorem 4 (Regret Bound). Assume that there is a partitioning procedureP (m;a,b ) wherea =O(N 1/D),b =D and 2≤ m < √ N− 1. Let the depth function hmax(p) =√p. We consider m2 < p≤ N, and deﬁneh(p) as the smallest integerh such that h≥ √p−m− 1 C + 2, whereC is the constant deﬁned by Theorem 1. Pick aη∈ (0, 1). Then for everyN≥N1, the loss...

work page

[15] [15]

We let node (h∗ p + 1,o∗) be the optimal node at depthh∗ p + 1

Thus, since the node (h,j ) has been expanded, its GP-UCB value is at least as high as that of the some node (h∗ p + 1,o ) at depthh∗ p + 1, such that • (1) node (h∗ p + 1,o ) has been evaluated at somep′-th expansion before node (h,j ) and • (2) (h∗ p + 1,o )∈ argmax(h∗p+1,i)∈LUp′(ch∗p+1,i) (see Line 4 of Algorithm 1). We let node (h∗ p + 1,o∗) be the op...

work page

[16] [16]

Pick aη∈ (0, 1)

Proof of Corollary 1 Corollary 2. Pick aη∈ (0, 1). There exists a constantN2 > 0 such that for everyN≥N2 we have that the simple regret of the proposed BOO with the partitioning procedureP (m;a,b ) wherea =⌊( √ N 2 ) 1 D⌋,b =D, is bounded as rN≤O (N− √ N), with probability 1−η. Proof. Witha =⌊( √ N 2 ) 1 D⌋ andb = D, m = ab≤ √ N/2. These conditions satisf...

work page

[17] [17]

The left plot shows different partitioning procedures while function sampling is ﬁxed to our proposed scheme

Ablation study between function sampling and partitioning procedure in the proposed BOO algorithm To show this point, we have performed additional experiments withm = 64 (see Figure below). The left plot shows different partitioning procedures while function sampling is ﬁxed to our proposed scheme. We can see good improvement when b = D compared tob = 1 c...

work page