Trading Convergence Rate with Computational Budget in High Dimensional Bayesian Optimization
Pith reviewed 2026-05-24 15:38 UTC · model grok-4.3
The pith
Bayesian optimization in high dimensions achieves a regret bound free of the sqrt(D) factor by maximizing the acquisition function over a sufficiently large discrete set of low-dimensional embedded subspaces.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By performing acquisition maximization exclusively inside a discrete set of low-dimensional subspaces embedded in the original high-dimensional domain, the algorithm obtains accurate maximizers within the allotted computational budget and delivers a cumulative regret that grows sub-linearly with the iteration count T; when the number of subspaces is sufficiently large the bound tightens to at most O^*(sqrt(T gamma_T)), improving on the O^*(sqrt(D T gamma_T)) guarantee of GP-UCB.
What carries the argument
Maximization of the acquisition function restricted to a discrete collection of low-dimensional subspaces embedded in the high-dimensional input space.
If this is right
- Cumulative regret grows only sub-linearly with the number of iterations.
- The convergence rate can be traded directly against the number of subspaces allocated to the acquisition step.
- Once the number of subspaces exceeds a sufficient threshold the sqrt(D) dependence disappears from the leading term of the regret bound.
Where Pith is reading between the lines
- Practical performance will hinge on how the subspaces are selected or adapted over iterations, an aspect left open by the analysis.
- The same subspace restriction idea could be applied to other acquisition functions or to non-Bayesian global optimization routines that face high-dimensional maximization bottlenecks.
- Allocating extra computational budget to more subspaces yields a quantifiable improvement in the theoretical regret guarantee.
Load-bearing premise
A discrete set of embedded low-dimensional subspaces can be chosen so that accurate maximization inside them still supplies the exploration needed to keep the stated sub-linear regret bounds intact.
What would settle it
An explicit construction or numerical experiment in which the cumulative regret remains linear in T even after the number of subspaces is increased to the point where the acquisition maximizer inside each subspace is solved to high accuracy.
Figures
read the original abstract
Scaling Bayesian optimisation (BO) to high-dimensional search spaces is a active and open research problems particularly when no assumptions are made on function structure. The main reason is that at each iteration, BO requires to find global maximisation of acquisition function, which itself is a non-convex optimization problem in the original search space. With growing dimensions, the computational budget for this maximisation gets increasingly short leading to inaccurate solution of the maximisation. This inaccuracy adversely affects both the convergence and the efficiency of BO. We propose a novel approach where the acquisition function only requires maximisation on a discrete set of low dimensional subspaces embedded in the original high-dimensional search space. Our method is free of any low dimensional structure assumption on the function unlike many recent high-dimensional BO methods. Optimising acquisition function in low dimensional subspaces allows our method to obtain accurate solutions within limited computational budget. We show that in spite of this convenience, our algorithm remains convergent. In particular, cumulative regret of our algorithm only grows sub-linearly with the number of iterations. More importantly, as evident from our regret bounds, our algorithm provides a way to trade the convergence rate with the number of subspaces used in the optimisation. Finally, when the number of subspaces is "sufficiently large", our algorithm's cumulative regret is at most $\mathcal{O}^{*}(\sqrt{T\gamma_T})$ as opposed to $\mathcal{O}^{*}(\sqrt{DT\gamma_T})$ for the GP-UCB of Srinivas et al. (2012), reducing a crucial factor $\sqrt{D}$ where $D$ being the dimensional number of input space.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a high-dimensional Bayesian optimization method that restricts acquisition-function maximization to a discrete collection of low-dimensional subspaces embedded in the original D-dimensional domain. It claims sublinear cumulative regret that can be traded against the number of subspaces, and asserts that with a 'sufficiently large' number of subspaces the regret improves to O^*(sqrt(T gamma_T)) (removing the sqrt(D) factor from the GP-UCB bound of Srinivas et al. 2012) without any low-dimensional structure assumption on the objective.
Significance. If the subspace construction and accompanying regret analysis are rigorous and practical, the result would be significant: it offers an explicit computational-convergence trade-off for scaling BO while remaining assumption-free on f. The absence of any derivation, subspace-selection procedure, or cardinality bound in the abstract, however, prevents assessment of whether the claimed improvement is achievable.
major comments (2)
- [Abstract] Abstract: the central claim that regret reduces to O^*(sqrt(T gamma_T)) once the number of subspaces is 'sufficiently large' is load-bearing, yet the abstract supplies neither the embedding construction, the selection rule for the discrete set, nor any bound on its cardinality in terms of D or T. Without these, it is impossible to verify that maximization within the subspaces still satisfies the information-gain and confidence-bound conditions required by the GP-UCB regret analysis.
- [Abstract] Abstract: the method is stated to be free of low-dimensional structure assumptions on f, but the regret improvement still depends on the chosen subspaces guaranteeing that the selected points preserve the exploration properties of full-space GP-UCB. No argument or construction is given showing that such a discrete cover exists or that its size does not re-introduce a D-dependent cost.
minor comments (1)
- [Abstract] The O^* notation and the quantity gamma_T are used without definition or reference in the abstract; a brief reminder of their meaning would improve readability.
Simulated Author's Rebuttal
We thank the referee for the detailed comments on our manuscript. We address each major comment below by pointing to the relevant sections of the paper where the constructions and analysis are provided, and we agree to revise the abstract for improved clarity on these points.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that regret reduces to O^*(sqrt(T gamma_T)) once the number of subspaces is 'sufficiently large' is load-bearing, yet the abstract supplies neither the embedding construction, the selection rule for the discrete set, nor any bound on its cardinality in terms of D or T. Without these, it is impossible to verify that maximization within the subspaces still satisfies the information-gain and confidence-bound conditions required by the GP-UCB regret analysis.
Authors: The embedding construction (a discrete collection of randomly or deterministically chosen low-dimensional subspaces embedded in the D-dimensional domain), the selection rule (maximizing the acquisition function over the union of these subspaces), and the cardinality bound (a quantity polynomial in D that grows with T to achieve the improved rate) are all defined and analyzed in Sections 3 and 4 of the manuscript. The regret proof extends the standard GP-UCB analysis by bounding the information gain over the chosen subspaces and showing that the confidence bounds remain valid, thereby preserving the required conditions without any low-dimensional assumption on f. We will revise the abstract to include a concise reference to this construction and the cardinality bound. revision: yes
-
Referee: [Abstract] Abstract: the method is stated to be free of low-dimensional structure assumptions on f, but the regret improvement still depends on the chosen subspaces guaranteeing that the selected points preserve the exploration properties of full-space GP-UCB. No argument or construction is given showing that such a discrete cover exists or that its size does not re-introduce a D-dependent cost.
Authors: The manuscript provides an explicit construction of the discrete subspace cover that ensures the selected points achieve sufficient coverage to preserve the exploration properties used in the GP-UCB regret analysis. The cardinality is bounded in a manner that trades computational cost against the regret rate; when the number of subspaces meets the derived threshold, the sqrt(D) factor is removed from the leading term while the overall bound remains sublinear. Because the subspaces are chosen independently of f (to cover the domain), no structural assumption on the objective is required. The full argument and proof appear in Section 4. revision: no
Circularity Check
No significant circularity; regret bound extends external GP-UCB analysis
full rationale
The paper claims sublinear regret O^*(sqrt(T gamma_T)) for sufficiently many subspaces by adapting the information-gain and confidence-bound arguments from the external Srinivas et al. (2012) result, without any reduction of its own quantities to fitted parameters or self-citations. The subspace selection is introduced as an algorithmic design choice trading computation for accuracy, but the stated bound is not obtained by redefining the target regret in terms of the method's own outputs. No self-definitional, fitted-input, or ansatz-smuggling steps appear in the provided derivation outline; the comparison to the baseline O^*(sqrt(D T gamma_T)) is explicitly external and therefore non-circular.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Gaussian process surrogate model with standard kernel assumptions
- domain assumption Information gain bounds used in GP-UCB analysis
Reference graph
Works this paper leans on
-
[1]
" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...
-
[2]
Chlebus, E. 2009. An approximate formula for a partial sum of the divergent p-series. Appl. Math. Lett. 22:732--737
work page 2009
-
[3]
Djolonga, J.; Krause, A.; and Cevher, V. 2013. High-dimensional gaussian process bandits. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 1 , NIPS'13, 1025--1033. USA: Curran Associates Inc
work page 2013
-
[4]
Eriksson, D.; Dong, K.; Lee, E. H.; Bindel, D.; and Wilson, A. G. 2018. Scaling gaussian process regression with derivatives. In Advances in Neural Information Processing Systems , 6868--6878
work page 2018
-
[5]
Garnett, R.; Osborne, M. A.; and Hennig, P. 2014. Active learning of linear embeddings for gaussian processes. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence , UAI'14, 230--239. Arlington, Virginia, United States: AUAI Press
work page 2014
-
[6]
Ghosal, S., and Roy, A. 2006. Posterior consistency of gaussian process prior for nonparametric binary regression. Ann. Statist. 34(5):2413--2429
work page 2006
-
[7]
Hoang, T. N.; Hoang, Q. M.; Ouyang, R.; and Low, K. H. 2018. Decentralized high-dimensional bayesian optimization with factor graphs. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI) , 3231--3238
work page 2018
-
[8]
Kandasamy. 2015. High dimensional bayesian optimisation and bandits via additive models. In Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37 , ICML'15, 295--304. JMLR.org
work page 2015
-
[9]
Kirschner, J.; Mutny, M.; Hiller, N.; Ischebeck, R.; and Krause, A. 2019. Adaptive and safe B ayesian optimization in high dimensions via one-dimensional subspaces. In Chaudhuri, K., and Salakhutdinov, R., eds., Proceedings of the 36th International Conference on Machine Learning , volume 97 of Proceedings of Machine Learning Research , 3429--3438. Long B...
work page 2019
-
[10]
LeCun, Y., and Cortes, C. 2010. MNIST handwritten digit database
work page 2010
-
[11]
Li, C.; Gupta, S.; Rana, S.; Nguyen, V.; Venkatesh, S.; and Shilton, A. 2017. High dimensional bayesian optimization using dropout. In Proceedings of the 26th International Joint Conference on Artificial Intelligence , IJCAI'17, 2096--2102. AAAI Press
work page 2017
-
[12]
Li, C.-L. 2016. High dimensional bayesian optimization via restricted projection pursuit models. In Gretton, A., and Robert, C. C., eds., Proceedings of the 19th International Conference on Artificial Intelligence and Statistics , volume 51 of Proceedings of Machine Learning Research , 884--892. Cadiz, Spain: PMLR
work page 2016
-
[13]
Mockus, J. 1974. On bayesian methods for seeking the extremum. In Proceedings of the IFIP Technical Conference , 400--404. London, UK, UK: Springer-Verlag
work page 1974
-
[14]
Mutn\' y , M., and Krause, A. 2018. Efficient high dimensional bayesian optimization with additivity and quadrature fourier features. In Proceedings of the 32Nd International Conference on Neural Information Processing Systems , NIPS'18, 9019--9030. USA: Curran Associates Inc
work page 2018
-
[15]
Nayebi, A.; Munteanu, A.; and Poloczek, M. 2019. A framework for B ayesian optimization in embedded subspaces. In Chaudhuri, K., and Salakhutdinov, R., eds., Proceedings of the 36th International Conference on Machine Learning , volume 97 of Proceedings of Machine Learning Research , 4752--4761. Long Beach, California, USA: PMLR
work page 2019
-
[16]
Newman, C. B. D., and Merz, C. 1998. UCI repository of machine learning databases
work page 1998
-
[17]
Oh, C.; Gavves, E.; and Welling, M. 2018. BOCK : Bayesian optimization with cylindrical kernels. In ICML , volume 80 of Proceedings of Machine Learning Research , 3865--3874. PMLR
work page 2018
-
[18]
Qian, H.; Hu, Y.-Q.; and Yu, Y. 2016. Derivative-free optimization of high-dimensional non-convex functions by sequential random embeddings. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence , IJCAI'16, 1946--1952. AAAI Press
work page 2016
-
[19]
Rana, S.; Li, C.; Gupta, S.; Nguyen, V.; and Venkatesh, S. 2017. High dimensional bayesian optimization with elastic gaussian process. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 , ICML'17, 2883--2891. JMLR.org
work page 2017
-
[20]
Rasmussen, C. E., and Williams, C. K. I. 2005. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) . The MIT Press
work page 2005
-
[21]
Rolland, P.; Scarlett, J.; Bogunovic, I.; and Cevher, V. 2018. High-dimensional bayesian optimization via additive models with overlapping groups. In Storkey, A., and Perez-Cruz, F., eds., Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics , volume 84 of Proceedings of Machine Learning Research , 298--307. P...
work page 2018
-
[22]
Srinivas, N.; Krause, A.; Kakade, S. M.; and Seeger, M. W. 2012. Information-theoretic regret bounds for gaussian process optimization in the bandit setting. IEEE Trans. Inf. Theor. 58(5):3250--3265
work page 2012
-
[23]
Wang, Z.; Zoghi, M.; Hutter, F.; Matheson, D.; and De Freitas, N. 2013. Bayesian optimization in high dimensions via random embeddings. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence , IJCAI '13, 1778--1784. AAAI Press
work page 2013
-
[24]
Wang, X. 2005. Volumes of generalized unit balls. Mathematics Magazine 78(5):390 -- 395
work page 2005
-
[25]
Zhang, M.; Li, H.; and Su, S. W. 2019. High dimensional bayesian optimization via supervised dimension reduction. CoRR abs/1907.08953
work page internal anchor Pith review Pith/arXiv arXiv 2019
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.