pith. sign in

arxiv: 1911.11950 · v2 · submitted 2019-11-27 · 📊 stat.ML · cs.LG

Trading Convergence Rate with Computational Budget in High Dimensional Bayesian Optimization

Pith reviewed 2026-05-24 15:38 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords Bayesian optimizationhigh-dimensional optimizationregret boundsacquisition functionsubspacescumulative regretGP-UCB
0
0 comments X

The pith

Bayesian optimization in high dimensions achieves a regret bound free of the sqrt(D) factor by maximizing the acquisition function over a sufficiently large discrete set of low-dimensional embedded subspaces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a method to scale Bayesian optimization to high dimensions without assuming low-dimensional structure in the objective function. It restricts acquisition function maximization to a discrete collection of low-dimensional subspaces at each step, which fits within a fixed computational budget and yields accurate solutions. The resulting algorithm still guarantees sublinear cumulative regret, and the bound improves directly with the number of subspaces employed. When that number is large enough the bound becomes O^*(sqrt(T gamma_T)), removing the extra sqrt(D) factor present in the standard GP-UCB analysis.

Core claim

By performing acquisition maximization exclusively inside a discrete set of low-dimensional subspaces embedded in the original high-dimensional domain, the algorithm obtains accurate maximizers within the allotted computational budget and delivers a cumulative regret that grows sub-linearly with the iteration count T; when the number of subspaces is sufficiently large the bound tightens to at most O^*(sqrt(T gamma_T)), improving on the O^*(sqrt(D T gamma_T)) guarantee of GP-UCB.

What carries the argument

Maximization of the acquisition function restricted to a discrete collection of low-dimensional subspaces embedded in the high-dimensional input space.

If this is right

  • Cumulative regret grows only sub-linearly with the number of iterations.
  • The convergence rate can be traded directly against the number of subspaces allocated to the acquisition step.
  • Once the number of subspaces exceeds a sufficient threshold the sqrt(D) dependence disappears from the leading term of the regret bound.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practical performance will hinge on how the subspaces are selected or adapted over iterations, an aspect left open by the analysis.
  • The same subspace restriction idea could be applied to other acquisition functions or to non-Bayesian global optimization routines that face high-dimensional maximization bottlenecks.
  • Allocating extra computational budget to more subspaces yields a quantifiable improvement in the theoretical regret guarantee.

Load-bearing premise

A discrete set of embedded low-dimensional subspaces can be chosen so that accurate maximization inside them still supplies the exploration needed to keep the stated sub-linear regret bounds intact.

What would settle it

An explicit construction or numerical experiment in which the cumulative regret remains linear in T even after the number of subspaces is increased to the point where the acquisition maximizer inside each subspace is solved to high accuracy.

Figures

Figures reproduced from arXiv: 1911.11950 by Hung Tran-The, Santu Rana, Sunil Gupta, Svetha Venkatesh.

Figure 1
Figure 1. Figure 1: Comparison of baselines and the proposed MS-UCB method on four standard functions for 20, 50 and 100 input [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Log Regret vs iterations for varying number of [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 5
Figure 5. Figure 5: Left panel shows the progress of function values [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 4
Figure 4. Figure 4: The neural network benchmark with target di [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
read the original abstract

Scaling Bayesian optimisation (BO) to high-dimensional search spaces is a active and open research problems particularly when no assumptions are made on function structure. The main reason is that at each iteration, BO requires to find global maximisation of acquisition function, which itself is a non-convex optimization problem in the original search space. With growing dimensions, the computational budget for this maximisation gets increasingly short leading to inaccurate solution of the maximisation. This inaccuracy adversely affects both the convergence and the efficiency of BO. We propose a novel approach where the acquisition function only requires maximisation on a discrete set of low dimensional subspaces embedded in the original high-dimensional search space. Our method is free of any low dimensional structure assumption on the function unlike many recent high-dimensional BO methods. Optimising acquisition function in low dimensional subspaces allows our method to obtain accurate solutions within limited computational budget. We show that in spite of this convenience, our algorithm remains convergent. In particular, cumulative regret of our algorithm only grows sub-linearly with the number of iterations. More importantly, as evident from our regret bounds, our algorithm provides a way to trade the convergence rate with the number of subspaces used in the optimisation. Finally, when the number of subspaces is "sufficiently large", our algorithm's cumulative regret is at most $\mathcal{O}^{*}(\sqrt{T\gamma_T})$ as opposed to $\mathcal{O}^{*}(\sqrt{DT\gamma_T})$ for the GP-UCB of Srinivas et al. (2012), reducing a crucial factor $\sqrt{D}$ where $D$ being the dimensional number of input space.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes a high-dimensional Bayesian optimization method that restricts acquisition-function maximization to a discrete collection of low-dimensional subspaces embedded in the original D-dimensional domain. It claims sublinear cumulative regret that can be traded against the number of subspaces, and asserts that with a 'sufficiently large' number of subspaces the regret improves to O^*(sqrt(T gamma_T)) (removing the sqrt(D) factor from the GP-UCB bound of Srinivas et al. 2012) without any low-dimensional structure assumption on the objective.

Significance. If the subspace construction and accompanying regret analysis are rigorous and practical, the result would be significant: it offers an explicit computational-convergence trade-off for scaling BO while remaining assumption-free on f. The absence of any derivation, subspace-selection procedure, or cardinality bound in the abstract, however, prevents assessment of whether the claimed improvement is achievable.

major comments (2)
  1. [Abstract] Abstract: the central claim that regret reduces to O^*(sqrt(T gamma_T)) once the number of subspaces is 'sufficiently large' is load-bearing, yet the abstract supplies neither the embedding construction, the selection rule for the discrete set, nor any bound on its cardinality in terms of D or T. Without these, it is impossible to verify that maximization within the subspaces still satisfies the information-gain and confidence-bound conditions required by the GP-UCB regret analysis.
  2. [Abstract] Abstract: the method is stated to be free of low-dimensional structure assumptions on f, but the regret improvement still depends on the chosen subspaces guaranteeing that the selected points preserve the exploration properties of full-space GP-UCB. No argument or construction is given showing that such a discrete cover exists or that its size does not re-introduce a D-dependent cost.
minor comments (1)
  1. [Abstract] The O^* notation and the quantity gamma_T are used without definition or reference in the abstract; a brief reminder of their meaning would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed comments on our manuscript. We address each major comment below by pointing to the relevant sections of the paper where the constructions and analysis are provided, and we agree to revise the abstract for improved clarity on these points.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that regret reduces to O^*(sqrt(T gamma_T)) once the number of subspaces is 'sufficiently large' is load-bearing, yet the abstract supplies neither the embedding construction, the selection rule for the discrete set, nor any bound on its cardinality in terms of D or T. Without these, it is impossible to verify that maximization within the subspaces still satisfies the information-gain and confidence-bound conditions required by the GP-UCB regret analysis.

    Authors: The embedding construction (a discrete collection of randomly or deterministically chosen low-dimensional subspaces embedded in the D-dimensional domain), the selection rule (maximizing the acquisition function over the union of these subspaces), and the cardinality bound (a quantity polynomial in D that grows with T to achieve the improved rate) are all defined and analyzed in Sections 3 and 4 of the manuscript. The regret proof extends the standard GP-UCB analysis by bounding the information gain over the chosen subspaces and showing that the confidence bounds remain valid, thereby preserving the required conditions without any low-dimensional assumption on f. We will revise the abstract to include a concise reference to this construction and the cardinality bound. revision: yes

  2. Referee: [Abstract] Abstract: the method is stated to be free of low-dimensional structure assumptions on f, but the regret improvement still depends on the chosen subspaces guaranteeing that the selected points preserve the exploration properties of full-space GP-UCB. No argument or construction is given showing that such a discrete cover exists or that its size does not re-introduce a D-dependent cost.

    Authors: The manuscript provides an explicit construction of the discrete subspace cover that ensures the selected points achieve sufficient coverage to preserve the exploration properties used in the GP-UCB regret analysis. The cardinality is bounded in a manner that trades computational cost against the regret rate; when the number of subspaces meets the derived threshold, the sqrt(D) factor is removed from the leading term while the overall bound remains sublinear. Because the subspaces are chosen independently of f (to cover the domain), no structural assumption on the objective is required. The full argument and proof appear in Section 4. revision: no

Circularity Check

0 steps flagged

No significant circularity; regret bound extends external GP-UCB analysis

full rationale

The paper claims sublinear regret O^*(sqrt(T gamma_T)) for sufficiently many subspaces by adapting the information-gain and confidence-bound arguments from the external Srinivas et al. (2012) result, without any reduction of its own quantities to fitted parameters or self-citations. The subspace selection is introduced as an algorithmic design choice trading computation for accuracy, but the stated bound is not obtained by redefining the target regret in terms of the method's own outputs. No self-definitional, fitted-input, or ansatz-smuggling steps appear in the provided derivation outline; the comparison to the baseline O^*(sqrt(D T gamma_T)) is explicitly external and therefore non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Analysis rests on standard Gaussian process modeling and information-gain regret techniques from prior literature. The number of subspaces is a user-selected parameter for trading compute against rate rather than a fitted constant. No new physical or mathematical entities are introduced.

axioms (2)
  • domain assumption Gaussian process surrogate model with standard kernel assumptions
    Implicit foundation for acquisition function and regret analysis in Bayesian optimization.
  • domain assumption Information gain bounds used in GP-UCB analysis
    Referenced to derive the comparison with Srinivas et al. (2012).

pith-pipeline@v0.9.0 · 5821 in / 1589 out tokens · 34729 ms · 2026-05-24T15:38:06.541740+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 1 internal anchor

  1. [1]

    write newline

    " write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...

  2. [2]

    Chlebus, E. 2009. An approximate formula for a partial sum of the divergent p-series. Appl. Math. Lett. 22:732--737

  3. [3]

    Djolonga, J.; Krause, A.; and Cevher, V. 2013. High-dimensional gaussian process bandits. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 1 , NIPS'13, 1025--1033. USA: Curran Associates Inc

  4. [4]

    H.; Bindel, D.; and Wilson, A

    Eriksson, D.; Dong, K.; Lee, E. H.; Bindel, D.; and Wilson, A. G. 2018. Scaling gaussian process regression with derivatives. In Advances in Neural Information Processing Systems , 6868--6878

  5. [5]

    A.; and Hennig, P

    Garnett, R.; Osborne, M. A.; and Hennig, P. 2014. Active learning of linear embeddings for gaussian processes. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence , UAI'14, 230--239. Arlington, Virginia, United States: AUAI Press

  6. [6]

    Ghosal, S., and Roy, A. 2006. Posterior consistency of gaussian process prior for nonparametric binary regression. Ann. Statist. 34(5):2413--2429

  7. [7]

    N.; Hoang, Q

    Hoang, T. N.; Hoang, Q. M.; Ouyang, R.; and Low, K. H. 2018. Decentralized high-dimensional bayesian optimization with factor graphs. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI) , 3231--3238

  8. [8]

    Kandasamy. 2015. High dimensional bayesian optimisation and bandits via additive models. In Proceedings of the 32Nd International Conference on International Conference on Machine Learning - Volume 37 , ICML'15, 295--304. JMLR.org

  9. [9]

    Kirschner, J.; Mutny, M.; Hiller, N.; Ischebeck, R.; and Krause, A. 2019. Adaptive and safe B ayesian optimization in high dimensions via one-dimensional subspaces. In Chaudhuri, K., and Salakhutdinov, R., eds., Proceedings of the 36th International Conference on Machine Learning , volume 97 of Proceedings of Machine Learning Research , 3429--3438. Long B...

  10. [10]

    LeCun, Y., and Cortes, C. 2010. MNIST handwritten digit database

  11. [11]

    Li, C.; Gupta, S.; Rana, S.; Nguyen, V.; Venkatesh, S.; and Shilton, A. 2017. High dimensional bayesian optimization using dropout. In Proceedings of the 26th International Joint Conference on Artificial Intelligence , IJCAI'17, 2096--2102. AAAI Press

  12. [12]

    Li, C.-L. 2016. High dimensional bayesian optimization via restricted projection pursuit models. In Gretton, A., and Robert, C. C., eds., Proceedings of the 19th International Conference on Artificial Intelligence and Statistics , volume 51 of Proceedings of Machine Learning Research , 884--892. Cadiz, Spain: PMLR

  13. [13]

    Mockus, J. 1974. On bayesian methods for seeking the extremum. In Proceedings of the IFIP Technical Conference , 400--404. London, UK, UK: Springer-Verlag

  14. [14]

    Mutn\' y , M., and Krause, A. 2018. Efficient high dimensional bayesian optimization with additivity and quadrature fourier features. In Proceedings of the 32Nd International Conference on Neural Information Processing Systems , NIPS'18, 9019--9030. USA: Curran Associates Inc

  15. [15]

    Nayebi, A.; Munteanu, A.; and Poloczek, M. 2019. A framework for B ayesian optimization in embedded subspaces. In Chaudhuri, K., and Salakhutdinov, R., eds., Proceedings of the 36th International Conference on Machine Learning , volume 97 of Proceedings of Machine Learning Research , 4752--4761. Long Beach, California, USA: PMLR

  16. [16]

    Newman, C. B. D., and Merz, C. 1998. UCI repository of machine learning databases

  17. [17]

    Oh, C.; Gavves, E.; and Welling, M. 2018. BOCK : Bayesian optimization with cylindrical kernels. In ICML , volume 80 of Proceedings of Machine Learning Research , 3865--3874. PMLR

  18. [18]

    Qian, H.; Hu, Y.-Q.; and Yu, Y. 2016. Derivative-free optimization of high-dimensional non-convex functions by sequential random embeddings. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence , IJCAI'16, 1946--1952. AAAI Press

  19. [19]

    Rana, S.; Li, C.; Gupta, S.; Nguyen, V.; and Venkatesh, S. 2017. High dimensional bayesian optimization with elastic gaussian process. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 , ICML'17, 2883--2891. JMLR.org

  20. [20]

    E., and Williams, C

    Rasmussen, C. E., and Williams, C. K. I. 2005. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) . The MIT Press

  21. [21]

    Rolland, P.; Scarlett, J.; Bogunovic, I.; and Cevher, V. 2018. High-dimensional bayesian optimization via additive models with overlapping groups. In Storkey, A., and Perez-Cruz, F., eds., Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics , volume 84 of Proceedings of Machine Learning Research , 298--307. P...

  22. [22]

    M.; and Seeger, M

    Srinivas, N.; Krause, A.; Kakade, S. M.; and Seeger, M. W. 2012. Information-theoretic regret bounds for gaussian process optimization in the bandit setting. IEEE Trans. Inf. Theor. 58(5):3250--3265

  23. [23]

    Wang, Z.; Zoghi, M.; Hutter, F.; Matheson, D.; and De Freitas, N. 2013. Bayesian optimization in high dimensions via random embeddings. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence , IJCAI '13, 1778--1784. AAAI Press

  24. [24]

    Wang, X. 2005. Volumes of generalized unit balls. Mathematics Magazine 78(5):390 -- 395

  25. [25]

    Zhang, M.; Li, H.; and Su, S. W. 2019. High dimensional bayesian optimization via supervised dimension reduction. CoRR abs/1907.08953