pith.

arxiv: 2502.09198 · v2 · pith:HNBUC3PN · submitted 2025-02-13 · 💻 cs.LG

Understanding High-Dimensional Bayesian Optimization

Pith reviewed 2026-05-23 03:25 UTC · model grok-4.3

classification 💻 cs.LG
keywords: high-dimensional Bayesian optimization · Gaussian processes · vanishing gradients · maximum likelihood estimation · length scales · local search · real-world applications

The pith

Vanishing gradients caused by Gaussian process initialization schemes play a major role in the failures of high-dimensional Bayesian optimization, and maximum likelihood estimation of length scales suffices for state-of-the-art performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines why simple Bayesian optimization methods succeed on high-dimensional real-world tasks despite prior expectations that they would fail. Empirical tests identify vanishing gradients triggered by standard GP initialization as a central obstacle to effective search. Approaches that encourage local search outperform those focused on global exploration, and the authors show that maximum likelihood estimation of GP length scales alone reaches top performance levels. They introduce a straightforward MSR variant of MLE that exploits these observations to deliver state-of-the-art results on a broad collection of real applications.

Core claim

Our empirical analysis shows that vanishing gradients caused by Gaussian process (GP) initialization schemes play a major role in the failures of high-dimensional Bayesian optimization (HDBO) and that methods that promote local search behaviors are better suited for the task. We find that maximum likelihood estimation (MLE) of GP length scales suffices for state-of-the-art performance. Based on this, we propose a simple variant of MLE called MSR that leverages these findings to achieve state-of-the-art performance on a comprehensive set of real-world applications.

What carries the argument

Vanishing gradients induced by common Gaussian process initialization schemes, countered by maximum likelihood estimation of length scales that favors local search behavior.
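A back-of-the-envelope sketch of this mechanism, assuming a squared-exponential (RBF) kernel with a single length scale ℓ and inputs drawn uniformly from [0, 1]^d (the paper's exact kernel is not stated on this page). Typical squared distances grow linearly in d, so a fixed ℓ such as the ln 2 used by the "MLE (ℓ = ln 2)" baseline drives kernel values, and with them the length-scale gradients, to numerical zero. An initialization that grows like √d (the constant c is illustrative, not the paper's rule) keeps the ratio constant.

```latex
\mathbb{E}\,\lVert x - x' \rVert^2 = \sum_{j=1}^{d} \mathbb{E}\,(x_j - x'_j)^2 = \frac{d}{6},
\qquad x, x' \overset{\text{iid}}{\sim} \mathcal{U}[0,1]^d

k(x, x') = \exp\!\Big(-\frac{\lVert x - x' \rVert^2}{2\ell^2}\Big),
\qquad \frac{\partial k}{\partial \ell} = \frac{\lVert x - x' \rVert^2}{\ell^3}\, k(x, x')

\text{typical pair: } k \approx \exp\!\Big(-\frac{d}{12\,\ell^2}\Big)
\;\xrightarrow[d \to \infty]{\ell \text{ fixed}}\; 0,
\qquad \ell = \ln 2,\ d = 1000:\ k \approx e^{-173} \approx 10^{-75}

\ell_0 = c\sqrt{d} \;\Rightarrow\; k \approx \exp\!\Big(-\frac{1}{12\,c^2}\Big) \quad \text{independent of } d
```

Because ∂k/∂ℓ carries the same exponential factor, every entry of the kernel-matrix derivative vanishes along with k, which is what stalls gradient-based MLE at a short initial length scale.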

If this is right

  • Maximum likelihood estimation of GP length scales alone reaches state-of-the-art results in high-dimensional settings.
  • Methods that promote local search outperform those that emphasize global exploration.
  • The MSR variant of MLE attains state-of-the-art performance on diverse real-world applications.
  • Targeted experiments can isolate and confirm the contribution of vanishing gradients to HDBO failures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • High-dimensional problems may reward focused local exploitation more than broad exploration strategies.
  • Similar initialization and length-scale adjustments could simplify other Gaussian-process-based optimizers.
  • MSR might be tested on synthetic high-dimensional functions with known global optima to separate local-search benefits from benchmark-specific effects.
  • The results suggest that many reported failures of high-dimensional BO may be fixable with standard tools rather than requiring entirely new algorithms.

Load-bearing premise

The performance gaps observed across the tested real-world applications arise primarily from the vanishing-gradient mechanism rather than from other unexamined elements of the experimental design or benchmark selection.

What would settle it

A controlled trial that alters only the GP initialization to eliminate vanishing gradients while holding all other factors fixed, after which the performance advantage of MLE-based local-search methods disappears.
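A minimal sketch of how such a controlled trial could be laid out, assuming a hypothetical `RunConfig` record whose field names and default values are placeholders, not the paper's code. Only the length-scale initialization varies, and each (benchmark, seed) pair is shared across arms so results can be compared pairwise. The benchmark names and the 15 repetitions are taken from the ranking table and figure captions on this page.

```python
from dataclasses import asdict, dataclass
from itertools import product


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical experiment record; every field except `lengthscale_init`
    is held fixed across the arms of the ablation."""
    benchmark: str
    seed: int
    lengthscale_init: str
    acquisition: str = "LogEI"               # placeholder
    raasp_sampling: bool = True              # placeholder
    lengthscale_bounds: tuple = (1e-3, 1e2)  # placeholder


def paired_runs(benchmarks, seeds, inits, **fixed):
    """One run per (benchmark, seed, init); `fixed` overrides are shared by all runs."""
    return [
        RunConfig(benchmark=b, seed=s, lengthscale_init=i, **fixed)
        for b, s, i in product(benchmarks, seeds, inits)
    ]


def differing_fields(a, b):
    """Names of the config fields on which two runs differ."""
    da, db = asdict(a), asdict(b)
    return {k for k in da if da[k] != db[k]}


runs = paired_runs(
    ["Mopta08", "Lasso-DNA", "Ant", "Humanoid"],
    seeds=range(15),
    inits=["fixed (ln 2)", "scaled"],
)
print(len(runs))  # 4 benchmarks x 15 seeds x 2 inits = 120
```

Checking `differing_fields` for every (benchmark, seed) pair is a cheap guard that nothing except the initialization leaked between arms.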

Figures

Figures reproduced from arXiv: 2502.09198 by Leonard Papenmeier, Luigi Nardi, Matthias Poloczek.

Figure 1: Maximum MLE gradient magnitude over the first 50 gradient steps, initialized with different initial length scales (y-axis) and problem dimensionalities (x-axis). With short initial length scales, the gradients vanish even in low dimensions.
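The effect in Figure 1 can be reproduced qualitatively with a few lines of NumPy. This is a sketch under simplifying assumptions (isotropic RBF kernel, unit output scale, fixed noise variance 1e-2, random standard-normal targets, 20 uniform design points), not the paper's setup. It evaluates the analytic gradient of the GP log marginal likelihood with respect to log ℓ at initialization, once for the fixed value ℓ₀ = ln 2 and once for a hypothetical dimension-scaled value ℓ₀ = ln 2 · √d.

```python
import numpy as np


def rbf_gram(X, lengthscale):
    """Unit-outputscale isotropic RBF Gram matrix and squared distances."""
    diff = X[:, None, :] - X[None, :, :]
    r2 = np.sum(diff**2, axis=-1)  # exact zeros on the diagonal
    return np.exp(-0.5 * r2 / lengthscale**2), r2


def mll_grad_log_lengthscale(X, y, lengthscale, noise=1e-2):
    """d/d(log l) of the GP log marginal likelihood at the given length scale."""
    Kf, r2 = rbf_gram(X, lengthscale)
    K = Kf + noise * np.eye(len(X))
    dK = Kf * r2 / lengthscale**2  # dK/d(log l), elementwise
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ y
    # Standard identity (Rasmussen & Williams, 2006):
    # dL/dtheta = 0.5 * tr((alpha alpha^T - K^{-1}) dK/dtheta)
    return 0.5 * np.trace((np.outer(alpha, alpha) - K_inv) @ dK)


rng = np.random.default_rng(0)
for d in (2, 100, 1000):
    X = rng.uniform(size=(20, d))
    y = rng.standard_normal(20)
    fixed = mll_grad_log_lengthscale(X, y, np.log(2.0))
    scaled = mll_grad_log_lengthscale(X, y, np.log(2.0) * np.sqrt(d))  # hypothetical scaling
    print(f"d={d:4d}  fixed init: {fixed: .1e}   scaled init: {scaled: .1e}")
```

With the fixed initialization the gradient collapses toward zero as d grows, while the scaled initialization keeps it at a usable magnitude, mirroring the pattern the figure reports.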
Figure 3: Average distances between the initial and the final candidates of LogEI for various model length scales and dimensionalities without RAASP sampling. Values in the gray region are numerically zero. In high dimensions, the gradient of the AF vanishes, causing no movement of the gradient-based optimizer.
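The flatness behind Figure 3 can be illustrated by measuring how much an acquisition function changes around a random start point. To stay short and avoid the numerical underflow of naive EI (the problem LogEI addresses), this sketch swaps in an upper confidence bound, μ + βσ, for LogEI; the mechanism is the same because both depend on x only through the posterior mean and standard deviation. The kernel, noise level, β = 2, the quadratic test objective, and ℓ = ln 2 are illustrative assumptions.

```python
import numpy as np


def make_ucb(X, y, lengthscale, beta=2.0, noise=1e-4):
    """x -> mu(x) + beta * sigma(x) for a zero-mean GP with unit outputscale
    and an isotropic RBF kernel (all illustrative modelling choices)."""
    def kern(A, B):
        d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return np.exp(-0.5 * d2 / lengthscale**2)

    K_inv = np.linalg.inv(kern(X, X) + noise * np.eye(len(X)))
    alpha = K_inv @ y

    def ucb(x):
        kx = kern(x[None, :], X)[0]
        mu = kx @ alpha
        var = max(1.0 - kx @ K_inv @ kx, 1e-12)  # posterior variance, clipped
        return mu + beta * np.sqrt(var)

    return ucb


def fd_grad_norm(f, x, h=1e-5):
    """Central finite-difference estimate of ||grad f(x)||."""
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return float(np.linalg.norm(g))


rng = np.random.default_rng(1)
for d in (2, 1000):
    X = rng.uniform(size=(20, d))
    y = -np.sum((X - 0.5) ** 2, axis=1)  # hypothetical smooth objective
    y = (y - y.mean()) / y.std()         # standardized targets
    ucb = make_ucb(X, y, lengthscale=np.log(2.0))
    x0 = rng.uniform(size=d)             # random start point, no RAASP
    print(f"d={d:4d}  ||grad UCB(x0)|| = {fd_grad_norm(ucb, x0):.1e}")
```

In d = 1000 a random start point is so far from every observation that the posterior equals the prior, the acquisition surface is flat, and a gradient-based optimizer has nothing to follow.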
Figure 4: Left: Average distances between the initial and the final candidates of LogEI with RAASP sampling; the vanishing-gradient issue decreases. Right: Fraction of multi-start GD candidates originating from the RAASP samples when evaluating LogEI on random samples. In high dimensions, RAASP samples are increasingly likely to be picked, even for longer length scales.
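RAASP (random axis-aligned subspace perturbations) creates extra start candidates by perturbing only a few coordinates of a good point, so some candidates stay close to the data even in high dimensions. The sketch below is one simple variant under stated assumptions: each coordinate of the incumbent is perturbed independently with probability min(20/d, 1), a convention borrowed from TuRBO-style candidate generation (about 20 coordinates on average), and the chosen coordinates are resampled uniformly in [0, 1]. The paper's exact perturbation distribution may differ.

```python
import numpy as np


def raasp_candidates(x_best, n, rng, prob=None):
    """Sketch of random axis-aligned subspace perturbations in [0, 1]^d.

    Each candidate copies x_best and resamples a random subset of its
    coordinates uniformly; each coordinate is chosen with probability `prob`.
    """
    d = x_best.size
    p = min(20.0 / d, 1.0) if prob is None else prob  # assumed default (~20 dims)
    mask = rng.random((n, d)) < p
    # Make sure every candidate perturbs at least one coordinate.
    rows = np.flatnonzero(~mask.any(axis=1))
    mask[rows, rng.integers(d, size=rows.size)] = True
    cand = np.tile(x_best, (n, 1))
    cand[mask] = rng.random(int(mask.sum()))
    return cand


rng = np.random.default_rng(0)
x_best = rng.random(1000)                   # stand-in for the incumbent
cand = raasp_candidates(x_best, 512, rng)
print((cand != x_best).sum(axis=1).mean())  # roughly 20 changed coordinates
```

In the setting of Figure 4 (right), such candidates compete with purely random samples for the multi-start seeds; because they differ from the incumbent in only a few coordinates, their acquisition values and gradients are not flat.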
Figure 6: Average length scales (y-axis) obtained by MLE (blue) and MAP (orange) for different numbers of randomly sampled observations (x-axis) for a 10- and a 50-dimensional GP prior sample. The obtained length scales differ substantially for the higher-dimensional function when few points have been observed.
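For reference, the two estimators compared in Figure 6 maximize the following objectives (standard GP formulas, see Rasmussen & Williams [54]). Here K_θ is the kernel matrix with hyperparameters θ and σ² the noise variance; the log-normal length-scale prior with parameters μ and s is an illustrative assumption, not necessarily the exact prior used in the paper.

```latex
\log p(\mathbf{y} \mid X, \theta)
  = -\tfrac{1}{2}\,\mathbf{y}^\top (K_\theta + \sigma^2 I)^{-1} \mathbf{y}
    - \tfrac{1}{2} \log \lvert K_\theta + \sigma^2 I \rvert
    - \tfrac{n}{2} \log 2\pi

\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \; \log p(\mathbf{y} \mid X, \theta)
\qquad
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \; \big[\log p(\mathbf{y} \mid X, \theta) + \log p(\theta)\big]

\log \ell_i \sim \mathcal{N}(\mu, s^2) \;\Rightarrow\;
\log p(\ell_i) = -\log \ell_i - \frac{(\log \ell_i - \mu)^2}{2 s^2} + \text{const},
\qquad \operatorname{mode}(\ell_i) = e^{\mu - s^2}
```

With few observations the data term is weak, so the MAP estimate is pulled toward the prior mode while MLE has no such anchor: lower variance for MAP, at the price of bias.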
Figure 8: BO with the 'scaled' initialization of MLE performs comparably to the state of the art in HDBO.
Figure 10: DSP exhibits the least exploration. MLE with fixed initial length scales performs like random search on Ant and Humanoid.
Figure 11: Distribution of EI values for GPs in various dimensionalities. When conditioning on the same number of data points and keeping the length scale fixed as the dimensionality grows, the distribution of EI values becomes more peaked.
Figure 12: Number of gradient updates for the AF optimization for MSR, and with and without RAASP sampling. RAASP sampling reduces the number of gradient updates.

C.1. Ranking of Optimization Algorithms

Benchmark              MSR   DSP   Bounce   MLE (scaled)   MLE (ℓ = ln 2)
Mopta08 (d = 124)       1     2      5           3               4
Lasso-DNA (d = 180)     4     1      5           3               2
Ant (d = 888)           2     4      3           1               5
Humanoid (d = 6392)     2     1      -           3               4
Figure 13: Mean absolute value of the gradients for the different MLE methods, including the proposed MSR. The constant length scale initialization exhibits vanishing gradients for the high-dimensional Ant and Humanoid problems.
Figure 15: OTSD (solid lines) and performance curves (dashed lines) of the 100-dimensional Levy function.
Figure 14: OTSD (solid lines) and performance curves (dashed lines) of the 100-dimensional Schwefel function.
Figure 16: OTSD (solid lines) and performance curves (dashed lines) of the 100-dimensional Griewank function.
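Figures 14 to 16 track exploration with OTSD. Assuming OTSD is the length of a traveling-salesman tour through the points observed so far (the measure proposed by Papenmeier et al. [41]), a rough approximation can be computed with a greedy nearest-neighbour tour, which upper-bounds the optimal tour length. The helper names are hypothetical and the greedy heuristic is a swapped-in shortcut, not necessarily how the paper computes the metric.

```python
import numpy as np


def greedy_tour_length(X):
    """Length of a closed nearest-neighbour tour through the rows of X.

    A cheap heuristic: the result is an upper bound on the optimal
    traveling-salesman tour length, not the optimum itself.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if n < 2:
        return 0.0
    visited = np.zeros(n, dtype=bool)
    current, length = 0, 0.0
    visited[0] = True
    for _ in range(n - 1):
        dist = np.linalg.norm(X - X[current], axis=1)
        dist[visited] = np.inf  # never revisit a point
        nxt = int(np.argmin(dist))
        length += dist[nxt]
        visited[nxt] = True
        current = nxt
    return length + float(np.linalg.norm(X[current] - X[0]))  # close the tour


def otsd_curve(X, every=10):
    """Hypothetical helper: approximate OTSD after every `every` observations."""
    return [greedy_tour_length(X[:t]) for t in range(every, len(X) + 1, every)]


rng = np.random.default_rng(0)
local = np.clip(0.5 + 0.01 * rng.standard_normal((200, 10)), 0.0, 1.0)  # local search
spread = rng.uniform(size=(200, 10))                                   # space-filling
print(greedy_tour_length(local) < greedy_tour_length(spread))  # True
```

A local-search trace produces a short tour and a space-filling one a long tour, which is why a low OTSD curve is read as little exploration.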
Figure 17: The two-dimensional versions of the Levy, Griewank, and Schwefel benchmark functions used above.
Figure 18: LogEI run on a two-dimensional GP prior sample for 100 evaluations. The right panel shows the posterior mean at the end of the optimization. For highly multimodal benchmarks, EI reverts to a local search behavior and does not obtain a global optimum (red cross).
Figure 19: The mean average length scales of "dominant" and "secondary" dimensions for the Mopta08 (left) and Lasso-DNA (right) benchmarks for DSP.
Figure 20: Fraction of dimensions set to a value at the border (0 or 1) by DSP. The shaded area shows the standard error of the mean across 15 repetitions.
Figure 21: Fraction of dimensions set to a value at the border (0 or 1) by our MLE method. The shaded area shows the standard error of the mean across 15 repetitions.
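The quantity plotted in Figures 20 and 21 is simple to compute. A minimal sketch, assuming observations are stored as rows of an array in the normalized search space [0, 1]^d; with the default `atol=0.0` only exact bound values count, for example a candidate that a bounded optimizer pushed onto the box boundary.

```python
import numpy as np


def fraction_at_border(X, lower=0.0, upper=1.0, atol=0.0):
    """Fraction of coordinates of each observation that lie on the box boundary."""
    X = np.asarray(X, dtype=float)
    on_lower = np.isclose(X, lower, rtol=0.0, atol=atol)
    on_upper = np.isclose(X, upper, rtol=0.0, atol=atol)
    return (on_lower | on_upper).mean(axis=1)


X = [[0.0, 0.3, 1.0, 1.0],
     [0.5, 0.5, 0.5, 0.0]]
print(fraction_at_border(X))  # [0.75 0.25]
```

Averaging this per iteration over the 15 repetitions gives a curve of the kind shown in the two figures.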
Original abstract

Recent work reported that simple Bayesian optimization (BO) methods perform well for high-dimensional real-world tasks, seemingly contradicting prior work and tribal knowledge. This paper investigates why. We identify underlying challenges that arise in high-dimensional BO and explain why recent methods succeed. Our empirical analysis shows that vanishing gradients caused by Gaussian process (GP) initialization schemes play a major role in the failures of high-dimensional Bayesian optimization (HDBO) and that methods that promote local search behaviors are better suited for the task. We find that maximum likelihood estimation (MLE) of GP length scales suffices for state-of-the-art performance. Based on this, we propose a simple variant of MLE called MSR that leverages these findings to achieve state-of-the-art performance on a comprehensive set of real-world applications. We present targeted experiments to illustrate and confirm our findings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper investigates why simple Bayesian optimization methods succeed on high-dimensional real-world tasks despite prior expectations. It identifies vanishing gradients from Gaussian process initialization schemes as a primary cause of high-dimensional BO failures, argues that methods promoting local search are better suited, shows that MLE of GP length scales suffices for strong performance, and proposes a simple MLE variant called MSR that achieves state-of-the-art results on real-world applications, supported by targeted experiments.

Significance. If the empirical findings hold after improved controls, the work offers a clear mechanistic explanation for recent HDBO observations and a practical, low-complexity method (MSR) that matches or exceeds more elaborate approaches. The emphasis on initialization effects and local-search promotion provides a useful lens for diagnosing and designing future high-dimensional optimizers. The targeted experiments are a positive step toward reproducibility in this empirical domain.

major comments (1)
  1. [section on targeted experiments and empirical analysis] The central attribution of performance gaps to vanishing gradients from GP initialization requires explicit isolation experiments that toggle only the initialization scheme (or length-scale handling) while fixing acquisition-function optimization, length-scale constraints, random seeds, and benchmark selection. The described targeted experiments do not appear to include such controls, leaving open the possibility that observed differences arise from other unablated factors (see skeptic note on weakest assumption).

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and the opportunity to clarify our experimental design. We respond to the single major comment below.

Point-by-point responses
  1. Referee: [section on targeted experiments and empirical analysis] The central attribution of performance gaps to vanishing gradients from GP initialization requires explicit isolation experiments that toggle only the initialization scheme (or length-scale handling) while fixing acquisition-function optimization, length-scale constraints, random seeds, and benchmark selection. The described targeted experiments do not appear to include such controls, leaving open the possibility that observed differences arise from other unablated factors (see skeptic note on weakest assumption).

    Authors: We appreciate the referee's emphasis on rigorous isolation. Our targeted experiments (detailed in the section on empirical analysis) were designed to vary only the GP initialization scheme and length-scale handling: we compared standard initialization (which induces vanishing gradients) against MLE-based length-scale estimation while holding fixed the acquisition-function optimizer, length-scale constraints (e.g., bounds and positivity), random seeds, and the exact set of benchmark tasks. All other algorithmic components remained identical across runs. This isolates the contribution of initialization-induced gradient issues. We will revise the manuscript to add an explicit paragraph and table footnote enumerating these fixed factors, thereby making the isolation protocol unambiguous. revision: partial

Circularity Check

0 steps flagged

No circularity: claims rest on targeted experiments, not self-referential definitions or fitted inputs

Full rationale

The paper presents an empirical investigation into HDBO failures, attributing them to vanishing gradients from GP initialization via targeted experiments, and proposes MSR as a simple MLE variant. No equations define quantities in terms of themselves, no predictions are fitted inputs renamed, and no load-bearing self-citations or uniqueness theorems reduce the central claims to prior author work by construction. The analysis is driven by experimental comparisons on real-world tasks, making the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review yields limited visibility into parameters or assumptions; no explicit free parameters or invented entities are stated.

axioms (1)
  • domain assumption: Gaussian processes provide a suitable surrogate model for the unknown objective in Bayesian optimization
    Standard modeling assumption invoked throughout Bayesian optimization literature.

pith-pipeline@v0.9.0 · 5662 in / 1098 out tokens · 32896 ms · 2026-05-23T03:25:53.836530+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    Our empirical analysis shows that vanishing gradients caused by Gaussian process (GP) initialization schemes play a major role in the failures of high-dimensional Bayesian optimization (HDBO) and that methods that promote local search behaviors are better suited for the task. We find that maximum likelihood estimation (MLE) of GP length scales suffices for state-of-the-art performance.

  • IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear

    Relation between the paper passage and the cited Recognition theorem.

    We propose a simple variant of MLE called MSR that leverages these findings to achieve state-of-the-art performance on a comprehensive set of real-world applications.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Active Learning for Gaussian Process Regression Under Self-Induced Boltzmann Weights

    cs.LG · 2026-05 · unverdicted · novelty 7.0

    AB-SID-iVAR enables Gaussian process active learning for self-induced Boltzmann distributions by closed-form approximation of the target, with high-probability error vanishing guarantees and empirical gains on PES and...

  2. Do We Really Need to Approach the Entire Pareto Front in Many-Objective Bayesian Optimisation?

    cs.AI · 2026-04 · unverdicted · novelty 7.0

    Proposes SPMO framework with ESPI acquisition function to find one high-quality single solution in many-objective BO under limited budgets instead of approximating the entire Pareto front.

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · cited by 2 Pith papers · 2 internal anchors

  2. [2]

    Unexpected improvements to expected improvement for bayesian optimization

    Ament, S., Daulton, S., Eriksson, D., Balandat, M., and Bakshy, E. Unexpected improvements to expected improvement for bayesian optimization. Advances in Neural Information Processing Systems, 36, 2024

  3. [3]

    BoTorch: A framework for efficient Monte-Carlo Bayesian optimization

    Balandat, M., Karrer, B., Jiang, D., Daulton, S., Letham, B., Wilson, A. G., and Bakshy, E. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. Advances in neural information processing systems, 33:21524-21538, 2020. URL https://github.com/pytorch/botorch/tree/v0.12.0. Last access: Jan 16, 2025

  4. [4]

    Relaxing the additivity constraints in decentralized no-regret high-dimensional bayesian optimization

    Bardou, A., Thiran, P., and Begin, T. Relaxing the additivity constraints in decentralized no-regret high-dimensional bayesian optimization. In The Twelfth International Conference on Learning Representations, 2024

  5. [5]

    Online Black-Box Algorithm Portfolios for Continuous Optimization

    Baudiš, P. and Pošík, P. Online Black-Box Algorithm Portfolios for Continuous Optimization. In Parallel Problem Solving from Nature - PPSN XIII, pp. 40-49, Cham, 2014. Springer International Publishing

  6. [6]

    A Survey on High-dimensional Gaussian Process Modeling with Application to Bayesian Optimization

    Binois, M. and Wycoff, N. A Survey on High-dimensional Gaussian Process Modeling with Application to Bayesian Optimization. ACM Trans. Evol. Learn. Optim., 2(2), aug 2022. doi:10.1145/3545611

  7. [7]

    Efficient global optimization for high-dimensional constrained problems by using the Kriging models combined with the partial least squares method

    Bouhlel, M. A., Bartoli, N., Regis, R. G., Otsmane, A., and Morlier, J. Efficient global optimization for high-dimensional constrained problems by using the Kriging models combined with the partial least squares method. Engineering Optimization, 50(12):2038-2053, 2018

  8. [8]

    Calandra, R., Seyfarth, A., Peters, J., and Deisenroth, M. P. Bayesian optimization for learning gaits under uncertainty. Annals of Mathematics and Artificial Intelligence, 76(1):5-23, 2016

  9. [9]

    Semi-supervised Embedding Learning for High-dimensional Bayesian Optimization

    Chen, J., Zhu, G., Yuan, C., and Huang, Y. Semi-supervised Embedding Learning for High-dimensional Bayesian Optimization. arXiv preprint arXiv:2005.14601, 2020

  10. [10]

    Bayesian optimization over high-dimensional combinatorial spaces via dictionary-based embeddings

    Deshwal, A., Ament, S., Balandat, M., Bakshy, E., Doppa, J. R., and Eriksson, D. Bayesian optimization over high-dimensional combinatorial spaces via dictionary-based embeddings. In International Conference on Artificial Intelligence and Statistics, pp. 7021-7039. PMLR, 2023

  11. [11]

    Additive Gaussian processes

    Duvenaud, D. K., Nickisch, H., and Rasmussen, C. Additive gaussian processes. Advances in neural information processing systems, 24, 2011

  12. [12]

    High-dimensional Bayesian optimization with sparse axis-aligned subspaces

    Eriksson, D. and Jankowiak, M. High-dimensional Bayesian optimization with sparse axis-aligned subspaces. In Uncertainty in Artificial Intelligence, pp. 493-503. PMLR, 2021

  13. [13]

    Scalable global optimization via local Bayesian optimization

    Eriksson, D., Pearce, M., Gardner, J., Turner, R. D., and Poloczek, M. Scalable global optimization via local Bayesian optimization. Advances in neural information processing systems, 32, 2019

  14. [14]

    Frazier, P. I. A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811, 2018

  15. [15]

    Discovering and exploiting additive structure for Bayesian optimization

    Gardner, J., Guo, C., Weinberger, K., Garnett, R., and Grosse, R. Discovering and exploiting additive structure for Bayesian optimization. In International Conference on Artificial Intelligence and Statistics, pp. 1311-1319, 2017

  16. [16]

    High-dimensional Bayesian optimization via tree-structured additive models

    Han, E., Arora, I., and Scarlett, J. High-dimensional Bayesian optimization via tree-structured additive models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 7630-7638, 2021

  17. [17]

    High-dimensional Bayesian Optimization with Group Testing

    Hellsten, E. O., Hvarfner, C., Papenmeier, L., and Nardi, L. High-dimensional Bayesian Optimization with Group Testing. arXiv preprint arXiv:2310.03515, 2023

  18. [18]

    Position: Why We Must Rethink Empirical Research in Machine Learning

    Herrmann, M., Lange, F. J. D., Eggensperger, K., Casalicchio, G., Wever, M., Feurer, M., Rügamer, D., Hüllermeier, E., Boulesteix, A.-L., and Bischl, B. Position: Why We Must Rethink Empirical Research in Machine Learning. In Forty-first International Conference on Machine Learning, 2024

  19. [19]

    Decentralized high-dimensional bayesian optimization with factor graphs

    Hoang, T. N., Hoang, Q. M., Ouyang, R., and Low, K. H. Decentralized high-dimensional bayesian optimization with factor graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018

  20. [20]

    Vanilla Bayesian optimization performs great in high dimensions

    Hvarfner, C., Hellsten, E. O., and Nardi, L. Vanilla Bayesian optimization performs great in high dimensions. In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pp. 2079...

  21. [21]

    Jones, D. R. A taxonomy of global optimization methods based on response surfaces. Journal of global optimization, 21:345-383, 2001

  22. [22]

    Jones, D. R. Large-Scale Multi-Disciplinary Mass Optimization in the Auto Industry. In MOPTA 2008 Conference (20 August 2008), 2008

  23. [23]

    Efficient global optimization of expensive black-box functions

    Jones, D. R., Schonlau, M., and Welch, W. J. Efficient global optimization of expensive black-box functions. Journal of Global optimization, 13:455-492, 1998

  24. [24]

    High dimensional Bayesian optimisation and bandits via additive models

    Kandasamy, K., Schneider, J., and Póczos, B. High dimensional Bayesian optimisation and bandits via additive models. In International conference on machine learning, pp. 295-304. PMLR, 2015

  25. [25]

    Maximum likelihood estimation in Gaussian process regression is ill-posed

    Karvonen, T. and Oates, C. J. Maximum likelihood estimation in Gaussian process regression is ill-posed. Journal of Machine Learning Research, 24(120):1-47, 2023

  26. [26]

    The curse of dimensionality

    Köppen, M. The curse of dimensionality. In 5th online world conference on soft computing in industrial applications (WSC5), volume 1, pp. 4-8, 2000

  27. [27]

    Lam, R., Poloczek, M., Frazier, P., and Willcox, K. E. Advances in Bayesian optimization with applications in aerospace engineering. In 2018 AIAA Non-Deterministic Approaches Conference, pp. 1656, 2018

  28. [28]

    Re-examining linear embeddings for high-dimensional Bayesian optimization

    Letham, B., Calandra, R., Rai, A., and Bakshy, E. Re-examining linear embeddings for high-dimensional Bayesian optimization. Advances in neural information processing systems, 33:1546-1558, 2020

  29. [29]

    Automatic Gait Optimization With Gaussian Process Regression

    Lizotte, D. J., Wang, T., Bowling, M. H., Schuurmans, D., et al. Automatic Gait Optimization With Gaussian Process Regression. In IJCAI, volume 7, pp. 944-949, 2007

  30. [30]

    Active subspaces for shape optimization

    Lukaczyk, T. W., Constantine, P., Palacios, F., and Alonso, J. J. Active subspaces for shape optimization. In 10th AIAA multidisciplinary design optimization conference, pp. 1171, 2014

  31. [31]

    Local Latent Space Bayesian Optimization over Structured Inputs

    Maus, N., Jones, H. T., Moore, J., Kusner, M., Bradshaw, J., and Gardner, J. R. Local Latent Space Bayesian Optimization over Structured Inputs. In Advances in Neural Information Processing Systems, 2022

  32. [32]

    Skill-based Multi-objective Reinforcement Learning of Industrial Robot Tasks with Planning and Knowledge Integration

    Mayr, M., Ahmad, F., Chatzilygeroudis, K. I., Nardi, L., and Krüger, V. Skill-based Multi-objective Reinforcement Learning of Industrial Robot Tasks with Planning and Knowledge Integration. CoRR, abs/2203.10033, 2022

  33. [33]

    The Bayesian approach to global optimization

    Mockus, J. The Bayesian approach to global optimization. In System Modeling and Optimization: Proceedings of the 10th IFIP Conference New York City, USA, August 31-September 4, 1981, pp. 473-481. Springer, 2005

  34. [34]

    High-dimensional Bayesian optimization using low-dimensional feature spaces

    Moriconi, R., Deisenroth, M. P., and Sesh Kumar, K. High-dimensional Bayesian optimization using low-dimensional feature spaces. Machine Learning, 109:1925-1943, 2020

  35. [35]

    Efficient high dimensional Bayesian optimization with additivity and quadrature Fourier features

    Mutny, M. and Krause, A. Efficient high dimensional Bayesian optimization with additivity and quadrature Fourier features. Advances in Neural Information Processing Systems, 31, 2018

  36. [36]

    A framework for Bayesian optimization in embedded subspaces

    Nayebi, A., Munteanu, A., and Poloczek, M. A framework for Bayesian optimization in embedded subspaces. In International Conference on Machine Learning, pp. 4752-4761. PMLR, 2019

  37. [37]

    The Knowledge-Gradient Algorithm for Sequencing Experiments in Drug Discovery

    Negoescu, D. M., Frazier, P. I., and Powell, W. B. The Knowledge-Gradient Algorithm for Sequencing Experiments in Drug Discovery. INFORMS Journal on Computing, 23(3):346-363, 2011

  38. [38]

    Combinatorial bayesian optimization using the graph cartesian product

    Oh, C., Tomczak, J., Gavves, E., and Welling, M. Combinatorial bayesian optimization using the graph cartesian product. Advances in Neural Information Processing Systems, 32, 2019

  39. [39]

    Increasing the scope as you learn: Adaptive bayesian optimization in nested subspaces

    Papenmeier, L., Nardi, L., and Poloczek, M. Increasing the scope as you learn: Adaptive bayesian optimization in nested subspaces. Advances in Neural Information Processing Systems, 35:11586-11601, 2022

  40. [40]

    Bounce: Reliable high-dimensional bayesian optimization for combinatorial and mixed spaces

    Papenmeier, L., Nardi, L., and Poloczek, M. Bounce: Reliable high-dimensional bayesian optimization for combinatorial and mixed spaces. In Thirty-seventh Conference on Neural Information Processing Systems, 2023

  41. [41]

    Exploring Exploration in Bayesian Optimization

    Papenmeier, L., Cheng, N., Becker, S., and Nardi, L. Exploring exploration in bayesian optimization. arXiv preprint arXiv:2502.08208, 2025

  42. [42]

    G-STAR: A new kriging-based trust region method for global optimization

    Pedrielli, G. and Ng, S. H. G-STAR: A new kriging-based trust region method for global optimization. In 2016 Winter Simulation Conference (WSC), pp. 803-814. IEEE, 2016

  43. [43]

    Bayesian optimization using domain knowledge on the ATRIAS biped

    Rai, A., Antonova, R., Song, S., Martin, W., Geyer, H., and Atkeson, C. Bayesian optimization using domain knowledge on the ATRIAS biped. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1771-1778. IEEE, 2018

  44. [44]

    Cylindrical Thompson Sampling for High-Dimensional Bayesian Optimization

    Rashidi, B., Johnstonbaugh, K., and Gao, C. Cylindrical Thompson Sampling for High-Dimensional Bayesian Optimization. In International Conference on Artificial Intelligence and Statistics, pp. 3502-3510. PMLR, 2024

  45. [45]

    Regis, R. G. Trust regions in Kriging-based optimization with expected improvement. Engineering optimization, 48(6):1037-1059, 2016

  46. [46]

    Regis, R. G. and Shoemaker, C. A. Combining radial basis function surrogates and dynamic coordinate search in high-dimensional expensive black-box optimization. Engineering Optimization, 45(5):529-555, 2013

  47. [47]

    Lassobench: A high-dimensional hyperparameter optimization benchmark suite for lasso

    Šehić, K., Gramfort, A., Salmon, J., and Nardi, L. Lassobench: A high-dimensional hyperparameter optimization benchmark suite for lasso. In International Conference on Automated Machine Learning, pp. 2-1. PMLR, 2022

  48. [48]

    Monte carlo tree search based variable selection for high dimensional bayesian optimization

    Song, L., Xue, K., Huang, X., and Qian, C. Monte carlo tree search based variable selection for high dimensional bayesian optimization. Advances in Neural Information Processing Systems, 35:28488-28501, 2022

  49. [49]

    Gaussian process optimization in the bandit setting: no regret and experimental design

    Srinivas, N., Krause, A., Kakade, S., and Seeger, M. Gaussian process optimization in the bandit setting: no regret and experimental design. In Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML'10, pp. 1015-1022, Madison, WI, USA, 2010. Omnipress. ISBN 9781605589077

  50. [50]

    Tripp, A., Daxberger, E., and Hernández-Lobato, J. M. Sample-Efficient Optimization in the Latent Space of Deep Generative Models via Weighted Retraining. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H. (eds.), Advances in Neural Information Processing Systems (NeurIPS), volume 33, pp. 11259-11272. Curran Associates, I...

  51. [51]

    Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search

    Wang, L., Fonseca, R., and Tian, Y. Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search. Advances in Neural Information Processing Systems, 33:19511-19522, 2020

  52. [52]

    Bayesian optimization in a billion dimensions via random embeddings

    Wang, Z., Hutter, F., Zoghi, M., Matheson, D., and De Freitas, N. Bayesian optimization in a billion dimensions via random embeddings. Journal of Artificial Intelligence Research, 55:361-387, 2016

  53. [53]

    Batched large-scale Bayesian optimization in high-dimensional spaces

    Wang, Z., Gehring, C., Kohli, P., and Jegelka, S. Batched large-scale Bayesian optimization in high-dimensional spaces. In International Conference on Artificial Intelligence and Statistics, pp. 745-754. PMLR, 2018

  54. [54]

    Williams, C. K. and Rasmussen, C. E. Gaussian processes for machine learning, volume 2. MIT press Cambridge, MA, 2006

  55. [55]

    Wolpert, D. H. and Macready, W. G. No free lunch theorems for optimization. IEEE transactions on evolutionary computation, 1(1):67-82, 1997

  56. [56]

    Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization

    Xu, Z. and Zhe, S. Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization. arXiv preprint arXiv:2402.02746v3, 2024

  57. [57]

    Ziomek, J. K. and Ammar, H. B. Are random decompositions all we need in high dimensional Bayesian optimisation? In International Conference on Machine Learning, pp. 43347-43368. PMLR, 2023